Statistical and Machine Learning Methods for Handling Selectivity in Non-Probability Samples

Lenau, Simon

Non-probability sampling is of growing relevance, particularly because it arises with emerging data sources such as web surveys and Big Data. This thesis addresses the statistical challenges posed by non-probability samples, in which unknown or uncontrolled sampling mechanisms raise concerns about data quality and representativeness. It discusses various methods for quantifying and reducing the potential selectivity and bias of non-probability samples in estimation and inference. The thesis introduces two new forms of prediction and weighting methods: a) semi-parametric artificial neural networks (ANNs), which integrate B-spline layers with optimal knot positioning into the general structure and fitting procedure of ANNs, and b) calibrated semi-parametric ANNs, which determine weights for non-probability samples by combining an ANN as response model with calibration constraints for totals, covariances and correlations. Custom computational implementations are developed for fitting (calibrated) semi-parametric ANNs by means of stochastic gradient descent, BFGS and sequential quadratic programming algorithms. The performance of all methods discussed is evaluated and compared across a range of non-probability sampling scenarios in a Monte Carlo simulation study and in an application to a real non-probability sample, the WageIndicator web survey. The potential and limitations of the different methods for dealing with the challenges of non-probability sampling under various circumstances are highlighted. The results show that the best strategy for using non-probability samples depends heavily on the particular selection mechanism, the research interest and the available auxiliary information. Nevertheless, the findings show that both existing and newly proposed methods can mitigate or even fully offset the problems of non-probability samples, and they identify the conditions under which this is possible.
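
To make the notion of calibration constraints for totals concrete, the following is a minimal sketch of classical linear (GREG-type) calibration. It is not the calibrated semi-parametric ANN proposed in the thesis; it only illustrates the simpler idea of adjusting weights so that weighted sample totals reproduce known population totals. The function name `linear_calibration`, the simulated auxiliary data, the base weights and the target totals are all hypothetical assumptions made for illustration.

```python
import numpy as np

def linear_calibration(X, d, totals):
    """Linear calibration sketch (assumed helper, not from the thesis).

    Adjusts base weights d so that the weighted sample totals of the
    auxiliary variables X equal the known population totals.
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Gap between the known totals and the current weighted totals.
    gap = totals - X.T @ d
    # Solve (sum_i d_i x_i x_i^T) lam = gap for the multipliers lam.
    M = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(M, gap)
    # Calibrated weights: w_i = d_i * (1 + x_i^T lam).
    return d * (1.0 + X @ lam)

# Hypothetical data: an intercept and one auxiliary variable.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(50, 10, n)])
d = np.full(n, 5.0)                   # assumed uniform base weights
totals = np.array([1000.0, 52000.0])  # assumed known population totals

w = linear_calibration(X, d, totals)
```

After calibration, `X.T @ w` matches `totals` up to numerical precision. Linear calibration does not restrict the range of the weights, so negative values can occur when the sample is far from the targets; bounded or raking-type distance functions are common alternatives in that case.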

Metadata
Author:	Simon Lenau
URN:	urn:nbn:de:hbz:385-1-19804
DOI:	https://doi.org/10.25353/ubtr-xxxx-4b41-d240
Referee:	Ralf Münnich, Silvia Biffignandi
Advisor:	Ralf Münnich
Document Type:	Doctoral Thesis
Language:	English
Date of completion:	2023/02/27
Publishing institution:	Universität Trier
Granting institution:	Universität Trier, Fachbereich 4
Date of final exam:	2022/12/02
Release Date:	2023/03/01
Tag:	calibration; data quality; machine learning; selectivity; survey statistics
GND Keyword:	Datenerhebung; Methode
Number of pages:	XXVI, 399 pages
First page:	I
Last page:	399
Institutes:	Fachbereich 4 / Wirtschaftswissenschaften
Dewey Decimal Classification:	3 Social sciences / 33 Economics / 330 Economics
Licence (German):	CC BY-NC-SA: Creative Commons Licence 4.0 International
