Statistical and Machine Learning Methods for Handling Selectivity in Non-Probability Samples

Lenau, Simon

Non-probability sampling is a topic of growing relevance, especially because it arises with emerging data sources such as web surveys and Big Data. This thesis addresses statistical challenges arising from non-probability samples, where unknown or uncontrolled sampling mechanisms raise concerns about data quality and representativeness. It discusses various methods to quantify and reduce the potential selectivity and bias of non-probability samples in estimation and inference. The thesis introduces new forms of prediction and weighting methods, namely a) semi-parametric artificial neural networks (ANNs) that integrate B-spline layers with optimal knot positioning into the general structure and fitting procedure of ANNs, and b) calibrated semi-parametric ANNs that determine weights for non-probability samples by combining an ANN as response model with calibration constraints for totals, covariances and correlations. Custom computational implementations are developed for fitting (calibrated) semi-parametric ANNs by means of stochastic gradient descent, BFGS and sequential quadratic programming algorithms. The performance of all discussed methods is evaluated and compared across a range of non-probability sampling scenarios in a Monte Carlo simulation study and in an application to a real non-probability sample, the WageIndicator web survey. The potential and limitations of the different methods for dealing with the challenges of non-probability sampling under various circumstances are highlighted. The best strategy for using non-probability samples is shown to depend heavily on the particular selection mechanism, the research interest and the available auxiliary information. Nevertheless, the findings show that existing as well as newly proposed methods can mitigate or even fully offset the problems of non-probability samples, and they identify the conditions under which this is possible.
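
To illustrate what a calibration constraint for totals means, the following is a minimal sketch of classical linear calibration, not the calibrated semi-parametric ANN developed in the thesis. It adjusts starting weights so that the weighted sample totals of the auxiliary variables match known population totals. The function name, the simulated data and the target totals are all hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_calibration_weights(X, d, totals):
    """Adjust starting weights d so that the weighted column totals of X
    equal the given population totals (linear / chi-square distance).

    Illustrative sketch only; this is NOT the thesis's calibrated ANN.
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    totals = np.asarray(totals, dtype=float)
    # Weighted cross-product matrix: sum_i d_i * x_i x_i^T
    M = X.T @ (d[:, None] * X)
    # Lagrange multipliers that close the gap between current and target totals
    lam = np.linalg.solve(M, totals - X.T @ d)
    # Calibrated weights: w_i = d_i * (1 + x_i^T lam)
    return d * (1.0 + X @ lam)


# Hypothetical data: an intercept column and one auxiliary variable
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
d = np.full(n, 5.0)                  # assumed uniform starting weights
totals = np.array([1000.0, 150.0])   # assumed known population totals

w = linear_calibration_weights(X, d, totals)
print(np.allclose(X.T @ w, totals))
```

Linear calibration can produce negative weights when the sample is far from the targets. The abstract describes replacing such a fixed adjustment with an ANN response model fitted under calibration constraints, which this sketch does not attempt.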

Metadata
Author:	Simon Lenau
URN:	urn:nbn:de:hbz:385-1-19804
DOI:	https://doi.org/10.25353/ubtr-xxxx-4b41-d240
Reviewers:	Ralf Münnich, Silvia Biffignandi
Supervisor:	Ralf Münnich
Document type:	Dissertation
Language:	English
Date of completion:	27.02.2023
Publishing institution:	Universität Trier
Degree-granting institution:	Universität Trier, Fachbereich 4
Date of final examination:	02.12.2022
Release date:	01.03.2023
Free keyword / tag:	calibration; data quality; machine learning; selectivity; survey statistics
GND keyword:	Datenerhebung (data collection); Methode (method)
Number of pages:	XXVI, 399 pages
First page:	I
Last page:	399
Institutes:	Fachbereich 4 / Wirtschaftswissenschaften
DDC classification:	3 Social sciences / 33 Economics / 330 Economics
License (German):	CC BY-NC-SA: Creative Commons License 4.0 International
