TY  - THES
A1  - Lenau, Simon
T1  - Statistical and Machine Learning Methods for Handling Selectivity in Non-Probability Samples
N2  - Non-probability sampling is a topic of growing relevance, especially due to its occurrence in the context of new emerging data sources like web surveys and Big Data. 

This thesis addresses statistical challenges arising from non-probability samples, where unknown or uncontrolled sampling mechanisms raise concerns in terms of data quality and representativity.

Various methods to quantify and reduce the potential selectivity and biases of non-probability samples in estimation and inference are discussed. The thesis introduces new forms of prediction and weighting methods, namely

a) semi-parametric artificial neural networks (ANNs) that integrate B-spline layers with optimal knot positioning in the general structure and fitting procedure of artificial neural networks, and

b) calibrated semi-parametric ANNs that determine weights for non-probability samples by integrating an ANN as response model with calibration constraints for totals, covariances and correlations.

Custom-made computational implementations are developed for fitting (calibrated) semi-parametric ANNs by means of stochastic gradient descent, BFGS and sequential quadratic programming algorithms. 

The performance of all the discussed methods is evaluated and compared for a bandwidth of non-probability sampling scenarios in a Monte Carlo simulation study as well as an application to a real non-probability sample, the WageIndicator web survey.

Potentials and limitations of the different methods for dealing with the challenges of non-probability sampling under various circumstances are highlighted. It is shown that the best strategy for using non-probability samples heavily depends on the particular selection mechanism, research interest and available auxiliary information. 

Nevertheless, the findings show that existing as well as newly proposed methods can be used to ease or even fully counterbalance the issues of non-probability samples and highlight the conditions under which this is possible.
KW  - selectivity
KW  - data quality
KW  - machine learning
KW  - calibration
KW  - survey statistics
KW  - Datenerhebung
KW  - Methode
Y1  - 2023
UR  - https://ubt.opus.hbz-nrw.de/frontdoor/index/index/docId/1980
UR  - https://nbn-resolving.org/urn:nbn:de:hbz:385-1-19804
SP  - I
EP  - 399
ER  -