TY  - THES
A1  - Lenau, Simon
T1  - Statistical and Machine Learning Methods for Handling Selectivity in Non-Probability Samples
N2  - Non-probability sampling is a topic of growing relevance, especially due to its occurrence in the context of emerging data sources like web surveys and Big Data. This thesis addresses statistical challenges arising from non-probability samples, where unknown or uncontrolled sampling mechanisms raise concerns about data quality and representativeness. Various methods to quantify and reduce the potential selectivity and biases of non-probability samples in estimation and inference are discussed. The thesis introduces new forms of prediction and weighting methods, namely a) semi-parametric artificial neural networks (ANNs) that integrate B-spline layers with optimal knot positioning into the general structure and fitting procedure of artificial neural networks, and b) calibrated semi-parametric ANNs that determine weights for non-probability samples by combining an ANN as a response model with calibration constraints for totals, covariances and correlations. Custom computational implementations are developed for fitting (calibrated) semi-parametric ANNs by means of stochastic gradient descent, BFGS and sequential quadratic programming algorithms. The performance of all the discussed methods is evaluated and compared across a range of non-probability sampling scenarios in a Monte Carlo simulation study as well as in an application to a real non-probability sample, the WageIndicator web survey. The potential and limitations of the different methods for dealing with the challenges of non-probability sampling under various circumstances are highlighted. It is shown that the best strategy for using non-probability samples depends heavily on the particular selection mechanism, research interest and available auxiliary information. Nevertheless, the findings show that existing as well as newly proposed methods can be used to mitigate or even fully counterbalance the issues of non-probability samples, and they highlight the conditions under which this is possible.
KW  - selectivity
KW  - data quality
KW  - machine learning
KW  - calibration
KW  - survey statistics
KW  - data collection
KW  - method
Y1  - 2023
UR  - https://ubt.opus.hbz-nrw.de/frontdoor/index/index/docId/1980
UR  - https://nbn-resolving.org/urn:nbn:de:hbz:385-1-19804
SP  - I
EP  - 399
ER  -