Survey data can be viewed as incomplete or partially missing from a variety of perspectives, and there are different ways of dealing with such data when predicting and estimating economic quantities. In this thesis, we present two selected research contexts in which the prediction or estimation of economic quantities is examined under incomplete survey data.
The first is the investigation of composite estimators in the German Microcensus (Chapters 3 and 4); the second comprises extensions of multivariate Fay-Herriot (MFH) models (Chapters 5 and 6), which are applied to small area problems.
Composite estimators are estimation methods that exploit the sample overlap in rotating panel surveys such as the German Microcensus in order to stabilise the estimates of the statistics of interest (e.g. employment statistics). Because the samples overlap only partially, information from previous samples is available for only some of the respondents, so the data are partially missing.
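The abstract does not state which composite estimator the thesis studies. As a purely illustrative sketch, the classic K-composite estimator combines the current direct estimate Y_t with the previous composite estimate carried forward by an estimate of change D_t obtained from the overlapping part of the sample: Y'_t = (1 - k) Y_t + k (Y'_{t-1} + D_t). The function name `k_composite`, its inputs and the fixed weight k are assumptions made for illustration only.

```python
def k_composite(direct, change, k):
    """Recursive K-composite estimates (illustrative sketch, not the thesis's method).

    direct: direct estimates Y_t for waves t = 0, ..., T-1
    change: estimated changes D_t from the overlapping subsample; change[0] is unused
    k:      composite weight, 0 <= k < 1 (assumed fixed over time)
    """
    if not 0 <= k < 1:
        raise ValueError("k must lie in [0, 1)")
    if len(direct) != len(change):
        raise ValueError("direct and change must have the same length")
    composite = [float(direct[0])]  # first wave: no earlier sample to borrow from
    for t in range(1, len(direct)):
        carried = composite[t - 1] + change[t]  # previous composite moved forward by D_t
        composite.append((1 - k) * direct[t] + k * carried)
    return composite


print(k_composite([100, 104, 103], [0, 3, -2], k=0.5))  # [100.0, 103.5, 102.25]
```

Setting k = 0 reproduces the direct estimates; larger values of k put more weight on the carried-forward estimate, which is where the information from the overlapping respondents enters.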
MFH models are model-based estimation methods that work with aggregated survey data in order to obtain more precise estimates for small area problems than classical estimation methods. In these models, several variables of interest are modelled simultaneously. The survey estimates of these variables, which serve as input to the MFH models, are often partially missing. If the domains of interest are not explicitly accounted for in the sampling design, the sample sizes allocated to them can, by chance, be small. As a result, either no estimates can be calculated at all, or the estimates are not published by statistical offices because their variances are too large.
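To illustrate how an MFH model can still produce predictions for an area whose direct estimates are partly missing, here is a minimal sketch of the best linear unbiased predictor (BLUP) for one area d under the model y_d = θ_d + e_d, θ_d = X_d β + u_d, with u_d ~ N(0, Σ_u) and e_d ~ N(0, Ψ_d). It assumes that β, Σ_u and Ψ_d are known (in practice they are estimated, which yields an empirical BLUP) and simply conditions on the observed components of y_d. The function name `mfh_blup` and this treatment of missing components are illustrative assumptions, not necessarily the extension developed in the thesis.

```python
import numpy as np


def mfh_blup(y, X, beta, sigma_u, psi):
    """BLUP of theta_d in a multivariate Fay-Herriot model with known parameters.

    y:       length-m vector of direct estimates; np.nan marks missing components
    X:       m x p design matrix of area d
    beta:    length-p regression coefficients (assumed known)
    sigma_u: m x m covariance of the random area effects u_d (assumed known)
    psi:     m x m sampling covariance of e_d (assumed known)
    """
    y = np.asarray(y, dtype=float)
    sigma_u = np.asarray(sigma_u, dtype=float)
    psi = np.asarray(psi, dtype=float)
    mean = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)  # synthetic part X_d beta
    obs = ~np.isnan(y)
    if not obs.any():
        return mean  # nothing observed: fall back to the synthetic prediction
    # Var(y_obs) = Sigma_u[obs, obs] + Psi_d[obs, obs]
    v_obs = sigma_u[np.ix_(obs, obs)] + psi[np.ix_(obs, obs)]
    resid = y[obs] - mean[obs]
    # Cov(theta_d, y_obs) = Sigma_u[:, obs]; this also updates the unobserved components
    return mean + sigma_u[:, obs] @ np.linalg.solve(v_obs, resid)
```

For y_d = (14, missing), X_d = I, β = (10, 20), Σ_u = [[4, 2], [2, 4]] and Ψ_d = 4I, the sketch returns (12, 21): the missing second component moves above its synthetic value of 20 because its random effect is correlated with that of the observed first component.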
A basic assumption of standard small area models is that the statistic of interest can be modelled through a linear mixed model with common model parameters for all areas in the study. The model can then be used to stabilise estimation. In some applications, however, there may be distinct subgroups of areas, each with its own relationship between the response variable and the auxiliary information. In this case, using a separate model for each subgroup would be more appropriate than employing one model for all observations. If no suitable natural clustering variable exists, finite mixture regression models may offer a solution that "lets the data decide" how to partition the areas into subgroups. In this framework, a set of two or more different models is specified, and the subgroup-specific model parameters are estimated simultaneously with the subgroup identity, or the probability of subgroup identity, of each area. Finite mixture models thus offer a flexible approach to accounting for unobserved heterogeneity. Therefore, in this thesis, finite mixtures of small area models are proposed to account for the existence of latent subgroups of areas in small area estimation. More specifically, it is assumed that the statistic of interest is appropriately modelled by a mixture of K linear mixed models. Mixtures of standard unit-level and of standard area-level models are both considered as special cases. We describe how the mixing proportions, the area-specific probabilities of subgroup identity and the K sets of model parameters are estimated via the EM algorithm for mixtures of mixed models. Finally, a finite mixture small area estimator is formulated as a weighted mean of the predictions from models 1 to K, with weights given by the area-specific probabilities of subgroup identity.
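The EM estimation and the final weighted-mean estimator can be sketched in a deliberately simplified setting. The sketch below fits a finite mixture of ordinary Gaussian linear regressions, without the random area effects of the linear mixed models considered in the thesis, so it illustrates only the E-step/M-step structure and the weighting by membership probabilities, not the thesis's full algorithm. The function names `em_mixture_regression` and `mixture_prediction`, the random initialisation and the fixed number of iterations are assumptions for illustration.

```python
import numpy as np


def em_mixture_regression(X, y, K, n_iter=200, seed=0):
    """EM for a K-component mixture of Gaussian linear regressions (simplified sketch).

    Returns mixing proportions pi (K,), coefficients betas (K, p),
    error variances sig2 (K,) and membership probabilities tau (n, K).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    tau = rng.random((n, K))
    tau /= tau.sum(axis=1, keepdims=True)  # random initial membership probabilities
    betas = np.zeros((K, p))
    sig2 = np.ones(K)
    for _ in range(n_iter):
        # M-step: mixing proportions and weighted least squares per component
        pi = tau.mean(axis=0)
        for k in range(K):
            w = tau[:, k]
            Xw = X * w[:, None]
            A = X.T @ Xw + 1e-10 * np.eye(p)  # tiny ridge keeps the system solvable
            betas[k] = np.linalg.solve(A, Xw.T @ y)
            r = y - X @ betas[k]
            sig2[k] = max(float(w @ r**2) / max(float(w.sum()), 1e-12), 1e-8)
        # E-step: posterior probability that each unit belongs to each component
        mu = X @ betas.T  # (n, K) component predictions
        logw = (np.log(pi) - 0.5 * np.log(2 * np.pi * sig2)
                - 0.5 * (y[:, None] - mu) ** 2 / sig2)
        logw -= logw.max(axis=1, keepdims=True)  # stabilise before exponentiating
        tau = np.exp(logw)
        tau /= tau.sum(axis=1, keepdims=True)
    return pi, betas, sig2, tau


def mixture_prediction(X, betas, tau):
    """Weighted mean of the K component predictions, weighted by tau."""
    X = np.asarray(X, dtype=float)
    return (np.asarray(tau) * (X @ np.asarray(betas).T)).sum(axis=1)
```

With K = 1 the procedure reduces to ordinary least squares. Like any EM algorithm, it can converge to a local optimum, so several random starts are usually compared in practice.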