Sample surveys are a widely used and cost-effective tool for gaining information about a population under consideration. Nowadays, there is an increasing demand for information not only at the population level but also at the level of subpopulations. For some of these subpopulations of interest, however, the subsample sizes may be so small that traditional estimation methods are not expedient. In order to provide reliable information for these so-called small areas as well, small area estimation (SAE) methods combine auxiliary information and the sample data via a statistical model.
The present thesis deals, among other aspects, with the development of highly flexible and realistic small area models. For this purpose, the penalized spline method is modified so that the model parameters can be determined by solving an unconstrained optimization problem. Within this optimization framework, shape constraints are incorporated into the modeling process as additional linear inequality constraints on the optimization problem. This results in small area estimators that combine the penalized spline method as a highly flexible modeling technique with arbitrary shape constraints on the underlying P-spline function.
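As a rough illustration only, the following sketch writes down a generic P-spline fit with one possible shape constraint. It assumes the standard P-spline setup and is not taken from the thesis: the symbols $y$ (responses), $B$ (B-spline basis matrix), $\alpha$ (coefficient vector of length $K$), $\Delta_r$ (difference matrix of order $r$) and $\lambda$ (smoothing parameter) are all assumptions, and the random area effects of an actual small area model are omitted.

```latex
\min_{\alpha \in \mathbb{R}^{K}} \;
  \| y - B\alpha \|_2^2 + \lambda \, \| \Delta_r \alpha \|_2^2
\qquad \text{subject to} \qquad
  \Delta_1 \alpha \ge 0 .
```

Without the constraint this is an unconstrained quadratic problem. The constraint $\Delta_1 \alpha \ge 0$, i.e. $\alpha_{k+1} - \alpha_k \ge 0$ for all $k$, is a set of linear inequalities that is sufficient for the fitted B-spline function to be nondecreasing; other shapes (for example convexity via $\Delta_2 \alpha \ge 0$) can be expressed in the same way.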
In order to incorporate multiple covariates, a tensor product approach is employed to extend the penalized spline method to multiple input variables. This leads to high-dimensional optimization problems for which naive solution algorithms incur prohibitive runtime and memory requirements. By exploiting the underlying tensor structure, the present thesis provides computationally efficient solution algorithms for the considered optimization problems together with memory-efficient, i.e. matrix-free, implementations. The crucial point is the (repeated) application of a matrix-free conjugate gradient method, whose runtime is drastically reduced by a matrix-free multigrid preconditioner.
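The matrix-free idea can be sketched for two covariates as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: random matrices stand in for the B-spline basis matrices, the function names (`diff_penalty`, `make_operator`, `conjugate_gradient`) are hypothetical, and plain conjugate gradients are used without the multigrid preconditioner mentioned above. The point is only that a product with a Kronecker-structured system matrix can be computed from its small factors without ever forming the large matrix.

```python
import numpy as np


def diff_penalty(k, order=2):
    """Return D^T D, where D is the order-th difference matrix on k coefficients."""
    D = np.diff(np.eye(k), n=order, axis=0)
    return D.T @ D


def make_operator(G1, G2, P1, P2, lam):
    """Matrix-free product with G1 (x) G2 + lam * (P1 (x) I + I (x) P2).

    Uses the identity (A (x) B) vec(X) = vec(A X B^T) for row-major vec,
    so only the small factor matrices are ever stored.
    """
    k1, k2 = G1.shape[0], G2.shape[0]

    def matvec(x):
        X = x.reshape(k1, k2)
        Y = G1 @ X @ G2.T + lam * (P1 @ X + X @ P2.T)
        return Y.ravel()

    return matvec


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite operator."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


# Stand-ins for two univariate basis matrices (assumption: random, not B-splines).
rng = np.random.default_rng(0)
B1 = rng.standard_normal((50, 6))
B2 = rng.standard_normal((40, 5))
G1, G2 = B1.T @ B1, B2.T @ B2            # small Gram matrices
P1, P2 = diff_penalty(6), diff_penalty(5)  # small penalty matrices
lam = 1.0

matvec = make_operator(G1, G2, P1, P2, lam)
b = rng.standard_normal(6 * 5)
x_cg = conjugate_gradient(matvec, b)

# Dense reference, formed here only to check the small example.
A = np.kron(G1, G2) + lam * (np.kron(P1, np.eye(5)) + np.kron(np.eye(6), P2))
x_dense = np.linalg.solve(A, b)
print("max abs difference:", np.max(np.abs(x_cg - x_dense)))
```

With d covariates and K coefficients per direction, the dense matrix has K^d rows and columns, whereas the matrix-free product only touches d small K-by-K factors; the dense matrix is built above purely as a correctness check.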
Surveys are commonly tailored to produce estimates of aggregate statistics with a desired level of precision. This may lead to very small sample sizes for subpopulations of interest, defined geographically or by content, which are not incorporated into the survey design. We refer to subpopulations where the sample size is too small to provide direct estimates with adequate precision as small areas or small domains. Despite the small sample sizes, reliable small area estimates are needed for economic and political decision making. Hence, model-based estimation techniques are used which increase the effective sample size by borrowing strength from other areas to provide accurate information for small areas.
The paragraph above introduced small area estimation as a field of survey statistics where two conflicting philosophies of statistical inference meet: the design-based and the model-based approach. While the first approach is well suited for the precise estimation of aggregate statistics, the latter approach furnishes reliable small area estimates. In most applications, estimates for both large and small domains based on the same sample are needed. This poses a challenge to the survey planner, as the sampling design has to reflect different and potentially conflicting requirements simultaneously. In order to enable efficient design-based estimates for large domains, the sampling design should incorporate information related to the variables of interest. This may be achieved using stratification or sampling with unequal probabilities. Many model-based small area techniques require an ignorable sampling design such that after conditioning on the covariates the variable of interest does not contain further information about the sample membership. If this condition is not fulfilled, biased model-based estimates may result, as the model which holds for the sample is different from the one valid for the population.
Hence, an optimisation of the sampling design without investigating the implications for model-based approaches will not be sufficient. Analogously, disregarding the design altogether and focussing only on the model is prone to failure as well. Instead, a profound knowledge of the interplay between the sample design and statistical modelling is a prerequisite for implementing an effective small area estimation strategy. In this work, we concentrate on two approaches to address this conflict. Our first approach takes the sampling design as given and can be used after the sample has been collected. It amounts to incorporating the survey design into the small area model to avoid biases stemming from informative sampling. Thus, once a model is validated for the sample, we know that it holds for the population as well. We derive such a procedure under a lognormal mixed model, which is a popular choice when the support of the dependent variable is limited to positive values. In addition, we propose a three-pillar strategy to select the additional variable accounting for the design, based on a graphical examination of the relationship, a comparison of the predictive accuracy of the choices and a check of the normality assumptions.
Our second approach to dealing with the conflict is based on the notion that the design should allow a wide variety of analyses to be applied to the sample data. Thus, if the use of model-based estimation strategies can be anticipated before the sample is drawn, this should be reflected in the design. The same applies to the estimation of national statistics using design-based approaches. Therefore, we propose to construct the design such that the sampling mechanism is non-informative but allows for precise design-based estimates at an aggregate level.