## 310 Sammlungen allgemeiner Statistiken

### Refine

#### Year of publication

- 2018 (2) (remove)

#### Keywords

- Amtliche Statistik (1)
- Calibration (1)
- Numerical Optimization (1)
- Official Statistics (1)
- Optimal Multivariate Allocation (1)
- Optimierung (1)
- Schätztheorie (1)
- Statistik (1)
- Stichprobennahme (1)
- Survey Statistics (1)

#### Institute

The dissertation deals with methods to improve design-based and model-assisted estimation techniques for surveys in a finite population framework. The focus is on the development of the statistical methodology as well as their implementation by means of tailor-made numerical optimization strategies. In that regard, the developed methods aim at computing statistics for several potentially conflicting variables of interest at aggregated and disaggregated levels of the population on the basis of one single survey. The work can be divided into two main research questions, which are briefly explained in the following sections.
First, an optimal multivariate allocation method is developed taking into account several stratification levels. This approach results in a multi-objective optimization problem due to the simultaneous consideration of several variables of interest. In preparation for the numerical solution, several scalarization and standardization techniques are presented, which represent the different preferences of potential users. In addition, it is shown that by solving the problem scalarized with a weighted sum for all combinations of weights, the entire Pareto frontier of the original problem can be generated. By exploiting the special structure of the problem, the scalarized problems can be efficiently solved by a semismooth Newton method. In order to apply this numerical method to other scalarization techniques as well, an alternative approach is suggested, which traces the problem back to the weighted sum case. To address regional estimation quality requirements at multiple stratification levels, the potential use of upper bounds for regional variances is integrated into the method. In addition to restrictions on regional estimates, the method enables the consideration of box-constraints for the stratum-specific sample sizes, allowing minimum and maximum stratum-specific sampling fractions to be defined.
In addition to the allocation method, a generalized calibration method is developed, which is supposed to achieve coherent and efficient estimates at different stratification levels. The developed calibration method takes into account a very large number of benchmarks at different stratification levels, which may be obtained from different sources such as registers, paradata or other surveys using different estimation techniques. In order to incorporate the heterogeneous quality and the multitude of benchmarks, a relaxation of selected benchmarks is proposed. In that regard, predefined tolerances are assigned to problematic benchmarks at low aggregation levels in order to avoid an exact fulfillment. In addition, the generalized calibration method allows the use of box-constraints for the correction weights in order to avoid an extremely high variation of the weights. Furthermore, a variance estimation by means of a rescaling bootstrap is presented.
Both developed methods are analyzed and compared with existing methods in extensive simulation studies on the basis of a realistic synthetic data set of all households in Germany. Due to the similar requirements and objectives, both methods can be successively applied to a single survey in order to combine their efficiency advantages. In addition, both methods can be solved in a time-efficient manner using very comparable optimization approaches. These are based on transformations of the optimality conditions. The dimension of the resulting system of equations is ultimately independent of the dimension of the original problem, which enables the application even for very large problem instances.

Surveys are commonly tailored to produce estimates of aggregate statistics with a desired level of precision. This may lead to very small sample sizes for subpopulations of interest, defined geographically or by content, which are not incorporated into the survey design. We refer to subpopulations where the sample size is too small to provide direct estimates with adequate precision as small areas or small domains. Despite the small sample sizes, reliable small area estimates are needed for economic and political decision making. Hence, model-based estimation techniques are used which increase the effective sample size by borrowing strength from other areas to provide accurate information for small areas. The paragraph above introduced small area estimation as a field of survey statistics where two conflicting philosophies of statistical inference meet: the design-based and the model-based approach. While the first approach is well suited for the precise estimation of aggregate statistics, the latter approach furnishes reliable small area estimates. In most applications, estimates for both large and small domains based on the same sample are needed. This poses a challenge to the survey planner, as the sampling design has to reflect different and potentially conflicting requirements simultaneously. In order to enable efficient design-based estimates for large domains, the sampling design should incorporate information related to the variables of interest. This may be achieved using stratification or sampling with unequal probabilities. Many model-based small area techniques require an ignorable sampling design such that after conditioning on the covariates the variable of interest does not contain further information about the sample membership. If this condition is not fulfilled, biased model-based estimates may result, as the model which holds for the sample is different from the one valid for the population. Hence, an optimisation of the sampling design without investigating the implications for model-based approaches will not be sufficient. Analogously, disregarding the design altogether and focussing only on the model is prone to failure as well. Instead, a profound knowledge of the interplay between the sample design and statistical modelling is a prerequisite for implementing an effective small area estimation strategy. In this work, we concentrate on two approaches to address this conflict. Our first approach takes the sampling design as given and can be used after the sample has been collected. It amounts to incorporate the survey design into the small area model to avoid biases stemming from informative sampling. Thus, once a model is validated for the sample, we know that it holds for the population as well. We derive such a procedure under a lognormal mixed model, which is a popular choice when the support of the dependent variable is limited to positive values. Besides, we propose a three pillar strategy to select the additional variable accounting for the design, based on a graphical examination of the relationship, a comparison of the predictive accuracy of the choices and a check regarding the normality assumptions.rnrnOur second approach to deal with the conflict is based on the notion that the design should allow applying a wide variety of analyses using the sample data. Thus, if the use of model-based estimation strategies can be anticipated before the sample is drawn, this should be reflected in the design. The same applies for the estimation of national statistics using design-based approaches. Therefore, we propose to construct the design such that the sampling mechanism is non-informative but allows for precise design-based estimates at an aggregate level.