## 310 Collections of general statistics

This dissertation deals with consistent estimates in household surveys. Household surveys are often drawn via cluster sampling, with households sampled at the first stage and persons selected at the second stage. The collected data provide information for estimation at both the person and the household level. Consistent estimates are desirable, however, in the sense that the estimated household-level totals should coincide with the estimated totals obtained at the person level. Current practice in statistical offices is to use integrated weighting. In this approach, consistent estimates are guaranteed by assigning the same weight to the household and to all persons within it. However, because the weights are forced to be equal, the individual patterns of persons are lost and the heterogeneity within households is not taken into account. To avoid these negative consequences of integrated weighting, we propose alternative weighting methods in the first part of this dissertation that ensure both consistent estimates and individual person weights within a household. The underlying idea is to limit the consistency conditions to variables that appear in both the person and the household data sets. These common variables are included in the person-level and household-level estimators as additional auxiliary variables. This achieves consistency more directly and only for the relevant variables, rather than indirectly by forcing equal weights on all persons within a household. Further decisive advantages of the proposed alternative weighting methods are that the original individual auxiliaries are used rather than constructed aggregated ones, and that variable selection is more flexible because the person-level estimator can incorporate different auxiliary variables than the household-level estimator.
In the second part of this dissertation, the variances of a person-level generalized regression (GREG) estimator and an integrated estimator are compared in order to quantify the effects of the consistency requirements in the integrated weighting approach. One of the challenges is that the estimators to be compared are of different dimensions. The proposed solution is to decompose the variance of the integrated estimator into the variance of a reduced GREG estimator, whose underlying model has the same dimension as that of the person-level GREG estimator, plus a constructed term that captures the effects disregarded by the reduced model. Finally, further fields of application for the derived decomposition are proposed, such as variable selection in econometrics or survey statistics.
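The consistency requirement described in this abstract can be illustrated with a minimal sketch. The toy data, the function names, and the single-variable linear calibration below are assumptions made for illustration only. In particular, the sketch treats the household-level estimate as a fixed benchmark for the person weights, which is a simplification and not necessarily the estimator developed in the dissertation.

```python
# Hypothetical toy sample: three households with a binary person-level
# variable (1 = employed). All numbers are invented for illustration.
households = [
    {"id": 1, "weight": 10.0, "employed": [1, 0]},
    {"id": 2, "weight": 20.0, "employed": [1, 1, 0]},
    {"id": 3, "weight": 15.0, "employed": [0]},
]


def household_total(hhs):
    """Estimated total from the household file (weight times household count)."""
    return sum(h["weight"] * sum(h["employed"]) for h in hhs)


def person_total(hhs, person_weights):
    """Estimated total from the person file, using one weight per person."""
    return sum(
        w * x
        for h in hhs
        for w, x in zip(person_weights[h["id"]], h["employed"])
    )


def calibrate_to_total(hhs, design_weights, target):
    """Linear calibration on one auxiliary variable: w = d * (1 + lam * x)."""
    pairs = [
        (d, x)
        for h in hhs
        for d, x in zip(design_weights[h["id"]], h["employed"])
    ]
    lam = (target - sum(d * x for d, x in pairs)) / sum(d * x * x for d, x in pairs)
    return {
        h["id"]: [
            d * (1.0 + lam * x)
            for d, x in zip(design_weights[h["id"]], h["employed"])
        ]
        for h in hhs
    }


# Integrated weighting: every person inherits the household weight.
integrated = {h["id"]: [h["weight"]] * len(h["employed"]) for h in households}

# Alternative: individual (hypothetical) person weights, calibrated only on
# the common variable so that both files yield the same estimated total.
design = {1: [12.0, 9.0], 2: [18.0, 22.0, 21.0], 3: [14.0]}
target = household_total(households)
calibrated = calibrate_to_total(households, design, target)

print(target, person_total(households, integrated), person_total(households, calibrated))
```

Both sets of person weights reproduce the household-level total for the common variable, but only the calibrated weights still differ between persons of the same household.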

Surveys are commonly tailored to produce estimates of aggregate statistics with a desired level of precision. This may lead to very small sample sizes for subpopulations of interest, defined geographically or by content, which are not incorporated into the survey design. We refer to subpopulations where the sample size is too small to provide direct estimates with adequate precision as small areas or small domains. Despite the small sample sizes, reliable small area estimates are needed for economic and political decision making. Hence, model-based estimation techniques are used which increase the effective sample size by borrowing strength from other areas to provide accurate information for small areas. Small area estimation is thus a field of survey statistics where two conflicting philosophies of statistical inference meet: the design-based and the model-based approach. While the first approach is well suited for the precise estimation of aggregate statistics, the latter approach furnishes reliable small area estimates. In most applications, estimates for both large and small domains based on the same sample are needed. This poses a challenge to the survey planner, as the sampling design has to reflect different and potentially conflicting requirements simultaneously. In order to enable efficient design-based estimates for large domains, the sampling design should incorporate information related to the variables of interest. This may be achieved using stratification or sampling with unequal probabilities. Many model-based small area techniques, in contrast, require an ignorable sampling design, such that after conditioning on the covariates the variable of interest does not contain further information about sample membership. If this condition is not fulfilled, biased model-based estimates may result, as the model which holds for the sample is different from the one valid for the population.
Hence, an optimisation of the sampling design without investigating the implications for model-based approaches will not be sufficient. Analogously, disregarding the design altogether and focussing only on the model is prone to failure as well. Instead, a profound knowledge of the interplay between the sample design and statistical modelling is a prerequisite for implementing an effective small area estimation strategy. In this work, we concentrate on two approaches to address this conflict. Our first approach takes the sampling design as given and can be used after the sample has been collected. It amounts to incorporating the survey design into the small area model to avoid biases stemming from informative sampling. Thus, once a model is validated for the sample, we know that it holds for the population as well. We derive such a procedure under a lognormal mixed model, which is a popular choice when the support of the dependent variable is limited to positive values. In addition, we propose a three-pillar strategy for selecting the additional variable that accounts for the design, based on a graphical examination of the relationship, a comparison of the predictive accuracy of the candidate choices, and a check of the normality assumptions.

Our second approach to dealing with the conflict is based on the notion that the design should allow a wide variety of analyses to be applied to the sample data. Thus, if the use of model-based estimation strategies can be anticipated before the sample is drawn, this should be reflected in the design. The same applies to the estimation of national statistics using design-based approaches. Therefore, we propose to construct the design such that the sampling mechanism is non-informative but allows for precise design-based estimates at an aggregate level.
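The idea of borrowing strength mentioned in this abstract can be sketched with a simple shrinkage estimator. The example assumes a normal area-level model with known sampling variances and a known between-area variance, and it ignores covariates and the sampling design. It is therefore not the lognormal mixed model derived in the dissertation. All names and numbers are hypothetical.

```python
def shrinkage_estimates(direct, sampling_var, sigma2_u):
    """Combine direct area estimates with a pooled mean.

    direct       -- direct estimates per area
    sampling_var -- sampling variance of each direct estimate (assumed known)
    sigma2_u     -- between-area variance (assumed known here)
    """
    # Shrinkage factor: the share of weight given to the direct estimate.
    gammas = [sigma2_u / (sigma2_u + psi) for psi in sampling_var]
    # Pooled mean, weighting each area by the inverse of its total variance.
    precisions = [1.0 / (sigma2_u + psi) for psi in sampling_var]
    pooled = sum(p * y for p, y in zip(precisions, direct)) / sum(precisions)
    estimates = [g * y + (1.0 - g) * pooled for g, y in zip(gammas, direct)]
    return estimates, gammas, pooled


# Three hypothetical areas; the third has a very noisy direct estimate.
direct = [10.0, 14.0, 30.0]
sampling_var = [1.0, 4.0, 25.0]
estimates, gammas, pooled = shrinkage_estimates(direct, sampling_var, sigma2_u=4.0)

for y, g, est in zip(direct, gammas, estimates):
    print(f"direct={y:5.1f}  gamma={g:.3f}  shrunk={est:6.2f}")
```

The noisier a direct estimate is, the smaller its shrinkage factor and the more the final estimate relies on information pooled from the other areas.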

In politics and economics, and thus in official statistics, the precise estimation of indicators for small regions or parts of populations, the so-called Small Areas or domains, is discussed intensively. The design-based estimation methods currently used are mainly based on asymptotic properties and are thus reliable for large sample sizes. With small sample sizes, however, these design-based considerations often do not apply, which is why special model-based estimation methods, the Small Area methods, have been developed for this case. While these may be biased, they often have a smaller mean squared error (MSE) than the unbiased design-based estimators. In this work, both classic design-based estimation methods and model-based estimation methods are presented and compared. The focus lies on the suitability of the various methods for use in official statistics. First, theory and algorithms suitable for the required statistical models are presented, which form the basis for the subsequent model-based estimators. Sampling designs suited to Small Area applications are then presented. Based on these fundamentals, both design-based and model-based estimation methods are developed. Particular consideration is given to the area-level empirical best predictor for binomial variables. Numerical and Monte Carlo estimation methods are proposed and compared for this analytically intractable estimator. Furthermore, MSE estimation methods are proposed and compared. A very popular and flexible resampling method that is widely used in the field of Small Area Statistics is the parametric bootstrap. One major drawback of this method is its high computational intensity. To mitigate this disadvantage, a variance reduction method for the parametric bootstrap is proposed. The enormous potential of this proposal is established on the basis of theoretical considerations.
A Monte Carlo simulation study shows that, in realistic scenarios, this method can reduce the variance by up to 90%. This makes the use of the parametric bootstrap feasible in official statistics applications. Finally, the presented estimation methods are examined in a large Monte Carlo simulation study for a specific application, the Swiss structural survey. Here, problems of high relevance for official statistics are discussed, in particular: (a) How small can the areas be without leading to inappropriate or too high precision estimates? (b) Are the accuracy specifications for the Small Area estimators reliable enough to use them for publication? (c) Do very small areas interfere with the modeling of the variables of interest? Could they thus cause a deterioration of the estimates for larger and therefore more important areas? (d) How can covariates that are available at different levels of aggregation be used appropriately to improve the estimates? The data basis is the Swiss census of 2001. The main result is that, in the author's view, the use of small area estimators to produce estimates for areas with very small sample sizes is advisable in spite of the modeling effort. The MSE estimates provide a useful measure of precision, but do not in all Small Areas reach the level of reliability of the variance estimates for design-based estimators.
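The parametric bootstrap for MSE estimation discussed in this abstract can be sketched as follows. This is a plain parametric bootstrap under an assumed normal area-level model with known variance components. It does not implement the variance reduction method proposed in the thesis, nor the binomial empirical best predictor. Function names, the seed, and all numbers are hypothetical.

```python
import random


def shrinkage(direct, psi, sigma2_u):
    """Shrinkage predictor under a normal area-level model (variances known)."""
    precisions = [1.0 / (sigma2_u + p) for p in psi]
    pooled = sum(w * y for w, y in zip(precisions, direct)) / sum(precisions)
    estimates = [
        (sigma2_u * y + p * pooled) / (sigma2_u + p) for p, y in zip(psi, direct)
    ]
    return estimates, pooled


def bootstrap_mse(direct, psi, sigma2_u, n_boot=2000, seed=1):
    """Plain parametric bootstrap estimate of the MSE of each area predictor."""
    rng = random.Random(seed)
    _, pooled_hat = shrinkage(direct, psi, sigma2_u)
    squared_errors = [0.0] * len(direct)
    for _ in range(n_boot):
        # Step 1: draw bootstrap "true" area values from the fitted model.
        theta_star = [pooled_hat + rng.gauss(0.0, sigma2_u ** 0.5) for _p in psi]
        # Step 2: draw bootstrap direct estimates around those values.
        y_star = [t + rng.gauss(0.0, p ** 0.5) for t, p in zip(theta_star, psi)]
        # Step 3: re-apply the predictor and record its squared error.
        est_star, _ = shrinkage(y_star, psi, sigma2_u)
        for i, (e, t) in enumerate(zip(est_star, theta_star)):
            squared_errors[i] += (e - t) ** 2
    return [s / n_boot for s in squared_errors]


mse = bootstrap_mse([10.0, 14.0, 30.0], [1.0, 4.0, 25.0], sigma2_u=4.0)
print([round(m, 2) for m in mse])
```

Each bootstrap replicate re-applies the full estimation procedure, which is why the computational cost grows quickly with the number of replicates and the complexity of the predictor.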