The dissertation deals with methods to improve design-based and model-assisted estimation techniques for surveys in a finite population framework. The focus is on the development of the statistical methodology as well as its implementation by means of tailor-made numerical optimization strategies. In that regard, the developed methods aim at computing statistics for several potentially conflicting variables of interest at aggregated and disaggregated levels of the population on the basis of a single survey. The work can be divided into two main research questions, which are briefly explained in the following sections.
First, an optimal multivariate allocation method is developed that takes into account several stratification levels. The simultaneous consideration of several variables of interest leads to a multi-objective optimization problem. In preparation for the numerical solution, several scalarization and standardization techniques are presented, which reflect the different preferences of potential users. In addition, it is shown that solving the weighted-sum scalarization for all combinations of weights generates the entire Pareto frontier of the original problem. By exploiting the special structure of the problem, the scalarized problems can be solved efficiently by a semismooth Newton method. In order to apply this numerical method to other scalarization techniques as well, an alternative approach is suggested, which reduces the problem to the weighted-sum case. To address regional estimation quality requirements at multiple stratification levels, the potential use of upper bounds for regional variances is integrated into the method. In addition to restrictions on regional estimates, the method allows box constraints on the stratum-specific sample sizes, so that minimum and maximum stratum-specific sampling fractions can be defined.
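To illustrate the weighted-sum scalarization, the following minimal Python sketch allocates a fixed total sample size across strata by minimizing a weighted sum of stratified variance terms of the form N_h^2 S_{h,j}^2 / n_h under box constraints. It uses a simple one-dimensional root search on the Lagrange multiplier rather than the semismooth Newton method developed in the dissertation; the function names and example data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

def weighted_sum_allocation(N, S, w, n_total, n_min, n_max):
    """Box-constrained allocation minimizing sum_j w_j * sum_h N_h^2 S_hj^2 / n_h.

    N: (H,) stratum sizes, S: (H, J) stratum standard deviations per variable,
    w: (J,) scalarization weights, n_total: total sample size,
    n_min, n_max: (H,) lower/upper bounds on the stratum sample sizes.
    """
    # Aggregated "importance" of each stratum under the weighted-sum objective.
    c = np.sqrt((N[:, None] ** 2 * S ** 2) @ w)

    def alloc(lam):
        # Unconstrained KKT solution n_h = c_h / sqrt(lam), clipped to the box.
        return np.clip(c / np.sqrt(lam), n_min, n_max)

    # Find the multiplier for which the budget constraint holds; the allocated
    # total is monotonically decreasing in lambda, so a sign change exists.
    lam = brentq(lambda l: alloc(l).sum() - n_total, 1e-12, 1e12)
    return alloc(lam)

# Illustrative example with 3 strata and 2 variables of interest.
N = np.array([5000.0, 2000.0, 800.0])
S = np.array([[1.2, 0.4], [0.8, 1.1], [2.5, 0.9]])
n = weighted_sum_allocation(N, S, w=np.array([0.5, 0.5]),
                            n_total=600, n_min=np.full(3, 20.0),
                            n_max=np.array([2500.0, 1000.0, 400.0]))
print(np.round(n, 1), n.sum())
```

Note that, after the scalarization, the numerical work reduces to a single equation in the multiplier, regardless of the number of strata, which mirrors the dimension-independence property emphasized later in the abstract.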
In addition to the allocation method, a generalized calibration method is developed, which is designed to achieve coherent and efficient estimates at different stratification levels. The calibration method takes into account a very large number of benchmarks at different stratification levels, which may be obtained from different sources such as registers, paradata, or other surveys using different estimation techniques. To accommodate the heterogeneous quality and the sheer number of benchmarks, a relaxation of selected benchmarks is proposed. In that regard, predefined tolerances are assigned to problematic benchmarks at low aggregation levels so that they do not have to be met exactly. In addition, the generalized calibration method allows box constraints on the correction weights in order to avoid extreme variation of the weights. Furthermore, variance estimation by means of a rescaling bootstrap is presented.
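The following Python sketch shows a generic quadratic-distance calibration with relaxed benchmarks and bounded correction weights, solved with an off-the-shelf SLSQP routine rather than the tailored optimization approach of the dissertation; the tolerances, bounds, and example data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def relaxed_calibration(d, X, totals, tol, g_bounds=(0.5, 2.0)):
    """Find correction weights g minimizing sum_i d_i (g_i - 1)^2 such that the
    calibrated totals sum_i d_i g_i x_ij stay within +/- tol_j of the benchmarks,
    with box constraints on g."""
    n = len(d)

    def objective(g):
        return np.sum(d * (g - 1.0) ** 2)

    def gradient(g):
        return 2.0 * d * (g - 1.0)

    # Relaxed benchmarks: two smooth inequalities per benchmark variable j,
    # -tol_j <= (d*g) @ X[:, j] - totals[j] <= tol_j.
    constraints = []
    for j in range(X.shape[1]):
        constraints.append({"type": "ineq",
                            "fun": lambda g, j=j: tol[j] - ((d * g) @ X[:, j] - totals[j])})
        constraints.append({"type": "ineq",
                            "fun": lambda g, j=j: tol[j] + ((d * g) @ X[:, j] - totals[j])})

    res = minimize(objective, x0=np.ones(n), jac=gradient,
                   bounds=[g_bounds] * n, constraints=constraints,
                   method="SLSQP")
    return res.x

# Illustrative example: 100 units, 2 benchmark variables.
rng = np.random.default_rng(1)
d = rng.uniform(5, 15, size=100)          # design weights
X = rng.normal(size=(100, 2)) + 1.0       # auxiliary variables
totals = X.sum(axis=0) * 10.5             # assumed known benchmarks
g = relaxed_calibration(d, X, totals, tol=np.array([50.0, 50.0]))
print((d * g) @ X, totals)
```

The box constraints on g directly limit the spread of the calibrated weights d*g relative to the design weights, which is the purpose of the weight restriction described above.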
Both developed methods are analyzed and compared with existing methods in extensive simulation studies on the basis of a realistic synthetic data set of all households in Germany. Due to their similar requirements and objectives, both methods can be applied successively to a single survey in order to combine their efficiency gains. In addition, both methods can be solved in a time-efficient manner using very similar optimization approaches, which are based on transformations of the optimality conditions. The dimension of the resulting system of equations is ultimately independent of the dimension of the original problem, which enables the application even to very large problem instances.
In the first part of this work we generalize a method of building optimal confidence bounds provided in Buehler (1957) by specializing an exhaustive class of confidence regions inspired by Sterne (1954). The resulting confidence regions, also called Buehlerizations, are valid in general models and depend on a "designated statistic" that can be chosen according to some desired monotonicity behaviour of the confidence region. For a fixed designated statistic, the family of confidence regions thus obtained, indexed by confidence level, is nested. Buehlerizations furthermore have the optimality property of being the smallest (with respect to set inclusion) confidence regions that are increasing in their designated statistic. The theory is eventually applied to normal, binomial, and exponential samples. The second part deals with the statistical comparison of pairs of diagnostic tests and establishes relations (1) between the sets of lower confidence bounds, (2) between the sets of pairs of comparable lower confidence bounds, and (3) between the sets of admissible lower confidence bounds in various models for diverse parameters of interest.
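As a concrete illustration of the construction, the Python sketch below computes a Buehler-type lower confidence bound for a binomial proportion when the designated statistic is the observed number of successes; in this textbook special case the bound coincides with the classical Clopper-Pearson lower bound. This is a minimal sketch under assumed notation, not the general construction developed in the thesis.

```python
from scipy.stats import binom, beta
from scipy.optimize import brentq

def buehler_lower_bound(x, n, alpha=0.05):
    """Buehler 1-alpha lower bound for a binomial proportion with the observed
    count X as designated statistic:
        b(x) = inf{ p : P_p(X >= x) > alpha }.
    P_p(X >= x) is increasing in p, so the bound solves P_p(X >= x) = alpha."""
    if x == 0:
        return 0.0  # the tail probability is 1 for every p, so the infimum is 0
    return brentq(lambda p: binom.sf(x - 1, n, p) - alpha, 1e-12, 1.0 - 1e-12)

# Agrees with the Clopper-Pearson lower bound Beta(alpha; x, n - x + 1).
x, n, alpha = 7, 20, 0.05
print(buehler_lower_bound(x, n, alpha), beta.ppf(alpha, x, n - x + 1))
```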
In politics and economics, and thus in official statistics, the precise estimation of indicators for small regions or parts of populations, the so-called Small Areas or domains, is discussed intensively. The design-based estimation methods currently in use rely mainly on asymptotic properties and are therefore reliable for large sample sizes. With small sample sizes, however, these design-based considerations often do not apply, which is why special model-based estimation methods have been developed for this case: the Small Area methods. While these may be biased, they often have a smaller mean squared error (MSE) than the unbiased design-based estimators. In this work, both classic design-based estimation methods and model-based estimation methods are presented and compared. The focus lies on the suitability of the various methods for use in official statistics. First, theory and algorithms suitable for the required statistical models are presented, which form the basis for the subsequent model-based estimators. Sampling designs apt for Small Area applications are then presented. Based on these fundamentals, both design-based and model-based estimation methods are developed. Particular consideration is given to the area-level empirical best predictor for binomial variables. Since this estimator is not analytically tractable, numerical and Monte Carlo estimation methods are proposed and compared. Furthermore, MSE estimation methods are proposed and compared. A very popular and flexible resampling method that is widely used in the field of Small Area statistics is the parametric bootstrap. One major drawback of this method is its high computational intensity. To mitigate this disadvantage, a variance reduction method for the parametric bootstrap is proposed. The large potential of this proposal is demonstrated on the basis of theoretical considerations, and a Monte Carlo simulation study shows the substantial variance reduction, of up to 90%, that can be achieved with this method in realistic scenarios. This makes the parametric bootstrap genuinely usable in official statistics applications. Finally, the presented estimation methods are examined in a large Monte Carlo simulation study in a specific application to the Swiss structural survey. Problems of high relevance for official statistics are discussed, in particular: (a) How small can the areas be without leading to inappropriate or overstated precision estimates? (b) Are the accuracy measures for the Small Area estimators reliable enough to be used for publication? (c) Do very small areas interfere with the modeling of the variables of interest, and could they thus cause a deterioration of the estimates for larger and therefore more important areas? (d) How can covariates available at different levels of aggregation be used appropriately to improve the estimates? The data basis is the Swiss census of 2001. The main result is that, in the author's view, the use of Small Area estimators for producing estimates for areas with very small sample sizes is advisable in spite of the modeling effort. The MSE estimates provide a useful measure of precision, but do not in all Small Areas reach the level of reliability of the variance estimates for design-based estimators.
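To make the Monte Carlo approximation of the area-level empirical best predictor concrete, the sketch below computes E[p_i | y_i] under an assumed binomial logit-normal area-level model by self-normalized Monte Carlo integration over the random effect. The model, parameter names, and data are illustrative assumptions and not necessarily the exact specification or algorithm used in the thesis.

```python
import numpy as np
from scipy.special import expit
from scipy.stats import binom

def ebp_binomial_mc(y_i, n_i, eta_i, sigma_v, n_draws=10_000, seed=0):
    """Monte Carlo EBP of the area proportion p_i under the model
        y_i | v_i ~ Binomial(n_i, expit(eta_i + v_i)),   v_i ~ N(0, sigma_v^2),
    where eta_i = x_i' beta is the fixed-effect linear predictor.
    The EBP E[p_i | y_i] has no closed form; it is approximated as a
    likelihood-weighted average over draws of the random effect."""
    rng = np.random.default_rng(seed)
    v = rng.normal(0.0, sigma_v, size=n_draws)
    p = expit(eta_i + v)
    w = binom.pmf(y_i, n_i, p)          # likelihood of the observed count
    return np.sum(w * p) / np.sum(w)    # self-normalized importance weights

# Illustrative area: 8 successes out of 25 sampled units.
print(ebp_binomial_mc(y_i=8, n_i=25, eta_i=-0.5, sigma_v=0.6))
```

A parametric bootstrap MSE estimate would repeat this prediction on data simulated from the fitted model, which is exactly where the computational burden, and hence the appeal of variance reduction, arises.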
The presented research aims at providing a first empirical investigation of lexical structure in Chinese with appropriate quantitative methods. The research objects comprise individual properties of words (part of speech, polyfunctionality, polysemy, word length), the relationships between these properties (part of speech and polyfunctionality, polyfunctionality and polysemy, polysemy and word length), and the lexical structure composed of those properties. Some existing hypotheses in quantitative linguistics (QL), such as the distribution of polysemy and the relationship between word length and polysemy, are tested on Chinese data, thereby extending the applicability of these laws to a language not tested before. Several original hypotheses, such as the distribution of polyfunctionality and the relationship between polyfunctionality and polysemy, are formulated and examined.
In modern survey statistics, optimization problems arise more and more frequently and need to be solved. These problems are often high-dimensional, and simulation studies require solving them repeatedly. To do this within an acceptable time, special algorithms and solution approaches are required, which are developed and investigated in this thesis. On the one hand, the optimization problems are allocation problems for determining optimal subsample sizes. Here, continuous solution methods based on a root-finding problem as well as integer-valued solution methods based on the greedy idea are investigated, and the resulting optimal solutions are compared with each other. On the other hand, this thesis deals with various calibration problems. For these, an alternative solution approach to the methods practised so far is presented. It requires solving a nonsmooth root-finding problem, which is done by means of a nonsmooth Newton method. In connection with nonsmooth optimization algorithms, step-size control plays a major role; for this purpose, a general approach to nonmonotone step-size control for Bouligand-differentiable functions is considered. Besides classical calibration, a calibration problem for coherent Small Area estimation under relaxed constraints and with an additional restriction on the variation of the design weights is also considered. This problem can be transformed into a high-dimensional quadratic optimization problem, which requires the use of solvers for sparse optimization problems. The numerical problems considered in this thesis can arise, for example, in censuses. In this context, the presented approaches are finally examined in simulation studies with regard to a possible application to the German Census 2011 (Zensus 2011), carried out within the framework of the census sampling research project (Zensus-Stichprobenforschungsprojekt).
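To illustrate the greedy idea for integer-valued allocation, the following Python sketch repeatedly assigns one additional sample unit to the stratum whose variance term N_h^2 S_h^2 / n_h decreases the most, starting from given minimum sizes. It is a generic sketch under assumed notation, not the specific algorithm analyzed in the thesis.

```python
import heapq
import numpy as np

def greedy_integer_allocation(N, S, n_total, n_min):
    """Integer allocation minimizing sum_h N_h^2 S_h^2 / n_h by greedily adding
    units where the marginal variance reduction is largest."""
    c = (N * S) ** 2                      # stratum contributions N_h^2 S_h^2
    n = np.array(n_min, dtype=int)
    # Max-heap of marginal gains c_h * (1/n_h - 1/(n_h + 1)), stored negated.
    heap = [(-c[h] * (1.0 / n[h] - 1.0 / (n[h] + 1)), h) for h in range(len(N))]
    heapq.heapify(heap)
    for _ in range(n_total - n.sum()):
        gain, h = heapq.heappop(heap)     # stratum with the largest reduction
        n[h] += 1
        heapq.heappush(heap, (-c[h] * (1.0 / n[h] - 1.0 / (n[h] + 1)), h))
    return n

print(greedy_integer_allocation(N=np.array([5000, 2000, 800]),
                                S=np.array([1.2, 0.8, 2.5]),
                                n_total=600, n_min=[2, 2, 2]))
```

For separable convex objectives of this kind, unit-by-unit greedy allocation is a classical approach to obtaining integer solutions, which is what makes the comparison with the continuous root-finding solution meaningful.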
Synthetic simulation populations are artificial data sets used to reproduce real phenomena in simulations. This thesis presents requirements for and methods of generating such data. Three examples illustrate how the generated synthetic data are applied in a simulation.