Filtern
Schlagworte
- Schätzung (4)
- Erhebungsverfahren (3)
- Stichprobe (3)
- survey statistics (3)
- Haushalt (2)
- Unternehmen (2)
- small area estimation (2)
- Amtliche Statistik (1)
- Calibration (1)
- Complex survey data (1)
Institut
- Fachbereich 4 (9) (entfernen)
The Eurosystem's Household Finance and Consumption Survey (HFCS) collects micro data on private households' balance sheets, income and consumption. It is a stylised fact that wealth is unequally distributed and that the wealthiest own a large share of total wealth. For sample surveys which aim at measuring wealth and its distribution, this is a considerable problem. To overcome it, some of the country surveys under the HFCS umbrella try to sample a disproportionately large share of households that are likely to be wealthy, a technique referred to as oversampling. Ignoring such types of complex survey designs in the estimation of regression models can lead to severe problems. This thesis first illustrates such problems using data from the first wave of the HFCS and canonical regression models from the field of household finance and gives a first guideline for HFCS data users regarding the use of replicate weight sets for variance estimation using a variant of the bootstrap. A further investigation of the issue necessitates a design-based Monte Carlo simulation study. To this end, the already existing large close-to-reality synthetic simulation population AMELIA is extended with synthetic wealth data. We discuss different approaches to the generation of synthetic micro data in the context of the extension of a synthetic simulation population that was originally based on a different data source. We propose an additional approach that is suitable for the generation of highly skewed synthetic micro data in such a setting using a multiply-imputed survey data set. After a description of the survey designs employed in the first wave of the HFCS, we then construct new survey designs for AMELIA that share core features of the HFCS survey designs. A design-based Monte Carlo simulation study shows that while more conservative approaches to oversampling do not pose problems for the estimation of regression models if sampling weights are properly accounted for, the same does not necessarily hold for more extreme oversampling approaches. This issue should be further analysed in future research.
Official business surveys form the basis for national and regional business statistics and are thus of great importance for analysing the state and performance of the economy. However, both the heterogeneity of business data and their high dynamics pose a particular challenge to the feasibility of sampling and the quality of the resulting estimates. A widely used sampling frame for creating the design of an official business survey is an extract from an official business register. However, if this frame does not accurately represent the target population, frame errors arise. Amplified by the heterogeneity and dynamics of business populations, these errors can significantly affect the estimation quality and lead to inefficiencies and biases. This dissertation therefore deals with design-based methods for optimising business surveys with respect to different types of frame errors.
First, methods for adjusting the sampling design of business surveys are addressed. These approaches integrate auxiliary information about the expected structures of frame errors into the sampling design. The aim is to increase the number of sampled businesses that are subject to frame errors. The element-specific frame error probability is estimated based on auxiliary information about frame errors observed in previous samples. The approaches discussed consider different types of frame errors and can be incorporated into predefined designs with fixed strata.
As the second main pillar of this work, methods for adjusting weights to correct for frame errors during estimation are developed and investigated. As a result of frame errors, the assumptions under which the original design weights were determined based on the sampling design no longer hold. The developed methods correct the design weights taking into account the errors identified for sampled elements. Case-number-based reweighting approaches, on the one hand, attempt to reconstruct the unknown size of the individual strata in the target population. In the context of weight smoothing methods, on the other hand, design weights are modelled and smoothed as a function of target or auxiliary variables. This serves to avoid inefficiencies in the estimation due to highly scattering weights or weak correlations between weights and target variables. In addition, possibilities of correcting frame errors by calibration weighting are elaborated. Especially when the sampling frame shows over- and/or undercoverage, the inclusion of external auxiliary information can provide a significant improvement of the estimation quality. For those methods whose quality cannot be measured using standard procedures, a procedure for estimating the variance based on a rescaling bootstrap is proposed. This enables an assessment of the estimation quality when using the methods in practice.
In the context of two extensive simulation studies, the methods presented in this dissertation are evaluated and compared with each other. First, in the environment of an experimental simulation, it is assessed which approaches are particularly suitable with regard to different data situations. In a second simulation study, which is based on the structural survey in the services sector, the applicability of the methods in practice is evaluated under realistic conditions.
Survey data can be viewed as incomplete or partially missing from a variety of perspectives and there are different ways of dealing with this kind of data in the prediction and the estimation of economic quantities. In this thesis, we present two selected research contexts in which the prediction or estimation of economic quantities is examined under incomplete survey data.
These contexts are first the investigation of composite estimators in the German Microcensus (Chapters 3 and 4) and second extensions of multivariate Fay-Herriot (MFH) models (Chapters 5 and 6), which are applied to small area problems.
Composite estimators are estimation methods that take into account the sample overlap in rotating panel surveys such as the German Microcensus in order to stabilise the estimation of the statistics of interest (e.g. employment statistics). Due to the partial sample overlaps, information from previous samples is only available for some of the respondents, so the data are partially missing.
MFH models are model-based estimation methods that work with aggregated survey data in order to obtain more precise estimation results for small area problems compared to classical estimation methods. In these models, several variables of interest are modelled simultaneously. The survey estimates of these variables, which are used as input in the MFH models, are often partially missing. If the domains of interest are not explicitly accounted for in a sampling design, the sizes of the samples allocated to them can, by chance, be small. As a result, it can happen that either no estimates can be calculated at all or that the estimated values are not published by statistical offices because their variances are too large.