The demand for reliable statistics has grown over the past decades because more and more political and economic decisions, such as regional planning, the allocation of funds, or business decisions, are based on statistics. It has therefore become increasingly important to develop and obtain precise regional indicators and disaggregated values in order to compare regions or specific groups. In general, surveys provide the information for these indicators only for larger areas such as countries or administrative divisions. In practice, however, indicators for finer subdivisions, such as NUTS 2 or NUTS 3 levels, are of greater interest. The Nomenclature of Units for Territorial Statistics (NUTS) is a hierarchical system used by the European Union in statistics to refer to subdivisions of countries. In many cases, sample information at such detailed levels is not available. Projects such as the European Census therefore aim to provide precise figures at NUTS 3 or even community level. The European Census is conducted in 2011 in Germany and Switzerland, among other countries. Most of the participating countries combine sample and register information in the estimation process. Classical estimation methods for small areas or subgroups, such as the Horvitz-Thompson (HT) estimator or the generalized regression (GREG) estimator, suffer from small area-specific sample sizes, which cause high variances of the estimates. Small area methods, for instance the empirical best linear unbiased predictor (EBLUP), reduce the variance of the estimates by including auxiliary information to increase the effective sample size, which leads to more accurate estimates of the variables of interest. Small area estimation is also used in the context of business data. For example, small sample sizes can occur when estimating the revenues of specific subgroups, such as those at NACE 3 or NACE 4 levels.
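The variance problem of direct estimators described above can be illustrated with a small Monte Carlo sketch. It is not taken from the thesis: the skewed log-normal population, the sample sizes, and the function names `horvitz_thompson_total` and `simulate_ht_variance` are assumptions chosen purely for illustration, and simple random sampling without replacement is assumed so that every inclusion probability equals n/N.

```python
import random
import statistics


def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a population total.

    Each sampled value y_i is weighted by the inverse of its
    inclusion probability pi_i.
    """
    return sum(y_i / pi_i for y_i, pi_i in zip(y, pi))


def simulate_ht_variance(population, n, replicates=2000, seed=1):
    """Monte Carlo variance of the HT total under simple random
    sampling without replacement (assumed design, pi_i = n / N)."""
    rng = random.Random(seed)
    pi = n / len(population)
    estimates = []
    for _ in range(replicates):
        sample = rng.sample(population, n)
        estimates.append(horvitz_thompson_total(sample, [pi] * n))
    return statistics.variance(estimates)


# Hypothetical skewed "area" population of 500 units (illustration only).
_rng = random.Random(0)
population = [_rng.lognormvariate(0.0, 1.0) for _ in range(500)]

var_small = simulate_ht_variance(population, n=5)    # small area-specific sample
var_large = simulate_ht_variance(population, n=50)   # ten times larger sample

print(f"variance of HT total, n=5:  {var_small:.1f}")
print(f"variance of HT total, n=50: {var_large:.1f}")
```

Under this assumed design the variance of the direct estimate grows sharply as the area-specific sample size shrinks, which is the gap that model-based small area methods such as the EBLUP try to close by borrowing strength from auxiliary information.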
The Nomenclature statistique des activités économiques dans la Communauté européenne (NACE) is the European Union's standard classification of economic activities. Besides small sample sizes, business data have further special characteristics. The main challenge is that business data have skewed distributions, with a few large companies and many small businesses. For instance, the automotive industry in Germany has many small suppliers but only a few large original equipment manufacturers (OEMs). As a result, highly influential units and outliers can be observed in business statistics. In connection with small sample sizes, these extreme values cause severe problems when standard small area models are applied. These models are generally based on the normality assumption, which does not hold in the presence of outliers. One way to address these problems is to apply outlier-robust small area methods. The availability of adequate covariates is important for the accuracy of the small area methods described above. In business data, however, auxiliary variables are rarely available at the population level. One of several reasons is that, in Germany, many enterprises are not included in business registers because of truncation limits. Furthermore, only listed enterprises or companies that exceed specific thresholds are obliged to publish their results. This limits the number of potential auxiliary variables for the estimation.

Despite these issues with available covariates, business data often include spatial dependencies that can be used to enhance small area methods. In addition to spatial information based on geographic characteristics, group-specific similarities, such as related industries based on NACE codes, can be used. For instance, enterprises from the same NACE 2 level, e.g. sector 47 (retail trade), behave more similarly than two companies from different NACE 2 levels, e.g. sector 05 (mining of coal) and sector 64 (financial services). This spatial correlation can be incorporated by extending the general linear mixed model through the integration of spatially correlated random effects. In business data, outliers and geographic or content-related spatial dependencies between areas or domains are closely linked. The coincidence of these two factors and its consequences have not been fully covered in the relevant literature. The only approach that combines robust small area methods with spatial dependencies is the M-quantile geographically weighted regression model. In the context of EBLUP-based small area models, the combination of robust and spatial methods has not yet been considered. Therefore, this thesis provides a theoretical approach to this scientific and practical problem and shows its relevance in an empirical study.
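As an illustration of what "spatially correlated random effects" can mean, one common formulation from the small area literature is a linear mixed model whose area effects follow a simultaneously autoregressive (SAR) process. The abstract does not state which specification the thesis uses, so the notation below (the proximity matrix $W$, the spatial parameter $\rho$, and the variance components $\sigma_u^2$ and $\sigma_e^2$) is an assumption for illustration only.

```latex
% Linear mixed model with m areas and n sampled units
y = X\beta + Zv + e, \qquad e \sim N(0, \sigma_e^2 I_n)

% SAR process for the area-level random effects
v = \rho W v + u, \qquad u \sim N(0, \sigma_u^2 I_m)

% Solving for v gives the implied spatially structured covariance
v = (I_m - \rho W)^{-1} u, \qquad
\operatorname{Var}(v) = \sigma_u^2 \left[ (I_m - \rho W)^{\top} (I_m - \rho W) \right]^{-1}
```

For $\rho = 0$ the covariance reduces to $\sigma_u^2 I_m$, i.e. the standard mixed model with independent area effects. In such a formulation, $W$ could encode geographic neighbourhood or, in the content-related sense described above, membership in the same NACE group.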