Spatial Robust Small Area Estimation applied on Business Data

Robuste Small Area Verfahren angewendet auf Unternehmensdaten unter Berücksichtigung von räumlichen Zusammenhängen

  • The demand for reliable statistics has grown over the past decades because more and more political and economic decisions are based on them, e.g. regional planning, the allocation of funds, or business decisions. It has therefore become increasingly important to obtain precise regional indicators and disaggregated values in order to compare regions or specific groups. In general, surveys provide the information for these indicators only for larger areas such as countries or administrative divisions. In practice, however, indicators for finer subdivisions, such as the NUTS 2 or NUTS 3 levels, are of greater interest. The Nomenclature of Units for Territorial Statistics (NUTS) is a hierarchical system used in European Union statistics to refer to subdivisions of countries. In many cases, sample information at such detailed levels is not available. Projects such as the European Census therefore aim to provide precise numbers at the NUTS 3 or even the community level. The European Census was conducted in 2011 in, among other countries, Germany and Switzerland. Most of the participating countries combine sample and register information in the estimation process. Classical estimation methods for small areas or subgroups, such as the Horvitz-Thompson (HT) estimator or the generalized regression (GREG) estimator, suffer from small area-specific sample sizes, which cause high variances of the estimates. Small area methods, for instance the empirical best linear unbiased predictor (EBLUP), reduce the variance of the estimates by including auxiliary information to increase the effective sample size, and thereby lead to more accurate estimates of the variables of interest. Small area estimation is also used in the context of business data. For example, small sample sizes can occur when estimating the revenues of specific subgroups, such as those at the NACE 3 or NACE 4 levels.
The Nomenclature statistique des activités économiques dans la Communauté européenne (NACE) is the European Union's standard classification of economic activities. Besides small sample sizes, business data have further special characteristics. The main challenge is that business data have skewed distributions, with a few large companies and many small businesses. In the German automotive industry, for instance, there are many small suppliers but only a few large original equipment manufacturers (OEMs). As a result, highly influential units and outliers can be observed in business statistics. In combination with small sample sizes, these extreme values cause severe problems when standard small area models are applied. These models are generally based on the normality assumption, which does not hold in the presence of outliers. One way to address this problem is to apply outlier-robust small area methods. The availability of adequate covariates is important for the accuracy of the small area methods described above. In business data, however, auxiliary variables are rarely available at the population level. One of several reasons is that many enterprises in Germany do not appear in business registers because of truncation limits. Furthermore, only listed enterprises or companies that exceed specific thresholds are obliged to publish their results. This limits the number of potential auxiliary variables for the estimation. Despite these issues with covariates, business data often exhibit spatial dependencies that can be used to enhance small area methods. In addition to spatial information based on geographic characteristics, group-specific similarities, such as related industries based on NACE codes, can be used. For instance, enterprises from the same NACE 2 level, e.g. sector 47 (retail trade), behave more similarly than two companies from different NACE 2 levels, e.g. sector 05 (mining of coal) and sector 64 (financial services). This spatial correlation can be incorporated by extending the general linear mixed model through the integration of spatially correlated random effects. In business data, outliers and geographic or content-related spatial dependencies between areas or domains are closely linked. The coincidence of these two factors and its consequences have not been fully covered in the relevant literature. The only approach that combines robust small area methods with spatial dependencies is the M-quantile geographically weighted regression model. In the context of EBLUP-based small area models, the combination of robust and spatial methods has not yet been considered. This thesis therefore provides a theoretical approach to this scientific and practical problem and demonstrates its relevance in an empirical study.
  • In recent years, demand for disaggregated economic data has risen, as more and more enterprises and institutions base their decisions on such information. Such finely disaggregated data can be, for example, the revenues of enterprises in a specific industry class (NACE 3 or NACE 4 level). In practice, however, samples at such a detailed level are rarely available, which generally results in small sample sizes. In such cases, classical estimation methods often yield poor results, and small area methods have to be applied. The basic principle of these methods is to improve the estimate by enlarging the originally too small group of data. This can be achieved by combining the survey information with auxiliary variables, for example business data from similarly structured industries. Under strict assumptions about the sample and the distribution, these methods deliver considerably better estimates than classical methods, even for small sample sizes. Furthermore, values within individual industry classes differ widely in magnitude. One example is the automotive industry, which contains many smaller suppliers as well as a few large manufacturers. In statistics, such strongly deviating observations are called outliers. In small area methods, the combination of outliers and small sample sizes causes severe problems. For this reason, small area methods have to be made robust against such extreme observations. Currently there are two approaches in this field: the robust EBLUP and M-quantile models. Besides outliers, however, business data often also exhibit spatial patterns, such as geographic neighbourhood structures or similarities between industries. These spatial structures can likewise be used to enlarge the originally too small group of data. The simultaneous treatment of outliers and spatial patterns in estimation methods has, however, not yet been fully investigated in the literature. For this reason, the thesis addresses this scientifically relevant problem.
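
The abstract does not spell out the thesis's model, so the following is only a minimal sketch of the two ingredients it names. It assumes that the spatially correlated random effects follow a simultaneous autoregressive (SAR) process, v = rho * W v + u, and that robustness is obtained by bounding standardized residuals with Huber's psi-function; both are common choices in the small area literature, not necessarily the ones used in the thesis. The function names, the four-domain chain adjacency matrix, and the values rho = 0.5, sigma_u2 = 2.0 and c = 1.345 are hypothetical.

```python
import numpy as np


def row_standardize(adjacency):
    """Scale each row of a 0/1 adjacency matrix so that it sums to one."""
    a = np.asarray(adjacency, dtype=float)
    row_sums = a.sum(axis=1, keepdims=True)
    # Areas without neighbours keep an all-zero row instead of dividing by zero.
    return np.divide(a, row_sums, out=np.zeros_like(a), where=row_sums > 0)


def sar_covariance(w, rho, sigma_u2):
    """Covariance of v in v = rho * W v + u, with u ~ N(0, sigma_u2 * I)."""
    m = w.shape[0]
    b = np.eye(m) - rho * w
    # v = B^{-1} u, hence Cov(v) = sigma_u2 * (B^T B)^{-1}.
    return sigma_u2 * np.linalg.inv(b.T @ b)


def huber_psi(residuals, c=1.345):
    """Huber's psi: identity on [-c, c], constant beyond."""
    return np.clip(np.asarray(residuals, dtype=float), -c, c)


# Hypothetical example: four domains arranged in a chain (1-2-3-4).
# "Neighbours" could be geographic regions or related NACE classes.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
w = row_standardize(adjacency)
cov_independent = sar_covariance(w, rho=0.0, sigma_u2=2.0)  # no spatial link
cov_spatial = sar_covariance(w, rho=0.5, sigma_u2=2.0)      # neighbours correlated
clipped = huber_psi([-3.0, 0.5, 3.0])  # extreme standardized residuals are capped
```

With rho = 0 the random effects are independent across domains, which corresponds to the standard mixed model; a positive rho lets neighbouring domains borrow strength from each other, while the bounded psi-function limits the influence of a few very large enterprises.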

Metadata
Author: Timo Schmid
URN: urn:nbn:de:hbz:385-7315
Advisor: Ralf Münnich
Document Type: Doctoral Thesis
Language: English
Date of completion: 2012/02/10
Publishing institution: Universität Trier
Granting institution: Universität Trier, Fachbereich 4
Date of final exam: 2011/12/21
Release Date: 2012/02/10
Tag: Small area methods; Business data; Robust methods; Simulation study; Small Area Estimation; Spatial correlation
GND Keyword: Monte-Carlo-Simulation; Robuste Schätzung; Räumliche Statistik; Unternehmensdaten
Institutes: Fachbereich 4 / Wirtschaftswissenschaften
Dewey Decimal Classification: 3 Social sciences / 31 Statistics / 310 Collections of general statistics
