The aim of dynamic microsimulations is to simulate the development of systems through the behaviour of the individual components they contain, in order to enable comprehensive scenario-based analyses. In economics and the social sciences, the focus is usually placed on populations consisting of persons and households. Since political and economic decisions are mostly made at the local level, small-scale information is also needed to derive targeted policy recommendations. This, in turn, confronts researchers with major challenges in the process of building regionalised simulation models. This process ranges from the generation of suitable base data sets, through the specification and implementation of the dynamic components, to the evaluation of the results and the quantification of uncertainty. In this thesis, selected components that are of particular relevance for regionalised microsimulations are described and systematically analysed.
Chapter 2 first presents theoretical and methodological aspects of microsimulations in order to give a comprehensive overview of the various types of dynamic modelling and the options for implementing them. The focus is on the fundamentals of capturing and simulating states and state changes, as well as the associated structural aspects of the simulation process.
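A discrete-time dynamic microsimulation advances each micro-unit period by period through probabilistic state transitions. The following is a minimal illustrative sketch; the states and transition probabilities are invented for illustration and are not taken from the thesis.

```python
import random

# Hypothetical labour-market states and transition probabilities,
# invented for illustration (not taken from the thesis).
TRANSITIONS = {
    "employed":   {"employed": 0.92, "unemployed": 0.05, "retired": 0.03},
    "unemployed": {"employed": 0.30, "unemployed": 0.68, "retired": 0.02},
    "retired":    {"retired": 1.0},
}

def step(population, rng):
    """Advance every person by one period via a categorical draw
    from that person's state-dependent transition distribution."""
    new_pop = []
    for person in population:
        probs = TRANSITIONS[person["state"]]
        new_state = rng.choices(list(probs), weights=list(probs.values()))[0]
        new_pop.append({**person, "state": new_state})
    return new_pop

rng = random.Random(42)
pop = [{"id": i, "state": "employed"} for i in range(1000)]
pop = step(pop, rng)                      # one simulation period
counts = {s: sum(1 for p in pop if p["state"] == s) for s in TRANSITIONS}
```

Real models replace the fixed transition table with covariate-dependent probabilities, typically from logistic regression models as discussed below.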
For both the simulation of state changes and the extension of the data base, logistic regression models are primarily used to capture population structures at the micro level and subsequently produce probability-based predictions. Estimation relies mainly on sample data which, in addition to a limited sample size, usually allow no or only insufficient regional differentiation. As a consequence, predicted probabilities can deviate substantially from known totals. To harmonise predictions with these totals, methods for adjusting probabilities, known as alignment methods, can be applied. Although various techniques are described in the literature, little is known about their effect on model quality. To assess different techniques, Chapter 3 implements them in comprehensive simulation studies under a variety of scenarios. It is shown that incorporating additional information into the modelling process yields clear improvements both in parameter estimation and in the prediction of probabilities. Moreover, small-scale probabilities can be produced even when regional identifiers are missing from the modelling data. In particular, maximising the likelihood of the underlying regression model subject to the constraint that the known totals are met performs exceptionally well in all simulation studies.
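One common alignment technique shifts all predicted probabilities by a constant on the logit scale until they sum to a known benchmark total. A minimal sketch, assuming a single benchmark total and predictions strictly between 0 and 1; this is a generic illustration, not the constrained maximum likelihood approach highlighted in the chapter.

```python
import math

def align_to_total(probs, total, lo=-10.0, hi=10.0, tol=1e-10):
    """Add a common constant delta to every predicted probability on the
    logit scale so that the shifted probabilities sum to the known total.
    Assumes 0 < p < 1 for all inputs; delta is found by bisection, since
    the shifted sum is strictly increasing in delta."""
    def shifted(p, delta):
        return 1.0 / (1.0 + math.exp(-(math.log(p / (1.0 - p)) + delta)))
    def shifted_sum(delta):
        return sum(shifted(p, delta) for p in probs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shifted_sum(mid) < total:
            lo = mid
        else:
            hi = mid
    delta = 0.5 * (lo + hi)
    return [shifted(p, delta) for p in probs]

# Model predictions sum to 1.0, but the known benchmark total is 1.5:
aligned = align_to_total([0.1, 0.2, 0.3, 0.4], total=1.5)
```

Because every unit receives the same logit shift, the ranking of the predicted probabilities is preserved while the benchmark is met exactly.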
The implementation of regional mobility proves to be one of the most influential components of regionalised microsimulations. At the same time, migration receives little or no attention in many microsimulation models. Because of its direct impact on the entire population structure, however, ignoring it leads to severe distortions even over a short simulation horizon. While integrating migration across national borders is sufficient for global models, regionalised models must also reproduce internal migration as comprehensively as possible. To this end, Chapter 4 develops concepts for migration modules that allow, on the one hand, independent simulation on regional subpopulations and, on the other hand, a comprehensive reproduction of migration flows within the entire population. To account for household structures and ensure the plausibility of the data, an algorithm for calibrating household probabilities is proposed that enforces benchmarks at the individual level. A retrospective evaluation of the simulated migration flows demonstrates the functionality of the migration concepts. In addition, by projecting the population into future periods, divergent developments of population figures under the different migration concepts are analysed.
Capturing uncertainty is a particular challenge in dynamic microsimulations. Owing to the complexity of the overall structure and the heterogeneity of the components, classical methods for measuring uncertainty are often no longer applicable. To quantify the various influencing factors, Chapter 5 proposes variance-based sensitivity analyses which, thanks to their great flexibility, also allow direct comparisons between very different components. Sensitivity analyses prove highly suitable not only for capturing uncertainty but also for the direct analysis of different scenarios, in particular for evaluating joint effects. Simulation studies illustrate their application in the concrete context of dynamic models. They show, on the one hand, that large differences arise across target values and simulation periods and, on the other hand, that the degree of regional differentiation must always be taken into account.
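First-order variance-based sensitivity indices (Sobol indices) measure the share of the output variance attributable to each input, S_i = Var(E[Y|X_i]) / Var(Y). A generic pick-freeze Monte Carlo sketch on a toy model with a known analytic answer; this illustrates the technique, not the thesis's simulation setup.

```python
import random

def sobol_first_order(f, dim, n, rng):
    """Pick-freeze Monte Carlo estimate of first-order Sobol indices
    S_i = Var(E[Y|X_i]) / Var(Y) for a model f with independent
    Uniform(0, 1) inputs."""
    A = [[rng.random() for _ in range(dim)] for _ in range(n)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n)]
    yA = [f(a) for a in A]
    mA = sum(yA) / n
    var = sum((y - mA) ** 2 for y in yA) / n
    indices = []
    for i in range(dim):
        # Re-evaluate on B with coordinate i "frozen" to its value in A
        yAB = [f(b[:i] + [a[i]] + b[i + 1:]) for a, b in zip(A, B)]
        mAB = sum(yAB) / n
        cov = sum((ya - mA) * (yab - mAB)
                  for ya, yab in zip(yA, yAB)) / n
        indices.append(cov / var)
    return indices

# Toy model with known answer: Y = X1 + 2*X2 gives S1 = 0.2, S2 = 0.8
rng = random.Random(1)
s = sobol_first_order(lambda x: x[0] + 2 * x[1], dim=2, n=20000, rng=rng)
```

Because the method only requires repeated model evaluations, it applies to a microsimulation as a black box, which is what makes direct comparisons between heterogeneous components possible.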
Chapter 6 summarises the findings of this thesis and outlines potential directions for future research.
Survey data can be viewed as incomplete or partially missing from a variety of perspectives, and there are different ways of dealing with such data in the prediction and estimation of economic quantities. In this thesis, we present two selected research contexts in which the prediction or estimation of economic quantities is examined under incomplete survey data.
These contexts are first the investigation of composite estimators in the German Microcensus (Chapters 3 and 4) and second extensions of multivariate Fay-Herriot (MFH) models (Chapters 5 and 6), which are applied to small area problems.
Composite estimators are estimation methods that take into account the sample overlap in rotating panel surveys such as the German Microcensus in order to stabilise the estimation of the statistics of interest (e.g. employment statistics). Due to the partial sample overlaps, information from previous samples is only available for some of the respondents, so the data are partially missing.
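The basic idea of compositing can be sketched as follows: blend the current direct estimate with the previous composite carried forward by the change measured on the overlapping part of the sample. This is a generic illustration with an arbitrary compositing weight, not the specific estimators studied in the thesis (e.g. AK-type estimators use optimised coefficients).

```python
def composite_estimate(prev_composite, direct_t, overlap_t, overlap_prev, k=0.5):
    """Blend the current direct estimate with the previous composite
    carried forward by the change measured on the overlapping subsample.
    k = 0.5 is an arbitrary illustrative compositing weight."""
    change = overlap_t - overlap_prev      # change estimated on the overlap
    return (1 - k) * direct_t + k * (prev_composite + change)

# Previous composite 100, current direct estimate 104; the overlapping
# respondents moved from 101 to 103, suggesting a change of +2:
est = composite_estimate(100.0, 104.0, overlap_t=103.0, overlap_prev=101.0)
```

The partial missingness is visible here: the change term can only be computed from respondents present in both waves of the rotating panel.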
MFH models are model-based estimation methods that work with aggregated survey data in order to obtain more precise estimation results for small area problems compared to classical estimation methods. In these models, several variables of interest are modelled simultaneously. The survey estimates of these variables, which are used as input in the MFH models, are often partially missing. If the domains of interest are not explicitly accounted for in a sampling design, the sizes of the samples allocated to them can, by chance, be small. As a result, it can happen that either no estimates can be calculated at all or that the estimated values are not published by statistical offices because their variances are too large.
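For intuition, a heavily simplified univariate Fay-Herriot sketch: direct estimates are shrunk toward a regression-synthetic value, with the shrinkage factor governed by the ratio of the model variance to the total variance. Here the model variance sigma2_u is treated as known, whereas in practice (and in MFH models) it is estimated, e.g. by ML or REML.

```python
def fh_eblup(y, psi, x, sigma2_u):
    """Univariate Fay-Herriot EBLUP sketch with one covariate plus
    intercept. sigma2_u (the model variance) is treated as known here;
    in practice it is estimated, e.g. by ML or REML."""
    # Weighted least squares for beta, weights 1 / (psi_d + sigma2_u)
    w = [1.0 / (p + sigma2_u) for p in psi]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    b0 = (swxx * swy - swx * swxy) / det
    b1 = (sw * swxy - swx * swy) / det
    # Shrink each direct estimate toward its regression-synthetic value
    return [sigma2_u / (sigma2_u + pd) * yd
            + pd / (sigma2_u + pd) * (b0 + b1 * xd)
            for yd, pd, xd in zip(y, psi, x)]

# Direct estimates lying exactly on the line 2 + 3*x are reproduced:
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
est = fh_eblup(y, psi=[1.0, 1.0, 1.0, 1.0], x=x, sigma2_u=0.5)
```

The multivariate extension models several such variables jointly, which is precisely what allows missing direct estimates in one variable to borrow strength from the others.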
Official business surveys form the basis for national and regional business statistics and are thus of great importance for analysing the state and performance of the economy. However, both the heterogeneity of business data and their high dynamics pose a particular challenge to the feasibility of sampling and the quality of the resulting estimates. A widely used sampling frame for creating the design of an official business survey is an extract from an official business register. However, if this frame does not accurately represent the target population, frame errors arise. Amplified by the heterogeneity and dynamics of business populations, these errors can significantly affect the estimation quality and lead to inefficiencies and biases. This dissertation therefore deals with design-based methods for optimising business surveys with respect to different types of frame errors.
First, methods for adjusting the sampling design of business surveys are addressed. These approaches integrate auxiliary information about the expected structures of frame errors into the sampling design. The aim is to increase the number of sampled businesses that are subject to frame errors. The element-specific frame error probability is estimated based on auxiliary information about frame errors observed in previous samples. The approaches discussed consider different types of frame errors and can be incorporated into predefined designs with fixed strata.
As the second main pillar of this work, methods for adjusting weights to correct for frame errors during estimation are developed and investigated. As a result of frame errors, the assumptions under which the original design weights were determined based on the sampling design no longer hold. The developed methods correct the design weights taking into account the errors identified for sampled elements. Case-number-based reweighting approaches, on the one hand, attempt to reconstruct the unknown sizes of the individual strata in the target population. In the context of weight smoothing methods, on the other hand, design weights are modelled and smoothed as a function of target or auxiliary variables. This serves to avoid inefficiencies in the estimation due to highly variable weights or weak correlations between weights and target variables. In addition, possibilities of correcting frame errors by calibration weighting are elaborated. Especially when the sampling frame shows over- and/or undercoverage, the inclusion of external auxiliary information can significantly improve the estimation quality. For those methods whose quality cannot be measured using standard procedures, a variance estimation procedure based on a rescaling bootstrap is proposed. This enables an assessment of the estimation quality when the methods are used in practice.
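The simplest instance of calibration weighting is post-stratification: design weights are rescaled so that weighted group counts match known external totals. A minimal sketch for illustration only; the calibration methods developed in the thesis are considerably more general.

```python
def calibrate_weights(weights, groups, group_totals):
    """Post-stratification: rescale design weights so that the weighted
    count in each group matches a known external (register) total."""
    weighted = {}
    for w, g in zip(weights, groups):
        weighted[g] = weighted.get(g, 0.0) + w
    factors = {g: group_totals[g] / weighted[g] for g in weighted}
    return [w * factors[g] for w, g in zip(weights, groups)]

# The frame under-covers group A and over-covers group B relative to
# the external totals (30 units in each group):
w = calibrate_weights([10.0, 10.0, 20.0, 20.0],
                      ["A", "A", "B", "B"],
                      {"A": 30.0, "B": 30.0})
```

With over- or undercoverage in the frame, such external benchmarks pull the weighted sample back toward the true population structure.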
In the context of two extensive simulation studies, the methods presented in this dissertation are evaluated and compared with each other. First, in the environment of an experimental simulation, it is assessed which approaches are particularly suitable with regard to different data situations. In a second simulation study, which is based on the structural survey in the services sector, the applicability of the methods in practice is evaluated under realistic conditions.
Income is one of the key indicators to measure regional differences, individual opportunities, and inequalities in society. In Germany, the regional distribution of income is a central concern, especially regarding persistent East-West, North-South, or urban-rural inequalities.
Effective local policies and institutions require reliable data and indicators on regional inequality. However, its measurement faces severe data limitations: inconsistencies in the existing microdata sources yield an inconclusive picture of regional inequality. While survey data provide a wide range of individual and household information but lack top incomes, tax data contain the most reliable income records but offer only a limited range of the socio-demographic variables essential for income analysis. In addition, information on the long-term evolution of the income distribution at the small-scale level is scarce.
In this context, this thesis evaluates regional income inequality in Germany from various perspectives and embeds three self-contained studies in Chapters 3, 4, and 5, which present different data integration approaches. The first chapter motivates this thesis, while the second chapter provides a brief overview of the theoretical and empirical concepts as well as the datasets, highlighting the need to combine data from different sources.
Chapter 3 tackles the issue of poor coverage of top incomes in surveys, also referred to as the 'missing rich' problem, which leads to severe underestimation of income inequality. At the regional level this shortcoming is even more pronounced due to small regional sample sizes. Based on reconciled tax and survey data, this chapter therefore proposes a new multiple imputation top income correction approach that, unlike previous research, focuses on the regional rather than the national level. The findings indicate that inequality between and within the regions is much larger than previously understood, with the magnitude of the adjustment depending on the federal states' level of inequality in the tail.
To increase the potential of the tax data for income analysis and to overcome the lack of socio-demographic characteristics, Chapter 4 enriches the tax data with information on education and working time from survey data. For that purpose, a simulation study evaluates missing data methods and performant prediction models, finding that Multinomial Regression and Random Forest are the most suitable methods for the specific data fusion scenario. The results indicate that data fusion approaches broaden the scope for regional inequality analysis from cross-sectionally enhanced tax data.
Shifting from a cross-sectional to a longitudinal perspective on regional income inequality, Chapter 5 contributes to the currently relatively small body of literature dealing with the potential development of regional income disparities over time. Regionalized dynamic microsimulations provide a powerful tool for the study of long-term income developments. Therefore, this chapter extends the microsimulation model MikroSim with an income module that accounts for the individual, household, and regional context. On this basis, the potential dynamics in gender and migrant income gaps across the districts in Germany are simulated under scenarios of increased full-time employment rates and higher levels of tertiary education. The results show that the scenarios have regionally differing effects on inequality dynamics, highlighting the considerable potential of dynamic microsimulations for regional evidence-based policies. For the German case, the MikroSim model is well suited to analyze future regional developments and can be flexibly adapted for further specific research questions.
The publication of statistical databases is subject to legal regulations, e.g. national statistical offices are only allowed to publish data if the data cannot be attributed to individuals. Achieving this privacy standard requires anonymizing the data prior to publication. However, data anonymization inevitably leads to a loss of information, which should be kept minimal. In this thesis, we analyze the anonymization method SAFE used in the German census in 2011 and we propose a novel integer programming-based anonymization method for nominal data.
In the first part of this thesis, we prove that a fundamental variant of the underlying SAFE optimization problem is NP-hard. This justifies the use of heuristic approaches for large data sets. In the second part, we propose a new anonymization method belonging to microaggregation methods, specifically designed for nominal data. This microaggregation method replaces rows in a microdata set with representative values to achieve k-anonymity, ensuring each data row is identical to at least k − 1 other rows. In addition to the overall dissimilarities of the data rows, the method accounts for errors in resulting frequency tables, which are of high interest for nominal data in practice. The method employs a typical two-step structure: initially partitioning the data set into clusters and subsequently replacing all cluster elements with representative values to achieve k-anonymity. For the partitioning step, we propose a column generation scheme followed by a heuristic to obtain an integer solution, which is based on the dual information. For the aggregation step, we present a mixed-integer problem formulation to find cluster representatives. To this end, we take errors in a subset of frequency tables into account. Furthermore, we show a reformulation of the problem to a minimum edge-weighted maximal clique problem in a multipartite graph, which allows for a different perspective on the problem. Moreover, we formulate a mixed-integer program, which combines the partitioning and the aggregation step and aims to minimize the sum of chi-squared errors in frequency tables.
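The two-step structure (partition, then aggregate) can be illustrated with a deliberately crude greedy heuristic: sort the rows to bring similar records together, cut them into clusters of at least k records, and replace each cluster by its per-column mode. This is only a stand-in for the column generation and mixed-integer formulations developed in the thesis.

```python
from collections import Counter

def microaggregate(rows, k):
    """Greedy microaggregation sketch for nominal data: sort rows so
    similar records are adjacent, cut into clusters of size >= k, and
    replace every row in a cluster with the per-column mode. A crude
    stand-in for optimisation-based partitioning and aggregation."""
    order = sorted(range(len(rows)), key=lambda i: rows[i])
    clusters, start = [], 0
    while start < len(order):
        end = start + k
        if len(order) - end < k:       # avoid leaving a too-small remainder
            end = len(order)
        clusters.append(order[start:end])
        start = end
    out = list(rows)
    for cl in clusters:
        rep = tuple(Counter(rows[i][j] for i in cl).most_common(1)[0][0]
                    for j in range(len(rows[0])))
        for i in cl:
            out[i] = rep
    return out

rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("a", "x")]
anon = microaggregate(rows, k=2)      # every output row occurs >= 2 times
```

The output satisfies k-anonymity by construction, but unlike the proposed integer programming methods it makes no attempt to control errors in the resulting frequency tables.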
Finally, an experimental study comparing the methods covered or developed in this work shows particularly strong results for the proposed method with respect to relative criteria, while SAFE shows its strength with respect to the maximum absolute error in frequency tables. We conclude that the inclusion of integer programming in the context of data anonymization is a promising direction to reduce the inevitable information loss inherent in anonymization, particularly for nominal data.
Data fusions are becoming increasingly relevant in official statistics. The aim of a data fusion is to combine two or more data sources using statistical methods in order to be able to analyse different characteristics that were not jointly observed in one data source. Record linkage of official data sources using unique identifiers is often not possible due to methodological and legal restrictions. Appropriate data fusion methods are therefore of central importance in order to use the diverse data sources of official statistics more effectively and to be able to jointly analyse different characteristics. However, the literature lacks comprehensive evaluations of which fusion approaches provide promising results for which data constellations. Therefore, the central aim of this thesis is to evaluate a concrete set of possible fusion algorithms, which includes classical imputation approaches as well as statistical and machine learning methods, in selected data constellations.
To specify and identify these data contexts, data and imputation-related scenario types of a data fusion are introduced: Explicit scenarios, implicit scenarios and imputation scenarios. From these three scenario types, fusion scenarios that are particularly relevant for official statistics are selected as the basis for the simulations and evaluations. The explicit scenarios are the fulfilment or violation of the Conditional Independence Assumption (CIA) and varying sample sizes of the data to be matched. Both aspects are likely to have a direct, that is, explicit, effect on the performance of different fusion methods. The summed sample size of the data sources to be fused and the scale level of the variable to be imputed are considered as implicit scenarios. Both aspects suggest or exclude the applicability of certain fusion methods due to the nature of the data. The univariate or simultaneous, multivariate imputation solution and the imputation of artificially generated or previously observed values in the case of metric characteristics serve as imputation scenarios.
With regard to the concrete set of possible fusion algorithms, three classical imputation approaches are considered: Distance Hot Deck (DHD), the Regression Model (RM) and Predictive Mean Matching (PMM). With Decision Trees (DT) and Random Forest (RF), two prominent tree-based methods from the field of statistical learning are discussed in the context of data fusion. However, such prediction methods aim to predict individual values as accurately as possible, which can clash with the primary objective of data fusion, namely the reproduction of joint distributions. In addition, DT and RF only comprise univariate imputation solutions and, in the case of metric variables, artificially generated values are imputed instead of real observed values. Therefore, Predictive Value Matching (PVM) is introduced as a new, statistical learning-based nearest neighbour method, which could overcome the distributional disadvantages of DT and RF, offers a univariate and multivariate imputation solution and, in addition, imputes real and previously observed values for metric characteristics. All prediction methods can form the basis of the new PVM approach. In this thesis, PVM based on Decision Trees (PVM-DT) and Random Forest (PVM-RF) is considered.
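The core idea of PVM can be sketched with a simple linear model as the underlying learner (the thesis uses decision trees and random forests): predict the target for donors and recipients, then impute each recipient with the observed value of the donor whose prediction is nearest. The data below are invented for illustration.

```python
def fit_linear(xs, ys):
    """Ordinary least squares with one predictor: a minimal stand-in for
    the tree and forest learners considered in the thesis."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1

def predictive_value_matching(donor_x, donor_y, recipient_x):
    """PVM sketch: predict y for donors and recipients, then impute each
    recipient with the *observed* y of the donor whose prediction is
    closest, so only previously observed values are imputed."""
    b0, b1 = fit_linear(donor_x, donor_y)
    donor_pred = [b0 + b1 * x for x in donor_x]
    imputed = []
    for x in recipient_x:
        pred = b0 + b1 * x
        j = min(range(len(donor_pred)), key=lambda i: abs(donor_pred[i] - pred))
        imputed.append(donor_y[j])
    return imputed

imputed = predictive_value_matching([1.0, 2.0, 3.0, 4.0],
                                    [10.0, 20.0, 30.0, 40.0],
                                    [1.1, 3.9])
```

As in PMM, the matching step keeps the imputed values inside the support of the observed data, which is why the approach can preserve distributions better than direct prediction.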
The underlying fusion methods are investigated in comprehensive simulations and evaluations. The evaluation of the various data fusion techniques focusses on the selected fusion scenarios. The basis for this is formed by two concrete and current use cases of data fusion in official statistics, the fusion of EU-SILC and the Household Budget Survey on the one hand and of the Tax Statistics and the Microcensus on the other. Both use cases show significant differences with regard to different fusion scenarios and thus serve the purpose of covering a variety of data constellations. Simulation designs are developed from both use cases, whereby the explicit scenarios in particular are incorporated into the simulations.
The results show that PVM-RF in particular is a promising and universal fusion approach under compliance with the CIA. This is because PVM-RF provides satisfactory results for both categorical and metric variables to be imputed and also offers a univariate and multivariate imputation solution, regardless of the scale level. PMM also represents an adequate fusion method, but only in relation to metric characteristics. The results also imply that the application of statistical learning methods is both an opportunity and a risk. In the case of CIA violation, potential correlation-related exaggeration effects of DT and RF, and in some cases also of RM, can be useful. In contrast, the other methods induce poor results if the CIA is violated. However, if the CIA is fulfilled, there is a risk that the prediction methods RM, DT and RF will overestimate correlations. The size ratios of the studies to be fused in turn have a rather minor influence on the performance of fusion methods. This is an important indication that the larger dataset does not necessarily have to serve as a donor study, as was previously the case.
The results of the simulations and evaluations provide concrete implications as to which data fusion methods should be used and considered under the selected data and imputation constellations. Science in general and official statistics in particular benefit from these implications. This is because they provide important indications for future data fusion projects in order to assess which specific data fusion method could provide adequate results along the data constellations analysed in this thesis. Furthermore, with PVM this thesis offers a promising methodological innovation for future data fusions and for imputation problems in general.
In most textbooks optimal sample allocation is tailored to rather theoretical examples. However, in practice we often face large-scale surveys with conflicting objectives and many restrictions on the quality and cost at population and subpopulation levels. This multiobjectiveness results in a multitude of efficient sample allocations, each giving different weight to a single survey purpose. Additionally, since the input data to the allocation problem often relies on supplementary information derived from estimation, historical data, or expert knowledge, allocations might be inefficient when specified for sampling.
This doctoral thesis presents a framework for optimal allocation to standard sampling schemes that allows for specifying the tradeoff between different objectives and analyzing their sensitivity to other problem components, aiming to support a decision-maker in identifying a most preferred sample allocation. It dedicates a full chapter to each of the following core questions: 1) How to efficiently incorporate quality and cost constraints for large-scale surveys, say, for thousands of strata with hundreds of precision and cost constraints? 2) How to handle vector-valued objectives with their components addressing different, possibly conflicting survey purposes? 3) How to consider uncertainty in the input data?
The techniques presented can be used separately or in combination as a general problem-solving framework for constrained multivariate and multidomain, possibly uncertain, sample allocation. The main problem is formulated in a way that highlights the different components of optimal sample allocation and can be taken as a gateway to develop solution strategies to each of the questions above, while shifting the focus between different problem aspects. The first question is addressed through a conic quadratic reformulation, which can be efficiently solved for large problem instances using interior-point methods. Based on this the second question is tackled using a weighted Chebyshev minimization, which provides insight into the sensitivity of the problem and enables a stepwise procedure for considering nonlinear decision functionals. Lastly, uncertainty in the input data is addressed through regularization, chance constraints and robust problem formulations.
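For orientation, the classical building block of such allocation problems is Neyman allocation, n_h proportional to N_h * S_h, here sketched with a simple lower bound per stratum. Real large-scale problems add many more precision and cost constraints, which the thesis handles via conic quadratic reformulations; this sketch only shows the basic mechanism.

```python
def neyman_allocation(N, S, n_total, n_min=2):
    """Neyman allocation sketch: n_h proportional to N_h * S_h, with a
    lower bound per stratum. Strata that would fall below the bound are
    fixed at n_min and the remainder is re-allocated among the rest."""
    H = len(N)
    alloc = [None] * H
    free = set(range(H))
    remaining = n_total
    while True:
        denom = sum(N[h] * S[h] for h in free)
        changed = False
        for h in list(free):
            nh = remaining * N[h] * S[h] / denom
            if nh < n_min:
                alloc[h] = n_min       # clamp and remove from the free set
                free.remove(h)
                remaining -= n_min
                changed = True
        if not changed:
            break
    denom = sum(N[h] * S[h] for h in free)
    for h in free:
        alloc[h] = remaining * N[h] * S[h] / denom
    return alloc

# Two strata of equal size; the second is nearly homogeneous (tiny S),
# so it would receive almost no sample without the lower bound:
alloc = neyman_allocation(N=[100, 100], S=[1.0, 0.01], n_total=10, n_min=2)
```

Already this toy version shows how a single box constraint reshapes the whole allocation, which is why large constraint systems call for proper optimisation machinery.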
There is a wide range of methodologies for policy evaluation and socio-economic impact assessment. A fundamental distinction can be made between micro and macro approaches. In contrast to micro models, which focus on the micro-unit, macro models are used to analyze aggregate variables. The ability of microsimulation models to capture interactions occurring at the micro-level makes them particularly suitable for modeling complex real-world phenomena. The inclusion of a behavioral component into microsimulation models provides a framework for assessing the behavioral effects of policy changes.
The labor market is a primary area of interest for both economists and policy makers. The projection of labor-related variables is particularly important for assessing economic and social development needs, as it provides insight into the potential trajectory of these variables and can be used to design effective policy responses. As a result, the analysis of labor market behavior is a primary area of application for behavioral microsimulation models. Behavioral microsimulation models allow for the study of second-round effects, including changes in hours worked and participation rates resulting from policy reforms. It is important to note, however, that most microsimulation models do not consider the demand side of the labor market.
The combination of micro and macro models offers a possible solution, as it constitutes a promising way to integrate the strengths of both model types. Of particular relevance is the combination of microsimulation models with general equilibrium models, especially computable general equilibrium (CGE) models. CGE models are classified as structural macroeconomic models, which are defined by their basis in economic theory. Another important category of macroeconomic models are time series models. This thesis examines the potential for linking micro and macro models. The different types of microsimulation models are presented, with special emphasis on discrete-time dynamic microsimulation models. The concept of behavioral microsimulation is introduced to demonstrate the integration of a behavioral element into microsimulation models. To this end, the concept of utility is introduced and the random utility approach is described in detail. In addition, a brief overview of macro models is given with a focus on general equilibrium models and time series models. Various approaches for linking micro and macro models, which can either be categorized as sequential approaches or integrated approaches, are presented. Furthermore, the concept of link variables is introduced, which play a central role in combining both models. The focus is on the most complex sequential approach, i.e., the bi-directional linking of behavioral microsimulation models with general equilibrium macro models.
Measuring the economic activity of a country requires high-quality data on businesses. In the case of Germany, such data are required not only at the national level, but also at the federal state level and for different economic sectors. Important sources of high-quality business data are the business register and, among others, 14 business surveys conducted by the Federal Statistical Office of Germany. However, the quality requirements of the Federal Statistical Office conflict with the interests of the businesses themselves. For them, answering a survey's questionnaire is an additional cost factor, also known as response burden. A high response burden should be avoided, since it can have a negative impact on the quality of the businesses' responses to the surveys. Therefore, sample coordination can be used as a method to control the distribution of response burden while securing high-quality data.
When applying already existing business survey coordination systems, developed by different statistical institutes, legal and administrative standards of German official statistics have to be taken into account. These standards concern the different sampling fractions, rotation fractions, periodicities, and stratifications of the aforementioned 14 business surveys. Therefore, the aim of this doctoral thesis is to check the existing business survey coordination systems for their applicability in the context of German official statistics and, if necessary, to modify them accordingly. These modifications include the introduction of individual burden indicators, which aim to take the individual perception of response burden into account.
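Many coordination systems rest on permanent random numbers (PRNs): each business keeps a fixed random number, and a survey selects the units whose PRN lies closest above a survey-specific starting point (wrapping around at 1). Choosing distant starting points for two surveys yields negative coordination, i.e. it spreads the response burden. A minimal sketch for illustration, not one of the specific systems examined in the thesis.

```python
import random

def prn_sample(frame, prns, n, start=0.0):
    """Permanent-random-number coordination sketch: select the n units
    whose PRN is closest above a survey-specific starting point,
    wrapping around at 1."""
    dist = {u: (prns[u] - start) % 1.0 for u in frame}
    return set(sorted(frame, key=lambda u: dist[u])[:n])

rng = random.Random(7)
frame = list(range(100))
prns = {u: rng.random() for u in frame}       # assigned once, then permanent
s1 = prn_sample(frame, prns, 20, start=0.0)
s2 = prn_sample(frame, prns, 20, start=0.5)   # distant start: little overlap
```

Reusing the same starting point in a later period conversely yields positive coordination (a large sample overlap), which is what rotating panel designs exploit; individual burden indicators can then steer which units are rotated out first.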
For this purpose, several synthetic data sets were created to test the application of the modified versions of the different business survey coordination systems in Monte Carlo simulation studies. These data sets comprise a large panel data set reflecting the business landscape of Rhineland-Palatinate, and three smaller synthetic data sets. The latter were created with the help of the R package BuSuCo, which was developed within the scope of this thesis. The above-mentioned simulation studies are evaluated based on different measures of estimation quality as well as of the concentration and distribution of response burden.