Statistical matching offers a way to broaden the scope of analysis without the increase in respondent burden and costs that would result from conducting a new survey or adding variables to an existing one. Statistical matching aims at combining two datasets A and B that refer to the same target population in order to jointly analyse variables, say Y and Z, that were not initially observed together. The matching is performed on matching variables X that are present in both datasets A and B, whereas Y is observed only in B and Z only in A. To overcome the fact that no joint information on X, Y and Z is available, statistical matching procedures have to rely on suitable assumptions. To provide a theoretical foundation for statistical matching, most procedures therefore rely on the conditional independence assumption (CIA): given X, Y is independent of Z.
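The matching step just described can be sketched as a simple distance hot deck: for every unit of A, the nearest donor in B with respect to the matching variables X supplies its observed Y. This is only a minimal illustration of the general idea, valid under the CIA; all data and variable names below are synthetic and not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration: dataset A observes (X, Z), dataset B observes (X, Y).
n_a, n_b = 200, 300
x_a = rng.normal(size=(n_a, 2))
z_a = x_a @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=n_a)
x_b = rng.normal(size=(n_b, 2))
y_b = x_b @ np.array([0.8, 0.4]) + rng.normal(scale=0.3, size=n_b)

def distance_hot_deck(x_recipients, x_donors, y_donors):
    """For each recipient, impute Y from the donor closest in X
    (Euclidean distance). The matched file is valid for joint
    analysis of Y and Z only under the CIA."""
    imputed = np.empty(len(x_recipients))
    for i, x in enumerate(x_recipients):
        j = np.argmin(np.linalg.norm(x_donors - x, axis=1))
        imputed[i] = y_donors[j]
    return imputed

y_a_imputed = distance_hot_deck(x_a, x_b, y_b)
# The matched file now holds (X, Y, Z) for every unit of A,
# so a regression of Z on Y (and X) can be estimated.
```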
The goal of this thesis is to encompass both the statistical matching process and the analysis of the matched dataset. More specifically, the aim is to estimate a linear regression model for Z given Y and possibly other covariates in data A. Since the validity of the assumptions underlying the matching process determines the validity of the obtained matched file, the accuracy of statistical inference depends on the suitability of these assumptions. By focusing on these assumptions, this work proposes a systematic categorisation of approaches to statistical matching, relying on graphical representations in the form of directed acyclic graphs. These graphs are particularly useful for representing the dependencies and independencies that are at the heart of the statistical matching problem. The proposed categorisation distinguishes between (a) joint modelling of the matching and the analysis (integrated approach) and (b) matching subsequently followed by statistical analysis of the matched dataset (classical approach). Whereas the classical approach relies on the CIA, implementations of the integrated approach are only valid if they converge, i.e. if the specified models are identifiable and, in the case of MCMC implementations, if the algorithm converges to a proper distribution.
In this thesis an implementation of the integrated approach is proposed in which the imputation step and the estimation step are jointly modelled through a fully Bayesian MCMC estimation. It is based on a linear regression model for Z given Y and accommodates both a linear regression model and a random effects model for Y. Furthermore, it is valid when the instrumental variable assumption (IVA) holds. The IVA states that (a) Z is independent of a subset X’ of X given Y and X*, where X* = X\X’, and (b) Y is correlated with X’ given X*. A proof that the joint Bayesian modelling of the model for Z and the model for Y through an MCMC simulation converges to a proper distribution is provided in this thesis. In a first model-based simulation study, the proposed integrated Bayesian procedure is assessed with regard to the data situation, convergence issues, and underlying assumptions. Special interest lies in the interplay of the Y model and the Z model within the imputation process. It turns out that failure scenarios can be distinguished by comparing the CIA and the IVA in the completely observed dataset.
Finally, both approaches to statistical matching, i.e. the classical approach and the integrated approach, are subjected to an extensive comparison in (1) a model-based simulation study and (2) a simulation study based on the AMELIA dataset, an openly available, very large synthetic dataset that is, by construction, similar to the EU-SILC survey. As an additional integrated approach, a Bayesian additive regression trees (BART) model is considered for modelling Y. These integrated procedures are compared to the classical approach, represented by predictive mean matching in the form of multiple imputation by chained equations. Suitably chosen, the first simulation framework makes it possible to clarify aspects related to the underlying assumptions by comparing the IVA and the CIA and by evaluating the impact of the matching variables. Thus, within this simulation study two related aspects are of special interest: the assumptions underlying each method and the incorporation of additional matching variables. The simulation on the AMELIA dataset offers a close-to-reality framework with the advantage of knowing the whole setting, i.e. the complete data X, Y and Z. Special interest lies in investigating the assumptions by adding and excluding auxiliary variables in order to enhance conditional independence and to assess the sensitivity of the methods to this issue. Furthermore, the benefit of having an overlap of units in data A and B for which information on X, Y and Z is available is investigated. It turns out that the integrated approach yields better results than the classical approach when the CIA clearly does not hold. Moreover, even when the classical approach obtains unbiased results for the regression coefficient of Y in the model for Z, the method relying on BART performs best across all coefficients.
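The classical benchmark mentioned above, predictive mean matching, can be sketched as follows: regress Y on X in the donor data, compute predicted means for donors and recipients, and hand each recipient the observed Y of the donor with the closest predicted mean. This is a minimal single-imputation version on synthetic data, not the full chained-equations implementation used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data (names are ours): B observes (X, Y); A observes X.
x_b = rng.normal(size=(300, 2))
y_b = 2.0 + x_b @ np.array([1.5, -1.0]) + rng.normal(scale=0.5, size=300)
x_a = rng.normal(size=(150, 2))

def pmm(x_donors, y_donors, x_recipients):
    """Predictive mean matching: fit an OLS model for Y given X on the
    donors, then impute each recipient with the observed Y of the donor
    whose predicted mean is closest to the recipient's predicted mean."""
    X_d = np.column_stack([np.ones(len(x_donors)), x_donors])
    beta, *_ = np.linalg.lstsq(X_d, y_donors, rcond=None)
    pred_d = X_d @ beta
    X_r = np.column_stack([np.ones(len(x_recipients)), x_recipients])
    pred_r = X_r @ beta
    idx = np.abs(pred_r[:, None] - pred_d[None, :]).argmin(axis=1)
    return y_donors[idx]

y_a = pmm(x_b, y_b, x_a)  # imputed Y for every unit of A
```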
In conclusion, this work constitutes a major contribution to the clarification of the assumptions essential to any statistical matching procedure. By introducing graphical models to characterise existing approaches to statistical matching combined with the subsequent analysis of the matched dataset, it offers an extensive overview, categorisation and extension of theory and application. Furthermore, in a setting where none of the assumptions are testable (since X, Y and Z are never observed together), the integrated approach is a valuable asset, as it offers an alternative to the CIA.
A model-based temperature adjustment scheme for wintertime sea-ice production retrievals from MODIS
(2022)
Knowledge of the wintertime sea-ice production in Arctic polynyas is an important requirement for estimating dense water formation, which drives vertical mixing in the upper ocean. Satellite-based techniques incorporating comparatively high-resolution thermal-infrared data from MODIS in combination with atmospheric reanalysis data have proven to be a strong tool for monitoring large and regularly forming polynyas and for resolving narrow thin-ice areas (i.e., leads) along the shelf breaks and across the entire Arctic Ocean. However, the selection of the atmospheric data set has a large influence on the derived polynya characteristics due to its impact on the calculation of the heat loss to the atmosphere, which is determined by the local thin-ice thickness. To overcome this methodological ambiguity, we present a MODIS-assisted temperature adjustment (MATA) algorithm that yields corrections of the 2 m air temperature and hence decreases the differences between the atmospheric input data sets. The adjustment algorithm is based on atmospheric model simulations. We focus on the Laptev Sea region for detailed case studies of the developed algorithm and present time series of polynya characteristics in the winter season 2019/2020. Applying the empirically derived correction significantly decreases the difference between the utilized atmospheric products, from 49% to 23%. Additional filter strategies are applied that aim at increasing the capability to include leads in the quasi-daily, persistence-filtered thin-ice thickness composites. More generally, the winter of 2019/2020 featured high polynya activity in the eastern Arctic and less activity in the Canadian Arctic Archipelago, presumably as a result of the particularly strong polar vortex in early 2020.
The larval stage of the European fire salamander (Salamandra salamandra) inhabits both lentic and lotic habitats. In the latter, larvae are constantly exposed to unidirectional water flow, which has been shown to cause downstream drift in a variety of taxa. In this study, a closed artificial creek, which allowed us to keep the water flow constant over time and, at the same time, to simulate flood events with predefined water quantities and durations, was used to examine the individual movement patterns of marked larval fire salamanders exposed to unidirectional flow. Movements were tracked by individually marking the larvae with VIAlpha tags and by using downstream and upstream traps. Most individuals were stationary, while downstream drift dominated the overall movement pattern. Upstream movements were rare and occurred only over short distances of about 30 cm; downstream drift distances exceeded 10 m (to the next downstream trap). The simulated flood events increased drift rates significantly, even several days after the flood simulation experiments. Drift probability increased with decreasing body size and decreasing nutritional status. Our results support the production hypothesis as an explanation for the movements of European fire salamander larvae within creeks.
Measurements of the atmospheric boundary layer (ABL) structure were performed for three years (October 2017–August 2020) at the Russian observatory “Ice Base Cape Baranova” (79.280° N, 101.620° E) using SODAR (Sound Detection And Ranging). These measurements were part of the YOPP (Year of Polar Prediction) project “Boundary layer measurements in the high Arctic” (CATS_BL) within the scope of a joint German–Russian project. In addition to SODAR-derived vertical profiles of wind speed and direction, a suite of complementary measurements at the observatory was available. ABL measurements were used for verification of the regional climate model COSMO-CLM (CCLM) with a 5 km resolution for 2017–2020. The CCLM was run with nesting in ERA5 data in a forecast mode for the measurement period. SODAR measurements were mostly limited to wind speeds <12 m/s since the signal was often lost for higher winds. The SODAR data showed a topographical channeling effect for the wind field in the lowest 100 m and some low-level jets (LLJs). The verification of the CCLM with near-surface data of the observatory showed good agreement for the wind and a negative bias for the 2 m temperature. The comparison with SODAR data showed a positive bias for the wind speed of about 1 m/s below 100 m, which increased to 1.5 m/s for higher levels. In contrast to the SODAR data, the CCLM data showed the frequent presence of LLJs associated with the topographic channeling in Shokalsky Strait. Although SODAR wind profiles are limited in range and have a lot of gaps, they represent a valuable data set for model verification. However, a full picture of the ABL structure and the climatology of channeling events could be obtained only with the model data. 
The climatological evaluation showed that the wind field at Cape Baranova was influenced not only by direct topographic channeling of southerly winds through Shokalsky Strait but also by channeling through a mountain gap for westerly winds. LLJs were detected in 37% of all profiles, and most LLJs were associated with channeling, particularly those with a jet speed ≥ 15 m/s (29% of all LLJs). The analysis of the simulated 10 m wind field showed that the 99th percentile of the wind speed reached 18 m/s and revealed a clear dipole structure of channeled wind at the two exits of Shokalsky Strait. The climatology of channeling events showed that this dipole structure was caused by the frequent occurrence of channeling at both exits. Channeling events lasting at least 12 h occurred on about 62 days per year at both exits of Shokalsky Strait.
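The LLJ detection step can be sketched on a single vertical wind profile. The criterion below (a wind-speed maximum that exceeds the minimum above it by a fixed threshold) is one common convention and not necessarily the exact definition used in the study; the profile values are invented.

```python
import numpy as np

def detect_llj(heights, speeds, min_drop=2.0):
    """Flag a low-level jet in a vertical wind profile: a wind-speed
    maximum that exceeds the minimum speed above it by at least
    `min_drop` m/s. Returns (jet height, jet speed) or None."""
    i = int(np.argmax(speeds))
    if i == len(speeds) - 1:
        return None  # maximum at the top of the profile: no jet nose
    if speeds[i] - np.min(speeds[i:]) >= min_drop:
        return float(heights[i]), float(speeds[i])
    return None

# Invented example profile with a jet nose near 200 m.
heights = np.array([50, 100, 150, 200, 300, 400, 500])
speeds = np.array([6.0, 9.0, 12.0, 14.5, 11.0, 10.0, 10.5])
jet = detect_llj(heights, speeds)  # -> (200.0, 14.5)
```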
The paper aims to identify changes in the barriers to cross-border educational projects, especially in the context of the COVID-19 pandemic. The research focused on European borderlands where the maturity of cross-border cooperation differs (the Franco-German and Polish-Czech borderlands). The author utilised qualitative research methods (desk research, in-depth interviews, case studies). An exploratory study covered the barriers existing before the pandemic that remained stable or changed during the pandemic, as well as the new types of barriers that appeared during it. Within both borderlands, the identified barriers were similar in general; however, their intensity varied. The key difference was the approach to these barriers within each borderland. On the Franco-German border, cross-border cooperation is more complex and deeper, while on the Polish-Czech border it is more superficial and focused on specific issues only. These differences reveal the solutions that should be implemented to mitigate the impact of the pandemic on such projects within each borderland.
List-method directed forgetting (LMDF) is the demonstration that people can intentionally forget previously studied information when they are asked to forget what they have previously learned and to remember new information instead. In addition, recent research has demonstrated that people can forget selectively when cued to forget only a subset of the previously studied information. Both forms of forgetting are typically observed in recall tests in which the to-be-forgotten and to-be-remembered information is tested independently of the original cuing. To date, both LMDF and selective directed forgetting (SDF) have been studied mostly with unrelated item materials (e.g., word lists). The present study examined whether LMDF and SDF generalize to prose material. Participants learned three prose passages, which they were cued to remember or forget after the study of each passage. At test, participants were asked to recall all three prose passages regardless of the original cuing. The results showed no significant differences in recall of the three passages as a function of cuing condition. The findings suggest that LMDF and SDF do not occur with prose material. Future research is needed to replicate and extend these findings with (other) complex and meaningful materials before firm conclusions can be drawn. If the null effect proves robust, this would have implications for the ecological validity and generalizability of current LMDF and SDF findings.
This thesis is concerned with two classes of optimization problems which stem
mainly from statistics: clustering problems and cardinality-constrained optimization problems. We are particularly interested in the development of computational techniques to exactly or heuristically solve instances of these two classes
of optimization problems.
The minimum sum-of-squares clustering (MSSC) problem is widely used
to find clusters within a set of data points. The problem is also known as
the $k$-means problem, since the most prominent heuristic to compute a feasible
point of this optimization problem is the $k$-means method. In many modern
applications, however, the clustering suffers from uncertain input data due to,
e.g., unstructured measurement errors. Consequently, the clustering
result represents a clustering of the erroneous measurements instead of
retrieving the true underlying clustering structure. We address this issue by
applying robust optimization techniques: we derive the strictly and $\Gamma$-robust
counterparts of the MSSC problem, which are as challenging to solve as the
original model. Moreover, we develop alternating direction methods to quickly
compute feasible points of good quality. Our experiments reveal that the more
conservative strictly robust model consistently provides better clustering solutions
than the nominal and the less conservative $\Gamma$-robust models.
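The $k$-means heuristic mentioned above can be sketched in a few lines: a plain Lloyd's-algorithm implementation on synthetic data, not the thesis's robust or globally optimal formulations.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means heuristic for the MSSC problem: alternate between
    assigning points to their nearest centre and recomputing centres as
    cluster means. Returns labels, centres, and the sum-of-squares objective."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break  # assignments are stable: a local optimum of the MSSC objective
        centres = new_centres
    obj = ((points - centres[labels]) ** 2).sum()
    return labels, centres, obj

# Two well-separated synthetic blobs: the heuristic recovers them.
rng = np.random.default_rng(42)
pts = np.vstack([rng.normal(0, 0.2, (50, 2)), rng.normal(5, 0.2, (50, 2))])
labels, centres, obj = kmeans(pts, k=2)
```

Note that Lloyd's method only yields a feasible point (a local optimum), which is precisely why the globally optimal mixed-integer techniques discussed above are of interest.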
In the context of clustering problems, however, using only a heuristic solution
comes with severe disadvantages regarding the interpretation of the clustering.
This motivates us to study globally optimal algorithms for the MSSC problem.
We note that although some algorithms have already been proposed for this
problem, it is still far from being “practically solved”. Therefore, we propose
mixed-integer programming techniques, which are mainly based on geometric
ideas and which can be incorporated in a
branch-and-cut based algorithm tailored
to the MSSC problem. Our numerical experiments show that these techniques
significantly improve the solution process of a
state-of-the-art MINLP solver
when applied to the problem.
We then turn to the study of cardinality-constrained optimization problems.
We consider two prominent problems of this class: sparse portfolio optimization and sparse regression. In many modern applications, it is common
to consider problems with thousands of variables. Therefore, globally optimal
algorithms are not always computationally viable and the study of sophisticated
heuristics is very desirable. Since these problems have a discrete-continuous
structure, decomposition methods are particularly well suited. We then apply a
penalty alternating direction method that explores this structure and provides
very good feasible points in a reasonable amount of time. Our computational
study shows that our methods are competitive with
state-of-the-art solvers and heuristics.
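As a minimal illustration of a heuristic that exploits the discrete-continuous structure of cardinality-constrained least squares, the sketch below uses iterative hard thresholding: a gradient step followed by keeping only the k largest entries. This is a simpler method than the penalty alternating direction method developed in the thesis; all data are synthetic.

```python
import numpy as np

def iht(A, b, k, step=None, n_iter=500):
    """Iterative hard thresholding for min ||Ax - b||^2 s.t. ||x||_0 <= k.
    Each iteration takes a gradient step on the continuous objective and
    then projects onto the discrete cardinality constraint."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / (largest singular value)^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + step * A.T @ (b - A @ x)     # gradient step
        keep = np.argsort(np.abs(x))[-k:]    # indices of the k largest entries
        mask = np.zeros_like(x, dtype=bool)
        mask[keep] = True
        x[~mask] = 0.0                       # hard-thresholding projection
    return x

# Synthetic sparse regression instance with a 3-sparse ground truth.
rng = np.random.default_rng(3)
A = rng.normal(size=(100, 30))
x_true = np.zeros(30)
x_true[[2, 7, 15]] = [3.0, -2.0, 1.5]
b = A @ x_true + rng.normal(scale=0.05, size=100)
x_hat = iht(A, b, k=3)
```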
Soil organic matter (SOM) is an indispensable component of terrestrial ecosystems. Soil organic carbon (SOC) dynamics are influenced by a number of well-known abiotic factors such as clay content, soil pH, or pedogenic oxides. These parameters interact with each other and vary in their influence on SOC depending on local conditions. To investigate the latter, we statistically assessed the dependence of SOC accumulation on parameters and parameter combinations that vary on a local scale with parent material, soil texture class, and land use. To this end, topsoils were sampled from arable and grassland sites in south-western Germany in four regions with different soil parent material. Principal component analysis (PCA) revealed a distinct clustering of the data according to parent material and soil texture that varied largely between the local sampling regions, while land use explained the PCA results only to a small extent. The PCA clusters were differentiated into total clusters, which contain the entire dataset or major proportions of it, and local clusters, which represent only a smaller part of the dataset. All clusters were analysed for the relationships between SOC concentrations (SOC %) and mineral-phase parameters in order to identify specific parameter combinations explaining SOC and its labile fractions, hot-water-extractable C (HWEC) and microbial biomass C (MBC). The analyses focused on soil parameters that are known as possible predictors of the occurrence and stabilization of SOC (e.g. fine silt plus clay and pedogenic oxides). For the total clusters, bivariate models revealed significant relationships between SOC, its labile fractions HWEC and MBC, and the applied predictors. However, the partly low explained variances indicated the limited suitability of bivariate models. Hence, mixed-effect models were used to identify specific parameter combinations that significantly explain SOC and its labile fractions in the different clusters.
Comparing measured and mixed-effect-model-predicted SOC values revealed acceptable to very good coefficients of determination (R² = 0.41–0.91) and low to acceptable root mean square errors (RMSE = 0.20 %–0.42 %). The predictors and predictor combinations clearly differed between the models obtained for the whole dataset and for the different cluster groups. At the local scale, site-specific combinations of parameters explained the variability of organic carbon notably better, while applying the total models to local clusters resulted in less explained variance and a higher RMSE. Independently of that, the variance explained by marginal fixed effects decreased in the order SOC > HWEC > MBC, showing that the labile fractions depend less on soil properties and presumably more on processes such as organic carbon input and turnover in soil.
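The contrast between total and local models can be illustrated with a toy regression example: when intercepts and slopes differ between regions, separate per-region fits explain more variance than one pooled fit over all regions. Variable names and all numbers below are invented for illustration and do not reproduce the study's data or mixed-effect models.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: SOC depends on fine silt plus clay (fsc), but with
# region-specific intercepts and slopes (four hypothetical regions).
regions, per = 4, 50
fsc = rng.uniform(10, 60, size=(regions, per))
intercepts = np.array([0.5, 1.2, 2.0, 0.8])
slopes = np.array([0.02, 0.05, 0.01, 0.04])
soc = (intercepts[:, None] + slopes[:, None] * fsc
       + rng.normal(scale=0.1, size=(regions, per)))

def ols_r2(x, y):
    """In-sample R^2 of a simple linear regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# One "total" model over the pooled data ...
r2_total = ols_r2(fsc.ravel(), soc.ravel())
# ... versus separate "local" models per region.
r2_local = np.mean([ols_r2(fsc[r], soc[r]) for r in range(regions)])
```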
Surveys play a major role in studying social and behavioral phenomena that are difficult to
observe. Survey data provide insights into the determinants and consequences of human
behavior and social interactions. Many domains, including politics, health, business, and
the social sciences, rely on high-quality survey data for decision making and policy
implementation. Given a certain research question in a specific context, finding the most appropriate
survey design to ensure data quality and keep fieldwork costs low at the same time is a
difficult task. The aim of examining survey research methodology is to provide the best
evidence to estimate the costs and errors of different survey design options. The goal of this
thesis is to support and optimize the accumulation and sustainable use of evidence in survey
methodology in four steps:
(1) Identifying the gaps in meta-analytic evidence in survey methodology by a systematic
review of the existing evidence along the dimensions of a central framework in the
field
(2) Filling in these gaps with two meta-analyses in the field of survey methodology, one
on response rates in psychological online surveys, the other on panel conditioning
effects for sensitive items
(3) Assessing the robustness and sufficiency of the results of the two meta-analyses
(4) Proposing a publication format for the accumulation and dissemination of meta-analytic
evidence
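The pooling step at the heart of such meta-analyses can be sketched with the standard DerSimonian-Laird random-effects estimator: estimate the between-study variance from the heterogeneity statistic, then combine the study effects with inverse-variance weights. The study effects below are hypothetical, not results from the two meta-analyses.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random-effects meta-analysis: estimate the between-study variance
    tau^2 with the DerSimonian-Laird moment estimator, then pool the
    study effects with inverse-variance weights."""
    effects = np.asarray(effects, dtype=float)
    variances = np.asarray(variances, dtype=float)
    w = 1.0 / variances
    fixed = np.sum(w * effects) / np.sum(w)        # fixed-effect estimate
    q = np.sum(w * (effects - fixed) ** 2)         # Cochran's Q
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    w_star = 1.0 / (variances + tau2)              # random-effects weights
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, se, tau2

# Hypothetical study effects (e.g. log odds ratios) with their variances.
pooled, se, tau2 = dersimonian_laird([0.2, 0.35, 0.1, 0.5],
                                     [0.02, 0.05, 0.01, 0.04])
```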
Forest inventories provide significant monitoring information on forest health, biodiversity,
resilience against disturbance, as well as biomass and timber harvesting potential. For this
purpose, modern inventories increasingly exploit the advantages of airborne laser scanning (ALS)
and terrestrial laser scanning (TLS).
Although tree crown detection and delineation using ALS can be seen as a mature discipline, the
identification of individual stems is a rarely addressed task. In particular, the informative value of
the stem attributes, especially the inclination characteristics, is hardly known. In addition, there is
a lack of tools for the processing and fusion of forest-related data sources. The present
thesis addresses these research gaps in four peer-reviewed papers, with a focus on the
suitability of ALS data for the detection and analysis of tree stems.
In addition to providing a novel post-processing strategy for geo-referencing forest inventory plots,
the thesis shows that ALS-based stem detections are very reliable and their positions are
accurate. In particular, the stems proved suitable for studying prevailing trunk inclination
angles and orientations; a species-specific down-slope inclination of the tree stems and a
leeward orientation of conifers were observed.