Refine
Year of publication
- 2020 (2) (remove)
Keywords
- Datenerhebung (1)
- Density Estimation (1)
- Generalized Variance Functions (1)
- Navier-Stokes-Gleichung (1)
- Schätzung (1)
- Self-organizing Maps (1)
- Surveys (1)
- Unternehmen (1)
- Weighted Regression (1)
- asymptotic analysis (1)
Estimation and therefore prediction -- both in traditional statistics and machine learning -- encounters often problems when done on survey data, i.e. on data gathered from a random subset of a finite population. Additional to the stochastic generation of the data in the finite population (based on a superpopulation model), the subsetting represents a second randomization process, and adds further noise to the estimation. The character and impact of the additional noise on the estimation procedure depends on the specific probability law for subsetting, i.e. the survey design. Especially when the design is complex or the population data is not generated by a Gaussian distribution, established methods must be re-thought. Both phenomena can be found in business surveys, and their combined occurrence poses challenges to the estimation.
This work introduces selected topics linked to relevant use cases of business surveys and discusses the role of survey design therein: First, consider micro-econometrics using business surveys. Regression analysis under the peculiarities of non-normal data and complex survey design is discussed. The focus lies on mixed models, which are able to capture unobserved heterogeneity e.g. between economic sectors, when the dependent variable is not conditionally normally distributed. An algorithm for survey-weighted model estimation in this setting is provided and applied to business data.
Second, in official statistics, the classical sampling randomization and estimators for finite population totals are relevant. The variance estimation of estimators for (finite) population totals plays a major role in this framework in order to decide on the reliability of survey data. When the survey design is complex, and the number of variables is large for which an estimated total is required, generalized variance functions are popular for variance estimation. They allow to circumvent cumbersome theoretical design-based variance formulae or computer-intensive resampling. A synthesis of the superpopulation-based motivation and the survey framework is elaborated. To the author's knowledge, such a synthesis is studied for the first time both theoretically and empirically.
Third, the self-organizing map -- an unsupervised machine learning algorithm for data visualization, clustering and even probability estimation -- is introduced. A link to Markov random fields is outlined, which to the author's knowledge has not yet been established, and a density estimator is derived. The latter is evaluated in terms of a Monte-Carlo simulation and then applied to real world business data.
This work studies typical mathematical challenges occurring in the modeling and simulation of manufacturing processes of paper or industrial textiles. In particular, we consider three topics: approximate models for the motion of small inertial particles in an incompressible Newtonian fluid, effective macroscopic approximations for a dilute particle suspension contained in a bounded domain accounting for a non-uniform particle distribution and particle inertia, and possibilities for a reduction of computational cost in the simulations of slender elastic fibers moving in a turbulent fluid flow.
We consider the full particle-fluid interface problem given in terms of the Navier-Stokes equations coupled to momentum equations of a small rigid body. By choosing an appropriate asymptotic scaling for the particle-fluid density ratio and using an asymptotic expansion for the solution components, we derive approximations of the original interface problem. The approximate systems differ according to the chosen scaling of the density ratio in their physical behavior allowing the characterization of different inertial regimes.
We extend the asymptotic approach to the case of many particles suspended in a Newtonian fluid. Under specific assumptions for the combination of particle size and particle number, we derive asymptotic approximations of this system. The approximate systems describe the particle motion which allows to use a mean field approach in order to formulate the continuity equation for the particle probability density function. The coupling of the latter with the approximation for the fluid momentum equation then reveals a macroscopic suspension description which accounts for non-uniform particle distributions in space and for small particle inertia.
A slender fiber in a turbulent air flow can be modeled as a stochastic inextensible one-dimensionally parametrized Kirchhoff beam, i.e., by a stochastic partial differential algebraic equation. Its simulations involve the solution of large non-linear systems of equations by Newton's method. In order to decrease the computational time, we explore different methods for the estimation of the solution. Additionally, we apply smoothing techniques to the Wiener Process in order to regularize the stochastic force driving the fiber, exploring their respective impact on the solution and performance. We also explore the applicability of the Wiener chaos expansion as a solution technique for the simulation of the fiber dynamics.