### Refine

#### Language

- English (13) (remove)

#### Has Fulltext

- yes (13) (remove)

#### Keywords

- Optimierung (4)
- Erhebungsverfahren (2)
- Age Diversity (1)
- Ageing Workforce (1)
- Amtliche Statistik (1)
- Analysis (1)
- Calibration (1)
- Coposititive, Infinite Dimension (1)
- Decision-making behavior (1)
- Discrete optimization (1)

#### Institute

- Fachbereich 4 (13) (remove)

We consider a linear regression model for which we assume that some of the observed variables are irrelevant for the prediction. Including the wrong variables in the statistical model can either lead to the problem of having too little information to properly estimate the statistic of interest, or having too much information and consequently describing fictitious connections. This thesis considers discrete optimization to conduct a variable selection. In light of this, the subset selection regression method is analyzed. The approach gained a lot of interest in recent years due to its promising predictive performance. A major challenge associated with the subset selection regression is the computational difficulty. In this thesis, we propose several improvements for the efficiency of the method. Novel bounds on the coefficients of the subset selection regression are developed, which help to tighten the relaxation of the associated mixed-integer program, which relies on a Big-M formulation. Moreover, a novel mixed-integer linear formulation for the subset selection regression based on a bilevel optimization reformulation is proposed. Finally, it is shown that the perspective formulation of the subset selection regression is equivalent to a state-of-the-art binary formulation. We use this insight to develop novel bounds for the subset selection regression problem, which show to be highly effective in combination with the proposed linear formulation.
In the second part of this thesis, we examine the statistical conception of the subset selection regression and conclude that it is misaligned with its intention. The subset selection regression uses the training error to decide on which variables to select. The approach conducts the validation on the training data, which oftentimes is not a good estimate of the prediction error. Hence, it requires a predetermined cardinality bound. Instead, we propose to select variables with respect to the cross-validation value. The process is formulated as a mixed-integer program with the sparsity becoming subject of the optimization. Usually, a cross-validation is used to select the best model out of a few options. With the proposed program the best model out of all possible models is selected. Since the cross-validation is a much better estimate of the prediction error, the model can select the best sparsity itself.
The thesis is concluded with an extensive simulation study which provides evidence that discrete optimization can be used to produce highly valuable predictive models with the cross-validation subset selection regression almost always producing the best results.

This dissertation deals with consistent estimates in household surveys. Household surveys are often drawn via cluster sampling, with households sampled at the first stage and persons selected at the second stage. The collected data provide information for estimation at both the person and the household level. However, consistent estimates are desirable in the sense that the estimated household-level totals should coincide with the estimated totals obtained at the person-level. Current practice in statistical offices is to use integrated weighting. In this approach consistent estimates are guaranteed by equal weights for all persons within a household and the household itself. However, due to the forced equality of weights, the individual patterns of persons are lost and the heterogeneity within households is not taken into account. In order to avoid the negative consequences of integrated weighting, we propose alternative weighting methods in the first part of this dissertation that ensure both consistent estimates and individual person weights within a household. The underlying idea is to limit the consistency conditions to variables that emerge in both the personal and household data sets. These common variables are included in the person- and household-level estimator as additional auxiliary variables. This achieves consistency more directly and only for the relevant variables, rather than indirectly by forcing equal weights on all persons within a household. Further decisive advantages of the proposed alternative weighting methods are that original individual rather than the constructed aggregated auxiliaries are utilized and that the variable selection process is more flexible because different auxiliary variables can be incorporated in the person-level estimator than in the household-level estimator.
In the second part of this dissertation, the variances of a person-level GREG estimator and an integrated estimator are compared in order to quantify the effects of the consistency requirements in the integrated weighting approach. One of the challenges is that the estimators to be compared are of different dimensions. The proposed solution is to decompose the variance of the integrated estimator into the variance of a reduced GREG estimator, whose underlying model is of the same dimensions as the person-level GREG estimator, and add a constructed term that captures the effects disregarded by the reduced model. Subsequently, further fields of application for the derived decomposition are proposed such as the variable selection process in the field of econometrics or survey statistics.

Many combinatorial optimization problems on finite graphs can be formulated as conic convex programs, e.g. the stable set problem, the maximum clique problem or the maximum cut problem. Especially NP-hard problems can be written as copositive programs. In this case the complexity is moved entirely into the copositivity constraint.
Copositive programming is a quite new topic in optimization. It deals with optimization over the so-called copositive cone, a superset of the positive semidefinite cone, where the quadratic form x^T Ax has to be nonnegative for only the nonnegative vectors x. Its dual cone is the cone of completely positive matrices, which includes all matrices that can be decomposed as a sum of nonnegative symmetric vector-vector-products.
The related optimization problems are linear programs with matrix variables and cone constraints.
However, some optimization problems can be formulated as combinatorial problems on infinite graphs. For example, the kissing number problem can be formulated as a stable set problem on a circle.
In this thesis we will discuss how the theory of copositive optimization can be lifted up to infinite dimension. For some special cases we will give applications in combinatorial optimization.

A basic assumption of standard small area models is that the statistic of interest can be modelled through a linear mixed model with common model parameters for all areas in the study. The model can then be used to stabilize estimation. In some applications, however, there may be different subgroups of areas, with specific relationships between the response variable and auxiliary information. In this case, using a distinct model for each subgroup would be more appropriate than employing one model for all observations. If no suitable natural clustering variable exists, finite mixture regression models may represent a solution that „lets the data decide“ how to partition areas into subgroups. In this framework, a set of two or more different models is specified, and the estimation of subgroup-specific model parameters is performed simultaneously to estimating subgroup identity, or the probability of subgroup identity, for each area. Finite mixture models thus offer a fexible approach to accounting for unobserved heterogeneity. Therefore, in this thesis, finite mixtures of small area models are proposed to account for the existence of latent subgroups of areas in small area estimation. More specifically, it is assumed that the statistic of interest is appropriately modelled by a mixture of K linear mixed models. Both mixtures of standard unit-level and standard area-level models are considered as special cases. The estimation of mixing proportions, area-specific probabilities of subgroup identity and the K sets of model parameters via the EM algorithm for mixtures of mixed models is described. Eventually, a finite mixture small area estimator is formulated as a weighted mean of predictions from model 1 to K, with weights given by the area-specific probabilities of subgroup identity.

This doctoral thesis examines intergenerational knowledge, its antecedents as well as how participation in intergenerational knowledge transfer is related to the performance evaluation of employees. To answer these questions, this doctoral thesis builds on a literature review and quantitative research methods. A systematic literature study shows that empirical evidence on intergenerational knowledge transfer is limited. Building on prior literature, effects of various antecedents at the interpersonal and organizational level regarding their effects on intergenerational and intragenerational knowledge transfer are postulated. By questioning 444 trainees and trainers, this doctoral thesis also demonstrates that interpersonal antecedents impact how trainees participate in intergenerational knowledge transfer with their trainers. Thereby, the results of this study provide support that interpersonal antecedents are relevant for intergenerational knowledge transfer, yet, also emphasize the implications attached to the assigned roles in knowledge transfer (i.e., whether one is a trainee or trainer). Moreover, the results of an experimental vignette study reveal that participation in intergenerational knowledge transfer is linked to the performance evaluation of employees, yet, is susceptible to whether the employee is sharing or seeking knowledge. Overall, this doctoral thesis provides insights into this topic by covering a multitude of antecedents of intergenerational knowledge transfer, as well as how participation in intergenerational knowledge transfer may be associated with the performance evaluation of employees.

Nonlocal operators are used in a wide variety of models and applications due to many natural phenomena being driven by nonlocal dynamics. Nonlocal operators are integral operators allowing for interactions between two distinct points in space. The nonlocal models investigated in this thesis involve kernels that are assumed to have a finite range of nonlocal interactions. Kernels of this type are used in nonlocal elasticity and convection-diffusion models as well as finance and image analysis. Also within the mathematical theory they arouse great interest, as they are asymptotically related to fractional and classical differential equations.
The results in this thesis can be grouped according to the following three aspects: modeling and analysis, discretization and optimization.
Mathematical models demonstrate their true usefulness when put into numerical practice. For computational purposes, it is important that the support of the kernel is clearly determined. Therefore nonlocal interactions are typically assumed to occur within an Euclidean ball of finite radius. In this thesis we consider more general interaction sets including norm induced balls as special cases and extend established results about well-posedness and asymptotic limits.
The discretization of integral equations is a challenging endeavor. Especially kernels which are truncated by Euclidean balls require carefully designed quadrature rules for the implementation of efficient finite element codes. In this thesis we investigate the computational benefits of polyhedral interaction sets as well as geometrically approximated interaction sets. In addition to that we outline the computational advantages of sufficiently structured problem settings.
Shape optimization methods have been proven useful for identifying interfaces in models governed by partial differential equations. Here we consider a class of shape optimization problems constrained by nonlocal equations which involve interface-dependent kernels. We derive the shape derivative associated to the nonlocal system model and solve the problem by established numerical techniques.

This dissertation is dedicated to the analysis of the stabilty of portfolio risk and the impact of European regulation introducing risk based classifications for investment funds.
The first paper examines the relationship between portfolio size and the stability of mutual fund risk measures, presenting evidence for economies of scale in risk management. In a unique sample of 338 fund portfolios we find that the volatility of risk numbers decreases for larger funds. This finding holds for dispersion as well as tail risk measures. Further analyses across asset classes provide evidence for the robustness of the effect for balanced and fixed income portfolios. However, a size effect did not emerge for equity funds, suggesting that equity fund managers simply scale their strategy up as they grow. Analyses conducted on the differences in risk stability between tail risk measures and volatilities reveal that smaller funds show higher discrepancies in that respect. In contrast to the majority of prior studies on the basis of ex-post time series risk numbers, this study contributes to the literature by using ex-ante risk numbers based on the actual assets and de facto portfolio data.
The second paper examines the influence of European legislation regarding risk classification of mutual funds. We conduct analyses on a set of worldwide equity indices and find that a strategy based on the long term volatility as it is imposed by the Synthetic Risk Reward Indicator (SRRI) would lead to substantial variations in exposures ranging from short phases of very high leverage to long periods of under investments that would be required to keep the risk classes. In some cases, funds will be forced to migrate to higher risk classes due to limited means to reduce volatilities after crises events. In other cases they might have to migrate to lower risk classes or increase their leverage to ridiculous amounts. Overall, we find if the SRRI creates a binding mechanism for fund managers, it will create substantial interference with the core investment strategy and may incur substantial deviations from it. Fruthermore due to the forced migrations the SRRI degenerates to a passive indicator.
The third paper examines the impact of this volatility based fund classification on portfolio performance. Using historical data on equity indices we find initially that a strategy based on long term portfolio volatility, as it is imposed by the Synthetic Risk Reward Indicator (SRRI), yields better Sharpe Ratios (SRs) and Buy and Hold Returns (BHRs) for the investment strategies matching the risk classes. Accounting for the Fama-French factors reveals no significant alphas for the vast majority of the strategies. In our simulation study where volatility was modelled through a GJR(1,1) - model we find no significant difference in mean returns, but significantly lower SRs for the volatility based strategies. These results were confirmed in robustness checks using alternative models and timeframes. Overall we present evidence which suggests that neither the higher leverage induced by the SRRI nor the potential protection in downside markets does pay off on a risk adjusted basis.

The dissertation deals with methods to improve design-based and model-assisted estimation techniques for surveys in a finite population framework. The focus is on the development of the statistical methodology as well as their implementation by means of tailor-made numerical optimization strategies. In that regard, the developed methods aim at computing statistics for several potentially conflicting variables of interest at aggregated and disaggregated levels of the population on the basis of one single survey. The work can be divided into two main research questions, which are briefly explained in the following sections.
First, an optimal multivariate allocation method is developed taking into account several stratification levels. This approach results in a multi-objective optimization problem due to the simultaneous consideration of several variables of interest. In preparation for the numerical solution, several scalarization and standardization techniques are presented, which represent the different preferences of potential users. In addition, it is shown that by solving the problem scalarized with a weighted sum for all combinations of weights, the entire Pareto frontier of the original problem can be generated. By exploiting the special structure of the problem, the scalarized problems can be efficiently solved by a semismooth Newton method. In order to apply this numerical method to other scalarization techniques as well, an alternative approach is suggested, which traces the problem back to the weighted sum case. To address regional estimation quality requirements at multiple stratification levels, the potential use of upper bounds for regional variances is integrated into the method. In addition to restrictions on regional estimates, the method enables the consideration of box-constraints for the stratum-specific sample sizes, allowing minimum and maximum stratum-specific sampling fractions to be defined.
In addition to the allocation method, a generalized calibration method is developed, which is supposed to achieve coherent and efficient estimates at different stratification levels. The developed calibration method takes into account a very large number of benchmarks at different stratification levels, which may be obtained from different sources such as registers, paradata or other surveys using different estimation techniques. In order to incorporate the heterogeneous quality and the multitude of benchmarks, a relaxation of selected benchmarks is proposed. In that regard, predefined tolerances are assigned to problematic benchmarks at low aggregation levels in order to avoid an exact fulfillment. In addition, the generalized calibration method allows the use of box-constraints for the correction weights in order to avoid an extremely high variation of the weights. Furthermore, a variance estimation by means of a rescaling bootstrap is presented.
Both developed methods are analyzed and compared with existing methods in extensive simulation studies on the basis of a realistic synthetic data set of all households in Germany. Due to the similar requirements and objectives, both methods can be successively applied to a single survey in order to combine their efficiency advantages. In addition, both methods can be solved in a time-efficient manner using very comparable optimization approaches. These are based on transformations of the optimality conditions. The dimension of the resulting system of equations is ultimately independent of the dimension of the original problem, which enables the application even for very large problem instances.

In this thesis, we consider the solution of high-dimensional optimization problems with an underlying low-rank tensor structure. Due to the exponentially increasing computational complexity in the number of dimensions—the so-called curse of dimensionality—they present a considerable computational challenge and become infeasible even for moderate problem sizes.
Multilinear algebra and tensor numerical methods have a wide range of applications in the fields of data science and scientific computing. Due to the typically large problem sizes in practical settings, efficient methods, which exploit low-rank structures, are essential. In this thesis, we consider an application each in both of these fields.
Tensor completion, or imputation of unknown values in partially known multiway data is an important problem, which appears in statistics, mathematical imaging science and data science. Under the assumption of redundancy in the underlying data, this is a well-defined problem and methods of mathematical optimization can be applied to it.
Due to the fact that tensors of fixed rank form a Riemannian submanifold of the ambient high-dimensional tensor space, Riemannian optimization is a natural framework for these problems, which is both mathematically rigorous and computationally efficient.
We present a novel Riemannian trust-region scheme, which compares favourably with the state of the art on selected application cases and outperforms known methods on some test problems.
Optimization problems governed by partial differential equations form an area of scientific computing which has applications in a variety of areas, ranging from physics to financial mathematics. Due to the inherent high dimensionality of optimization problems arising from discretized differential equations, these problems present computational challenges, especially in the case of three or more dimensions. An even more challenging class of optimization problems has operators of integral instead of differential type in the constraint. These operators are nonlocal, and therefore lead to large, dense discrete systems of equations. We present a novel solution method, based on separation of spatial dimensions and provably low-rank approximation of the nonlocal operator. Our
approach allows the solution of multidimensional problems with a complexity which is only slightly larger than linear in the univariate grid size; this improves the state of the art for a particular test problem problem by at least two orders of magnitude.

Sample surveys are a widely used and cost effective tool to gain information about a population under consideration. Nowadays, there is an increasing demand not only for information on the population level but also on the level of subpopulations. For some of these subpopulations of interest, however, very small subsample sizes might occur such that the application of traditional estimation methods is not expedient. In order to provide reliable information also for those so called small areas, small area estimation (SAE) methods combine auxiliary information and the sample data via a statistical model.
The present thesis deals, among other aspects, with the development of highly flexible and close to reality small area models. For this purpose, the penalized spline method is adequately modified which allows to determine the model parameters via the solution of an unconstrained optimization problem. Due to this optimization framework, the incorporation of shape constraints into the modeling process is achieved in terms of additional linear inequality constraints on the optimization problem. This results in small area estimators that allow for both the utilization of the penalized spline method as a highly flexible modeling technique and the incorporation of arbitrary shape constraints on the underlying P-spline function.
In order to incorporate multiple covariates, a tensor product approach is employed to extend the penalized spline method to multiple input variables. This leads to high-dimensional optimization problems for which naive solution algorithms yield an unjustifiable complexity in terms of runtime and in terms of memory requirements. By exploiting the underlying tensor nature, the present thesis provides adequate computationally efficient solution algorithms for the considered optimization problems and the related memory efficient, i.e. matrix-free, implementations. The crucial point thereby is the (repetitive) application of a matrix-free conjugated gradient method, whose runtime is drastically reduced by a matrx-free multigrid preconditioner.