Filtern
Erscheinungsjahr
- 2018 (2) (entfernen)
Dokumenttyp
- Dissertation (2) (entfernen)
Sprache
- Englisch (2) (entfernen)
Schlagworte
- Optimierung (2) (entfernen)
Institut
- Fachbereich 4 (1)
- Mathematik (1)
The dissertation deals with methods to improve design-based and model-assisted estimation techniques for surveys in a finite population framework. The focus is on the development of the statistical methodology as well as their implementation by means of tailor-made numerical optimization strategies. In that regard, the developed methods aim at computing statistics for several potentially conflicting variables of interest at aggregated and disaggregated levels of the population on the basis of one single survey. The work can be divided into two main research questions, which are briefly explained in the following sections.
First, an optimal multivariate allocation method is developed taking into account several stratification levels. This approach results in a multi-objective optimization problem due to the simultaneous consideration of several variables of interest. In preparation for the numerical solution, several scalarization and standardization techniques are presented, which represent the different preferences of potential users. In addition, it is shown that by solving the problem scalarized with a weighted sum for all combinations of weights, the entire Pareto frontier of the original problem can be generated. By exploiting the special structure of the problem, the scalarized problems can be efficiently solved by a semismooth Newton method. In order to apply this numerical method to other scalarization techniques as well, an alternative approach is suggested, which traces the problem back to the weighted sum case. To address regional estimation quality requirements at multiple stratification levels, the potential use of upper bounds for regional variances is integrated into the method. In addition to restrictions on regional estimates, the method enables the consideration of box-constraints for the stratum-specific sample sizes, allowing minimum and maximum stratum-specific sampling fractions to be defined.
In addition to the allocation method, a generalized calibration method is developed, which is supposed to achieve coherent and efficient estimates at different stratification levels. The developed calibration method takes into account a very large number of benchmarks at different stratification levels, which may be obtained from different sources such as registers, paradata or other surveys using different estimation techniques. In order to incorporate the heterogeneous quality and the multitude of benchmarks, a relaxation of selected benchmarks is proposed. In that regard, predefined tolerances are assigned to problematic benchmarks at low aggregation levels in order to avoid an exact fulfillment. In addition, the generalized calibration method allows the use of box-constraints for the correction weights in order to avoid an extremely high variation of the weights. Furthermore, a variance estimation by means of a rescaling bootstrap is presented.
Both developed methods are analyzed and compared with existing methods in extensive simulation studies on the basis of a realistic synthetic data set of all households in Germany. Due to the similar requirements and objectives, both methods can be successively applied to a single survey in order to combine their efficiency advantages. In addition, both methods can be solved in a time-efficient manner using very comparable optimization approaches. These are based on transformations of the optimality conditions. The dimension of the resulting system of equations is ultimately independent of the dimension of the original problem, which enables the application even for very large problem instances.
A matrix A is called completely positive if there exists an entrywise nonnegative matrix B such that A = BB^T. These matrices can be used to obtain convex reformulations of for example nonconvex quadratic or combinatorial problems. One of the main problems with completely positive matrices is checking whether a given matrix is completely positive. This is known to be NP-hard in general. rnrnFor a given matrix completely positive matrix A, it is nontrivial to find a cp-factorization A=BB^T with nonnegative B since this factorization would provide a certificate for the matrix to be completely positive. But this factorization is not only important for the membership to the completely positive cone, it can also be used to recover the solution of the underlying quadratic or combinatorial problem. In addition, it is not a priori known how many columns are necessary to generate a cp-factorization for the given matrix. The minimal possible number of columns is called the cp-rank of A and so far it is still an open question how to derive the cp-rank for a given matrix. Some facts on completely positive matrices and the cp-rank will be given in Chapter 2. Moreover, in Chapter 6, we will see a factorization algorithm, which, for a given completely positive matrix A and a suitable starting point, computes the nonnegative factorization A=BB^T. The algorithm therefore returns a certificate for the matrix to be completely positive. As introduced in Chapter 3, the fundamental idea of the factorization algorithm is to start from an initial square factorization which is not necessarily entrywise nonnegative, and extend this factorization to a matrix for which the number of columns is greater than or equal to the cp-rank of A. Then it is the goal to transform this generated factorization into a cp-factorization. This problem can be formulated as a nonconvex feasibility problem, as shown in Section 4.1, and solved by a method which is based on alternating projections, as proven in Chapter 6. On the topic of alternating projections, a survey will be given in Chapter 5. Here we will see how to apply this technique to several types of sets like subspaces, convex sets, manifolds and semialgebraic sets. Furthermore, we will see some known facts on the convergence rate for alternating projections between these types of sets. Considering more than two sets yields the so called cyclic projections approach. Here some known facts for subspaces and convex sets will be shown. Moreover, we will see a new convergence result on cyclic projections among a sequence of manifolds in Section 5.4. In the context of cp-factorizations, a local convergence result for the introduced algorithm will be given. This result is based on the known convergence for alternating projections between semialgebraic sets. To obtain cp-facrorizations with this first method, it is necessary to solve a second order cone problem in every projection step, which is very costly. Therefore, in Section 6.2, we will see an additional heuristic extension, which improves the numerical performance of the algorithm. Extensive numerical tests in Chapter 7 will show that the factorization method is very fast in most instances. In addition, we will see how to derive a certificate for the matrix to be an element of the interior of the completely positive cone. As a further application, this method can be extended to find a symmetric nonnegative matrix factorization, where we consider an additional low-rank constraint. Here again, the method to derive factorizations for completely positive matrices can be used, albeit with some further adjustments, introduced in Section 8.1. Moreover, we will see that even for the general case of deriving a nonnegative matrix factorization for a given rectangular matrix A, the key aspects of the completely positive factorization approach can be used. To this end, it becomes necessary to extend the idea of finding a completely positive factorization such that it can be used for rectangular matrices. This yields an applicable algorithm for nonnegative matrix factorization in Section 8.2. Numerical results for this approach will suggest that the presented algorithms and techniques to obtain completely positive matrix factorizations can be extended to general nonnegative factorization problems.