Computational Techniques for Minimum Sum-of-Squares Clustering, Cardinality-Constrained Optimization, and Robust Clustering Problems

  • This thesis is concerned with two classes of optimization problems which stem mainly from statistics: clustering problems and cardinality-constrained optimization problems. We are particularly interested in the development of computational techniques to exactly or heuristically solve instances of these two classes of optimization problems. The minimum sum-of-squares clustering (MSSC) problem is widely used to find clusters within a set of data points. The problem is also known as the $k$-means problem, since the most prominent heuristic to compute a feasible point of this optimization problem is the $k$-means method. In many modern applications, however, the clustering suffers from uncertain input data due to, e.g., unstructured measurement errors. The reason for this is that the clustering result then represents a clustering of the erroneous measurements instead of retrieving the true underlying clustering structure. We address this issue by applying robust optimization techniques: we derive the strictly and $\Gamma$-robust counterparts of the MSSC problem, which are as challenging to solve as the original model. Moreover, we develop alternating direction methods to quickly compute feasible points of good quality. Our experiments reveal that the more conservative strictly robust model consistently provides better clustering solutions than the nominal and the less conservative $\Gamma$-robust models. In the context of clustering problems, however, using only a heuristic solution comes with severe disadvantages regarding the interpretation of the clustering. This motivates us to study globally optimal algorithms for the MSSC problem. We note that although some algorithms have already been proposed for this problem, it is still far from being “practically solved”. Therefore, we propose mixed-integer programming techniques, which are mainly based on geometric ideas and which can be incorporated in a branch-and-cut based algorithm tailored to the MSSC problem. Our numerical experiments show that these techniques significantly improve the solution process of a state-of-the-art MINLP solver when applied to the problem. We then turn to the study of cardinality-constrained optimization problems. We consider two famous problem instances of this class: sparse portfolio optimization and sparse regression problems. In many modern applications, it is common to consider problems with thousands of variables. Therefore, globally optimal algorithms are not always computationally viable and the study of sophisticated heuristics is very desirable. Since these problems have a discrete-continuous structure, decomposition methods are particularly well suited. We then apply a penalty alternating direction method that explores this structure and provides very good feasible points in a reasonable amount of time. Our computational study shows that our methods are competitive to state-of-the-art solvers and heuristics.

Download full text files

Export metadata

Additional Services

Share in Twitter Search Google Scholar
Author:Carina Moreira Costa
Advisor:Martin Schmidt, Jan Pablo Burgard
Document Type:Doctoral Thesis
Date of completion:2022/05/25
Date of publication:2022/06/22
Publishing institution:Universität Trier
Granting institution:Universität Trier, Fachbereich 4
Date of final exam:2022/05/25
Release Date:2022/06/23
GND Keyword:Cluster-Analyse; Optimierung; k-Means-Algorithmus
First page:i
Last page:34
Licence (German):License LogoCC BY: Creative-Commons-Lizenz 4.0 International

$Rev: 13581 $