Optimization for Fair Classification Methods in Heterogeneous Data
Ensuring fairness in machine learning models is crucial for ethical and unbiased automated decision-making. Classifications produced by fair machine learning models should not discriminate on the basis of sensitive variables such as sexual orientation and ethnicity. However, achieving fairness is complicated by biases inherent in the training data. This is particularly true when the data are collected through group sampling, such as stratified or cluster sampling, as often occurs in social surveys. Machine learning usually assumes independent observations. Clustered data break that assumption: they introduce correlations that can amplify biases, especially when cluster assignment is linked to the target variable. To address these challenges, this cumulative thesis develops methods to mitigate unfairness in machine learning models. We propose three such methods, all of which can handle grouped data and fairness constraints simultaneously: a fair mixed effects support vector machine algorithm, a Cluster-Regularized Logistic Regression, and a fair Generalized Linear Mixed Model based on boosting. Additionally, we introduce a Julia package, FairML.jl, which provides a comprehensive framework for addressing fairness issues. The package offers a preprocessing technique, based on resampling methods, to mitigate biases in the data. It also offers a post-processing method that seeks an optimal cut-off. To improve the fairness of classifications, both can be combined with any classification method available in the MLJ.jl package. Furthermore, FairML.jl incorporates in-processing approaches, such as optimization-based techniques for logistic regression and support vector machines. These address fairness directly during model training, in both regular and mixed models. By accounting for data complexities and implementing various fairness-enhancing strategies, our work aims to contribute to the development of more equitable and reliable machine learning models.
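The post-processing idea of choosing a cut-off with fairness in mind can be illustrated with a small sketch. The Python code below is not FairML.jl's implementation or API; FairML.jl is a Julia package, and its actual criterion is not described here. It assumes several things for illustration. Fairness is measured as the demographic parity gap, meaning the largest difference in positive-prediction rate between groups. Performance is measured as accuracy. Cut-offs are searched over a user-supplied grid. The function names `demographic_parity_gap` and `select_fair_cutoff` are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def demographic_parity_gap(preds: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest difference in positive-prediction rate between any two groups
    (illustrative fairness metric, assumed for this sketch)."""
    rates = []
    for g in set(groups):
        members = [p for p, gg in zip(preds, groups) if gg == g]
        rates.append(sum(members) / len(members))
    return max(rates) - min(rates)


def select_fair_cutoff(
    scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[Hashable],
    candidates: Sequence[float],
    max_gap: float = 0.1,
) -> Tuple[float, float, float]:
    """Hypothetical post-processing step: return (cutoff, accuracy, gap) for the
    most accurate cut-off whose parity gap is at most max_gap. If no cut-off
    qualifies, fall back to the smallest gap (ties broken by accuracy)."""
    results = []
    for c in candidates:
        # Classify as positive when the score reaches the cut-off.
        preds = [1 if s >= c else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        results.append((c, acc, demographic_parity_gap(preds, groups)))
    feasible = [r for r in results if r[2] <= max_gap]
    if feasible:
        return max(feasible, key=lambda r: r[1])
    return min(results, key=lambda r: (r[2], -r[1]))


# Toy data (made up for illustration): model scores, true labels, group membership.
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.6, 0.2, 0.1]
labels = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
candidates = [0.25, 0.5, 0.65, 0.75]

print(select_fair_cutoff(scores, labels, groups, candidates, max_gap=0.1))
# prints (0.5, 0.875, 0.0)
```

On this toy data, the cut-off 0.65 would classify every observation correctly, but its parity gap is 0.25. Under `max_gap=0.1` the sketch instead picks 0.5. That cut-off gives equal positive rates across groups at the cost of one misclassification, which illustrates the accuracy/fairness trade-off a cut-off search has to manage.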
Field | Value |
---|---|
Author: | João Vitor Pamplona |
URN: | urn:nbn:de:hbz:385-1-24753 |
Document Type: | Doctoral Thesis |
Language: | English |
Date of completion: | 2025/04/07 |
Publishing institution: | Universität Trier |
Granting institution: | Universität Trier, Fachbereich 4 |
Date of final exam: | 2025/02/10 |
Release Date: | 2025/04/14 |
Number of pages: | V, 127 leaves |
First page: | I |
Last page: | 127 |