Ensuring fairness in machine learning models is crucial for ethical and unbiased automated decision-making. Classifications from fair machine learning models should not discriminate on the basis of sensitive attributes such as sexual orientation and ethnicity. However, achieving fairness is complicated by biases inherent in the training data, particularly when data is collected through group sampling, such as the stratified or cluster sampling commonly used in social surveys. Unlike the standard assumption of independent observations in machine learning, clustered data introduces correlations that can amplify biases, especially when cluster assignment is linked to the target variable.
To address these challenges, this cumulative thesis focuses on developing methods to mitigate unfairness in machine learning models. We propose a fair mixed effects support vector machine algorithm, a Cluster-Regularized Logistic Regression, and a fair Generalized Linear Mixed Model based on boosting, all of which can handle grouped data and fairness constraints simultaneously. Additionally, we introduce a Julia package, FairML.jl, which provides a comprehensive framework for addressing fairness issues. The package offers a preprocessing technique based on resampling methods to mitigate biases in the data, as well as a post-processing method that selects an optimal classification cut-off. To improve the fairness of the resulting classifications, both procedures can be combined with any classification method available in the MLJ.jl package. Furthermore, FairML.jl incorporates in-processing approaches, such as optimization-based techniques for logistic regression and support vector machines, to address fairness directly during model training in both regular and mixed models.
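To illustrate the general idea of post-processing via cut-off selection, the following minimal sketch trains an ordinary MLJ.jl classifier on synthetic data and grid-searches a decision threshold that minimizes the demographic parity gap between two sensitive groups. All names and the parity criterion here are illustrative assumptions for exposition; they do not reproduce the actual FairML.jl interface or the exact fairness objective developed in the thesis.

```julia
# Hypothetical sketch: post-processing cut-off selection for demographic parity.
# Function names, data, and the fairness criterion are illustrative only.
using MLJ, DataFrames, CategoricalArrays, Statistics

LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0

# Toy data: x1, x2 are features; s is the sensitive attribute (not used as a feature).
n = 500
X = DataFrame(x1 = randn(n), x2 = randn(n))
s = rand(["A", "B"], n)
y = categorical(rand(["neg", "pos"], n))

mach = machine(LogisticClassifier(), X, y)
fit!(mach, verbosity=0)
scores = pdf.(predict(mach, X), "pos")   # predicted probability of the positive class

# Demographic parity gap for a given cut-off: absolute difference in
# positive-classification rates between the two sensitive groups.
function dp_gap(scores, s, cutoff)
    rate(g) = mean(scores[s .== g] .>= cutoff)
    return abs(rate("A") - rate("B"))
end

# Grid search for the cut-off that minimizes the parity gap.
cutoffs = 0.05:0.01:0.95
best_cutoff = cutoffs[argmin([dp_gap(scores, s, c) for c in cutoffs])]
```

In practice such a threshold would be chosen on validation data and traded off against accuracy; the sketch only conveys how a post-processing step can adjust a fitted classifier's decisions without retraining it.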
By accounting for data complexities and implementing various fairness-enhancing
strategies, our work aims to contribute to the development of more equitable and reliable machine learning models.