Generative Adversarial Networks for Mitigating Biases in Machine Learning Systems
Adel Abusitta, Esma Aïmeur, Omar Abdel Wahab

TL;DR
This paper introduces a cGAN-based framework to generate synthetic fair data, effectively reducing biases in machine learning systems while improving their accuracy, addressing limitations of existing model-focused bias mitigation methods.
Contribution
The work presents a novel cGAN-based approach for bias mitigation that generates synthetic fair data, overcoming accuracy degradation and training time issues of prior methods.
Findings
Effective bias mitigation across multiple bias types
Enhanced prediction accuracy with synthetic data
Reduced training time for fair models
| Approach | Acc. (300 HUs) | Acc. (500 HUs) | Acc. (700 HUs) | Acc. (900 HUs) |
|---|---|---|---|---|
| The Proposed Approach | 84.9 ± 1.14 | 85.1 ± 1.09 | 85.3 ± 1.92 | 85.5 ± 1.15 |
| Pivot-based Approach | 76.1 ± 1.11 | 76.4 ± 1.84 | 77.1 ± 1.23 | 77.3 ± 1.78 |
| Baseline | 82.0 ± 1.16 | 82.3 ± 1.06 | 82.6 ± 1.90 | 82.9 ± 0.88 |
| Population group | Acc. (300 HUs) | Acc. (500 HUs) | Acc. (700 HUs) | Acc. (900 HUs) |
|---|---|---|---|---|
| Men of color | 86.5 ± 0.34 | 87.68 ± 0.33 | 87.15 ± 0.22 | 87.5 ± 0.26 |
| Women of color | 67.8 ± 0.14 | 67.9 ± 0.29 | 68.3 ± 0.36 | 68.4 ± 2.40 |
| Caucasian men | 98.6 ± 0.24 | 98.7 ± 0.35 | 98.8 ± 0.19 | 98.0 ± 0.21 |
| Caucasian women | 90.4 ± 0.45 | 91.3 ± 0.38 | 91.8 ± 2.02 | 91.7 ± 0.44 |
| Population group | Acc. (300 HUs) | Acc. (500 HUs) | Acc. (700 HUs) | Acc. (900 HUs) |
|---|---|---|---|---|
| Men of color | 87.9 ± 0.38 | 88.1 ± 0.45 | 88.1 ± 0.84 | 88.3 ± 0.66 |
| Women of color | 88.1 ± 0.27 | 88.2 ± 0.30 | 88.5 ± 0.34 | 88.6 ± 0.22 |
| Caucasian men | 99.2 ± 0.31 | 99.3 ± 0.42 | 99.5 ± 0.28 | 99.7 ± 0.37 |
| Caucasian women | 91.9 ± 0.66 | 92.0 ± 0.41 | 92.3 ± 2.07 | 92.5 ± 0.23 |
| Approach | Acc. (300 HUs) | Acc. (500 HUs) | Acc. (700 HUs) | Acc. (900 HUs) |
|---|---|---|---|---|
| The Proposed Approach | 91.77 ± 0.29 | 91.9 ± 0.36 | 92.10 ± 0.41 | 92.27 ± 0.25 |
| Pivot-based Approach | 81.71 ± 0.29 | 80.43 ± 0.38 | 81.01 ± 0.31 | 81.37 ± 0.29 |
| Baseline | 85.82 ± 0.33 | 86.39 ± 0.24 | 86.51 ± 0.31 | 86.40 ± 0.37 |
Generative Adversarial Networks for Mitigating Biases in Machine Learning Systems
Adel Abusitta
Department of Computer Science and Operations Research
University of Montreal
Montreal, Canada
Esma Aïmeur
Department of Computer Science and Operations Research
University of Montreal
Montreal, Canada
Omar Abdel Wahab
Department of Computer Science and Engineering
Université du Québec en Outaouais
Gatineau, Canada
Abstract
In this paper, we propose a new framework for mitigating biases in machine learning systems. The problem of the existing mitigation approaches is that they are model-oriented in the sense that they focus on tuning the training algorithms to produce fair results, while overlooking the fact that the training data can itself be the main reason for biased outcomes. Technically speaking, two essential limitations can be found in such model-based approaches: 1) the mitigation cannot be achieved without degrading the accuracy of the machine learning models, and 2) when the data used for training are largely biased, the training time automatically increases so as to find suitable learning parameters that help produce fair results. To address these shortcomings, we propose in this work a new framework that can largely mitigate the biases and discriminations in machine learning systems while at the same time enhancing the prediction accuracy of these systems. The proposed framework is based on conditional Generative Adversarial Networks (cGANs), which are used to generate new synthetic fair data with selective properties from the original data. We also propose a framework for analyzing data biases, which is important for understanding the amount and type of data that need to be synthetically sampled and labeled for each population group. Experimental results show that the proposed solution can efficiently mitigate different types of biases, while at the same time enhancing the prediction accuracy of the underlying machine learning model.
1 Introduction
The world is facing a historical shift toward adopting Artificial Intelligence (AI) to automate the decision-making process in many sectors, including health, transportation and public services. This, however, has led to growing concerns about the bias and discrimination that these systems might produce, which might negatively affect citizens, especially those who belong to ethnic and racial minorities. The hazard of bias becomes even more serious when these systems are applied to critical and sensitive domains such as health care and criminal justice. In fact, biased AI systems are mainly engendered by the data used to feed the training process of the machine learning algorithms [10]. Training data can be incomplete, insufficiently diverse, biased, and/or composed of non-representative samples that are poorly defined before use [10], which might lead to biased results and lower accuracy [10]. Obtaining and labeling new data to compensate for these problems is one possible solution to fight against biases. However, it has been shown that such a strategy is difficult, costly, privacy-sensitive and dangerous, especially in some critical domains like transportation and health [18] [26].
Many approaches have recently been proposed to fight against bias and discrimination in machine learning systems. The problem with the existing mitigation approaches [30] [32] is that they overlook the fact that the data used to train the machine learning algorithm might be the root cause of unfair results. In particular, these approaches focus on tuning the training algorithms to decrease the chances of producing biased results. Although such a model-based strategy might end up producing fair results, the accuracy of the underlying machine learning algorithm will be largely degraded. In other words, the mitigation will be achieved at the expense of the overall prediction accuracy [11]. Besides, when the training data are largely biased, the time needed to complete the training and obtain a fair model will dramatically increase, compared to the case of traditional training algorithms. The reason is that these approaches not only try to minimize the loss function (in order to train the machine learning model), but also work on minimizing the chances of producing unfair results. Thus, a longer training time is needed to find suitable parameters for a fair model.
To address the above-mentioned shortcomings, we propose a new framework for mitigating biases in machine learning systems, without degrading their accuracy. The proposed framework is based on conditional Generative Adversarial Networks (cGANs) [33], special versions of the Generative Adversarial Networks (GANs) [17], which have shown unprecedented success in generating high-quality new synthetic data with selective properties. The proposed framework allows the designers of the machine learning systems to estimate the real distribution of the original data pertaining to the targeted population groups (population groups that are victims of biases) through formulating a minimax two-player game [4] [3]. The game is played between two models, which are trained simultaneously, i.e., the Discriminator ($D$) and the Generator ($G$). $G$ is trained to capture the data distribution by trying to maximize the probability of $D$ committing a mistake. On the other hand, $D$ is trained to maximize the probability that a data sample came from a targeted population group rather than from $G$. The training of both $D$ and $G$ is repeated over many iterations until a generative model that can generate new synthetic data pertaining to the targeted population groups is obtained. The resulting generative model is then used to synthetically produce new data, which are used to augment the training set so as to compensate for the bias problem. In this way, machine learning algorithms can be trained on these data in order to produce unbiased predictions.
Unlike similar works (e.g., [39]), the proposed model gives the designers of the machine learning systems the flexibility to decide on the amount of data that needs to be synthetically sampled and labeled, taking into account their domain knowledge. The proposed framework is also designed to be integrated into another framework for analyzing and understanding data biases. The objective is to guide the machine learning model designers on the amount and type of data that needs to be synthetically sampled and labeled. This, in turn, minimizes the chances of synthetically generating unnecessary data. Our contributions are summarized as follows. First, we propose a new framework for mitigating biases in machine learning systems while at the same time enhancing their overall accuracy. Second, we integrate the proposed mitigation framework into an analytical framework for understanding data biases. This allows us to infer the type and amount of data that needs to be synthetically sampled in order to augment the training data. Finally, we propose a new framework that gives the designers of the machine learning systems the flexibility to decide on the amount of data that needs to be synthetically sampled and labeled, taking into account their domain knowledge.
2 Related Work
The idea of using adversarial training for mitigating biases in machine learning systems has recently been addressed in several works. For example, Madras et al. [32] propose a “fair” representation of data [29] that can be used by the classifier to generate fair decisions. They employ GANs to ensure that the generated representation of data is fair. Similarly, Louppe et al. [30] propose a new approach called the “Pivot-based approach”. This approach uses GANs not to generate new synthetic data but to create a new classifier that guarantees unbiased predictions. The method modifies the GAN design by changing the role of the generator from learning how to generate new synthetic data to acting as a classifier that is used to produce fair results. During the training of GANs, the classifier is optimized and updated based on the prediction losses of the sensitive attributes (Ethnicity, Gender, etc.). The main disadvantage of this approach is that it does not account for the overall accuracy of the classifier during the bias mitigation process; it only aims to reduce the biased results of the classifier. In other words, the mitigation in this approach is achieved at the expense of the overall accuracy. In contrast, our framework can reduce biases while at the same time enhancing the overall system’s accuracy.
Xu et al. [39] also adopt the GANs with the aim of generating new synthetic fair data, which are then used to train the classifier on how to produce unbiased decisions. For this purpose, another discriminator was used to check if the fairness has been achieved or not. Similar approaches have been proposed in [8], [14] [25] and [28]. These data-driven mitigation approaches suffer from three essential shortcomings. First, they propose to generate new data for each particular population group, thus leading to unnecessary data and unnecessary overhead. Second, these approaches require frequently verifying the machine learning model to check whether the generated data lead to a fair model or not. Third, these approaches are not complemented by any framework for analyzing and understanding data biases. This makes the designers of the machine learning systems unable to efficiently estimate and understand the amount and type of data that need to be synthetically sampled and labeled.
In contrast, our proposed mitigation approach is coupled with a framework for analyzing data biases. This is important to understand the amount of data that needs to be synthetically sampled for each particular population group. Moreover, the proposed framework gives the designers of machine learning systems the flexibility to decide on the amount of data that should be synthetically sampled, taking into account both the domain knowledge and prediction accuracy with respect to the original data. As a result, the proposed model enables us to achieve fair machine learning systems while at the same time enhancing the accuracy of the prediction with minimum training overhead.
Celis et al. [11] formulate the adversarial problem as a multi-objective optimization model and try to find the fair model using a gradient descent-ascent algorithm with a modified gradient update step [11]. In fact, their approach is inspired by the work proposed by [41], while adding more robust theoretical foundations. Similarly, Agarwal et al. [6] propose a minimax optimization problem, which is solved using saddle point methods [27] in order to derive the fair model. Other model-based mitigation approaches are also proposed in [15] [35] [38] [22]. These approaches propose algorithms to find suitable thresholds for trained classifiers so as to ensure equalized and fair odds. In particular, they try to fix the decision boundary in such a way as to ensure that the final classifier is fair.
Most of the above-mentioned model-based mitigation approaches do not consider the training data as a potential reason for biased results. Instead, they focus only on modifying the training algorithms to produce fair results. Two main disadvantages can be distinguished in such an approach. First, the mitigation is achieved at the expense of the accuracy. Second, the time needed to obtain the fair model is higher than that in traditional training algorithms, especially when the data used for training are largely biased [5] [2]. This is because these models are not only trained to minimize the loss function, but also to minimize the chances of producing unfair results.
3 The Proposed Framework for Mitigating Machine Learning Biases
In this section, we provide the details of our proposed framework for mitigating biases in machine learning systems. We first give some background on Generative Adversarial Networks and conditional Generative Adversarial Networks and then present the proposed mitigation model in detail, followed by our framework for analyzing data biases.
3.1 Generative Adversarial Nets and the Conditional Version
Generative adversarial networks (GANs) are a generative model proposed by [17]. A generative model can be seen as a way of learning any kind of data distribution using unsupervised learning techniques [7] [23]. Although several generative models have been proposed in the literature, such as the Deep Belief Network (DBN) [23] and the Variational Autoencoder (VAE) [13], GANs have received more attention thanks to their unprecedented ability to generate new synthetic high-quality data compared to the traditional generative models. In fact, GANs consist of two models: a discriminative model ($D$) and a generative model ($G$). $G$ is trained to capture the data distribution by trying to maximize the probability of $D$ committing a mistake. On the other hand, $D$ is trained to maximize the probability that a data sample came from a targeted population group rather than from $G$. The training of both the discriminative and generative models is repeated over many iterations until the discriminative model becomes unable to distinguish whether the underlying data is a sample from the data or generated by the generator. This framework is also known as a minimax two-player game [34] [21] [20] and is described formally as follows:
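$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$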
Conditional Generative Adversarial Networks (or cGANs) [33] are a special case of GANs which have shown great success in generating high-quality new synthetic data with selective properties. Although Goodfellow et al. [17] have already indicated in their original work the possibility of training cGANs, their work did not provide theoretical and experimental results to support this claim. cGANs can be obtained by adding a condition $y$ as an input to both $D$ and $G$. cGANs are formally described as follows:
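$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y)))] \tag{2}$$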
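As a minimal illustration of the conditional objective, the following Python sketch estimates the value function on a mini-batch of discriminator outputs. The function name `cgan_value` and the toy output values are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import math

def cgan_value(d_real, d_fake):
    """Monte Carlo estimate of the conditional GAN value function.

    d_real: discriminator outputs on real samples given the condition, each in (0, 1)
    d_fake: discriminator outputs on generated samples given the condition, each in (0, 1)
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from generated data outputs 0.5 everywhere.
undecided = cgan_value([0.5] * 4, [0.5] * 4)
# A confident and correct discriminator obtains a value closer to zero.
confident = cgan_value([0.99] * 4, [0.01] * 4)
```

When the discriminator outputs 0.5 for every sample, the estimate equals $-\log 4$, the value reached when the generated and real distributions coincide; the discriminator tries to push the value up, while the generator tries to push it down.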
3.2 The Proposed Model
The proposed mitigation model is based on cGANs. In particular, we train $G$ to produce new synthetic data based on the Targeted Population Groups (TPGs). The TPGs are those population groups against whom the machine learning models produce biased results. The new data generated using the proposed framework are then used to augment the training data (incomplete and biased data). The augmented data (original data and generated data) are then used to train the machine learning algorithms. Figure 1 depicts the architecture of our proposed model.
In the next section, we present a new framework used for analyzing data biases and exploring the targeted population groups (TPGs). This framework is designed to be integrated into the proposed mitigation approach in order to allow the designers of the machine learning systems to understand the amount and type of data that should be synthetically sampled for each population group. To this end, the objective function of a two-player minimax game is defined as follows:
[TABLE]
Since the standard training of GANs cannot easily converge (i.e., non-convergence problem) [16] and to avoid mode collapse [16], we adopt a Primal-Dual Sub-gradient method to solve this problem. This method is proposed by [12] and can be seen as a Lagrangian perspective of GANs [12]. To this end, we construct a convex optimization problem as follows:
[TABLE]
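where $\mathcal{D}$ is some convex set and the variables are $Dis = (D(x_1 \mid TPG), \ldots, D(x_n \mid TPG))$. Let $p_g = (p_g(x_1 \mid TPG), \ldots, p_g(x_n \mid TPG))$, where $p_g(x_i \mid TPG)$ is the Lagrangian dual associated with the $i$-th constraint. Therefore, the Lagrangian function becomes as follows: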
[TABLE]
The proposed training algorithm (Algorithm 1), which is inspired by [12], is based on (5). In Algorithm 1, the targeted population group (TPG) is taken as an input and the goal is to train $G$ to produce data that conform to the TPG. In the proposed algorithm, the process of updating $D$ is similar to the standard training; however, the process of updating $G$ is different. For $G$, when the data distribution and the generated distribution have disjoint supports [19] [12], $G$ may not be updated using standard training (7) (8) (9). Handling this case differently helps prevent the main source of mode collapse [12]. Note that after a certain fixed period of time, all the steps are repeated in order to enable both $D$ and $G$ to learn how to produce new high-quality synthetic data, based on the targeted population group.
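To make the Lagrangian viewpoint more concrete, the sketch below runs primal-dual gradient iterations on a deliberately tiny convex problem (minimize $x^2$ subject to $x \geq 1$). It is not Algorithm 1 and does not train a GAN; the problem, the step size and the function name `primal_dual_toy` are assumptions chosen only to show the alternating primal descent and dual ascent pattern. Because the toy objective is smooth, the sub-gradient reduces to the ordinary gradient.

```python
def primal_dual_toy(steps=2000, lr=0.1):
    """Primal-dual gradient iterations for: minimize x**2 subject to x >= 1.

    Lagrangian: L(x, lam) = x**2 + lam * (1 - x), with multiplier lam >= 0.
    """
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = 2.0 * x - lam               # dL/dx
        grad_lam = 1.0 - x                   # dL/dlam, the constraint violation
        x = x - lr * grad_x                  # primal descent step
        lam = max(0.0, lam + lr * grad_lam)  # dual ascent, projected onto lam >= 0
    return x, lam

x_star, lam_star = primal_dual_toy()
```

The iterates approach the saddle point $x = 1$, $\lambda = 2$, where the constraint is active and the stationarity condition $2x - \lambda = 0$ holds. In the GAN setting described above, the discriminator outputs play the role of the primal variables and the generated distribution plays the role of the dual variables.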
3.3 A Framework for Analyzing Data Biases
In the previous section, we proposed a new algorithm (Algorithm 1) for training the generator to create new synthetic data based on a given targeted population group. The algorithm takes as an input a targeted population group in order to learn how to produce new data with respect to that particular group. In this section, we present a new framework that can be used to explore the set of targeted population groups to be used as inputs for Algorithm 1. Note that this framework is inspired by the analysis presented in [36] for detecting biases in machine learning models, while adapting it to our case where we are interested in detecting biases in the data itself rather than in the machine learning model.
The following steps are used for the analysis of data biases. First, select a set of population groups to study if the classifier produces biased results against any of them. Second, train the classifier on the training data. Third, test the classifier by producing results and visualizing the prediction accuracy with respect to each population group. The visualization can be achieved either by showing the probability distribution or by displaying the accuracy obtained for each population group. Finally, analyze these results to see which population group(s) is/are victim(s) of biases.
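The analysis steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `accuracy_by_group` and `flag_groups`, the toy labels, and the 0.05 margin used to decide that a group is disadvantaged are all hypothetical and do not come from the paper.

```python
from collections import defaultdict

def accuracy_by_group(groups, y_true, y_pred):
    """Return {group: accuracy} for parallel lists of group labels, labels and predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

def flag_groups(per_group, margin=0.05):
    """Flag groups whose accuracy trails the best group by more than `margin`.

    The margin is a hypothetical threshold, not a value from the paper.
    """
    best = max(per_group.values())
    return sorted(g for g, acc in per_group.items() if best - acc > margin)

# Toy data: group "B" is misclassified far more often than group "A".
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
per_group = accuracy_by_group(groups, y_true, y_pred)
targeted = flag_groups(per_group)
```

In this toy run, group "A" is always classified correctly while group "B" is correct only a quarter of the time, so "B" would be passed to the generator training as a targeted population group.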
We use the following example to illustrate how the above-described framework works in practice. Consider the adult UCI dataset [40], which is used to predict whether the salary of a person is above or below 50K. The dataset contains two Sensitive Attributes (SA), i.e., Ethnicity and Gender. This leads us to the following four population groups: African American, Caucasian, Female and Male. Although we could have combinations of these population groups (e.g., African American females), we restrict, for the sake of simplicity and without loss of generality, our example to only the above-mentioned four population groups.
To determine whether the training data are biased, we need to test whether a machine learning classifier trained on these data produces biased results. To this end, we trained a neural network classifier on this dataset and analyzed the prediction accuracy, taking into account the above-mentioned population groups. The results of our testing are given in Figure 2.
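Figure 2a shows the distributions of the predicted $P(\text{income} > 50K)$ for the sensitive attribute $S_{Ethnicity}$: the probability of a prediction in the range [0.1-0.2] is much higher for an “African American” than for a “Caucasian”. Similarly, Figure 2b shows the distributions of the predicted $P(\text{income} > 50K)$ for the sensitive attribute $S_{Gender}$: the corresponding probability is much higher for a “female” than for a “male”.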
The results shown in Figure 2 give us a clear indication that the data used for training are incomplete (i.e., the number of Caucasians and males in the dataset is greater than that of African Americans and females). Therefore, based on the above results, we conclude that the targeted population groups that should be used as inputs to Algorithm 1 are African American and female. Simply put, the generator will be trained to generate new data samples for African Americans and females.
4 Experimental Evaluation
This section first describes the setup used to evaluate the proposed framework. Then, the performance of the proposed bias mitigation framework is examined.
4.1 Experimental Setup
We implemented the proposed framework using Multilayer Perceptrons (MLPs) with 3 hidden layers. We used the ReLU activation function for both the generators and discriminators. The following two datasets were tested: the adult UCI dataset [40] and the Adience dataset [1], which is widely used for age and gender prediction. Since the adult UCI dataset contains categorical data, we placed in parallel a dense layer per categorical variable, followed by a Gumbel-Softmax activation and a concatenation to get the final output [9] [24] [31]. Prediction performance on the validation dataset is adopted for finding the best hyper-parameter configuration. The results are reported based on a 95% confidence interval.
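For readers unfamiliar with the Gumbel-Softmax activation mentioned above, here is a minimal pure-Python sketch of how one relaxed categorical sample can be drawn from the logits of a single categorical variable. The function name, the temperature of 0.5 and the example logits are assumptions for illustration; the paper does not report these details, and a real implementation would use a differentiable tensor library.

```python
import math
import random

def gumbel_softmax(logits, temperature=0.5, rng=random):
    """Draw one relaxed one-hot sample over the categories given unnormalized logits."""
    # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1); clamp U away from 0.
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    norm = sum(exps)
    return [e / norm for e in exps]

# Hypothetical logits for a categorical variable with three categories.
sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
```

The output is a probability vector that approaches a one-hot vector as the temperature decreases, which lets a generator emit near-discrete categorical values while remaining differentiable.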
4.1.1 Results on the adult UCI dataset
Figure 3 shows the results obtained when applying the proposed framework on the adult UCI dataset. In particular, Figure 3a shows the progress achieved in the prediction distribution compared to Figure 2a. This progress was achieved when we augmented the original data (female) by 85% new data obtained synthetically from the generator. Figure 3b also shows the progress achieved in the prediction distribution, compared to Figure 2b, when we augmented the original data (African American) by 85% new data obtained synthetically from the generator. Note that the proposed framework is flexible in the sense that it enables machine learning designers to control the amount of data (e.g., 85%) that needs to be synthetically added for each population group. This allows the designers to consider the “Domain knowledge” during the data augmentation process.
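The following sketch shows one way a designer-controlled augmentation ratio could be applied. It assumes that "85%" means adding synthetic samples equal to 85% of the targeted group's original count, which is an interpretation rather than something the paper states. The names `augment_group` and `fake_generator` are hypothetical, and `fake_generator` is only a placeholder for sampling a trained generator.

```python
def augment_group(records, group_key, group_value, ratio, generate):
    """Append synthetic records for one targeted group.

    ratio: synthetic samples to add, as a fraction of the group's original count
    generate: callable returning one synthetic record for `group_value`
              (a stand-in for sampling the trained generator)
    """
    n_group = sum(1 for r in records if r[group_key] == group_value)
    n_new = round(ratio * n_group)
    synthetic = [generate(group_value) for _ in range(n_new)]
    return records + synthetic, n_new

# Toy training set: 20 records for the targeted group, 60 for the other one.
original = [{"gender": "female"}] * 20 + [{"gender": "male"}] * 60

def fake_generator(group_value):
    # Hypothetical placeholder; a real pipeline would sample G conditioned on the group.
    return {"gender": group_value, "synthetic": True}

augmented, added = augment_group(original, "gender", "female", 0.85, fake_generator)
```

Exposing the ratio as a parameter is what allows domain knowledge to drive the augmentation: a designer can choose a different ratio per group instead of generating data for every group uniformly.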
Table 1 shows a comparison between the proposed approach and a recent work proposed in [30]. This work, called the “Pivot-based mitigation approach”, uses GANs not to generate new synthetic data (as we do) but to create a new classifier that guarantees fairness in predictions. The method modifies the GANs by changing the role of the generator from learning how to generate new synthetic data to acting as a classifier that is used to produce fair results. During the training process of GANs, the classifier is optimized and updated based on the prediction losses of the sensitive attributes (e.g., Ethnicity, Gender, etc.).
Table 1 shows the overall accuracy obtained by the proposed model when training the MLP on the new training data (original data + generated data) with different numbers of Hidden Units (HUs). These results are better than the results obtained using the “Pivot-based mitigation approach”. Our model also yields a better accuracy compared to the baseline, where the baseline means that the classifier was trained on the original data without adding new synthetic data. The improvement over the baseline can be explained by the fact that the data used for training were incomplete and led to biased results, in the sense of having a lower measure of accuracy [10]. The proposed framework overcame this problem by augmenting the training data, which mitigates biases and enhances the prediction accuracy.
4.1.2 Results on the Adience dataset
Table 2 reports the accuracy of the MLP classifier with respect to each population group. The results suggest the existence of bias against women of color. Table 3 shows the progress achieved in the prediction accuracy compared to Table 2 when the training data were augmented with more data on women of color, which were synthetically obtained from the generator (the proposed framework).
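The size of the improvement can be read off the reported per-group accuracies with a short calculation. The sketch below copies the mean accuracies at 300 hidden units from the two per-group tables and computes the spread between the best and worst group before and after augmentation. The gap measure and the helper name `accuracy_gap` are illustrative choices, not a fairness metric defined in the paper.

```python
# Mean accuracies (%) at 300 hidden units, copied from the per-group tables.
before = {"Men of color": 86.5, "Women of color": 67.8,
          "Caucasian men": 98.6, "Caucasian women": 90.4}
after = {"Men of color": 87.9, "Women of color": 88.1,
         "Caucasian men": 99.2, "Caucasian women": 91.9}

def accuracy_gap(per_group):
    """Difference between the best and worst group accuracy, in percentage points."""
    return round(max(per_group.values()) - min(per_group.values()), 1)

gap_before = accuracy_gap(before)   # 98.6 - 67.8
gap_after = accuracy_gap(after)     # 99.2 - 87.9
gain_women_of_color = round(after["Women of color"] - before["Women of color"], 1)
```

With these numbers the best-to-worst gap shrinks from 30.8 to 11.3 percentage points, and the accuracy for women of color rises by 20.3 points, while no group loses accuracy.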
Table 4 shows the overall prediction accuracy of the MLP classifier trained on the new training data. These results outperform both the pivot-based classifier and the baseline.
5 Limitation
Although the proposed framework has the advantage of mitigating bias in machine learning systems against targeted groups, we cannot claim that our solution fully solves the problem. In fact, bias is a broad and ill-defined problem, which does not always target members of minority groups (e.g., females). For example, Google conducted a recent study to determine whether the company was underpaying women. Surprisingly, they found that men were paid less than women even for the same job position [37]. Therefore, we argue that more effort is needed to generalize the proposed framework to unpredictable bias cases.
6 Conclusion and Future Work
This paper presents a new framework for the mitigation of biases in machine learning systems. The proposed framework is based on conditional generative adversarial networks, which allow us to generate new high-quality synthetic data related to the targeted population groups. The proposed framework is integrated into another analytical framework used for understanding data biases. This allows us to understand the type and amount of data that should be synthetically sampled to augment the training data and overcome the bias problem. The training process then takes place on the new data (original data + generated data). Our model also enables the mitigation to be applied while taking the domain knowledge into consideration. Experimental results show that the proposed framework mitigates the biases against targeted population groups while at the same time enhancing the prediction accuracy of the machine learning classifiers.
As future work, we plan to design an automated mitigation process. In particular, after defining the bias, the system should automatically generate new data and perform unbiased training. The challenge here is to make the system automatically determine the exact amount of data that should be sampled, taking into account the domain knowledge.
Acknowledgment
The financial support of the Natural Sciences and Engineering Research Council of Canada is gratefully acknowledged. We also would like to acknowledge Dr. Gilles Brassard (University of Montreal), Dr. Kimiz Dalkir (McGill University), Younes Driouiche (Mila), Alexis Tremblay, Amine Belabed and Rim Ben Salem for helpful discussions.
References
- [1] The Adience data set, 2019 (accessed April 2, 2019). https://talhassner.github.io/home/projects/Adience/Adience-data.html#agegender
- [2] A. Abusitta, M. Bellaiche, and M. Dagenais. An SVM-based framework for detecting DoS attacks in virtualized clouds under changing environment. Journal of Cloud Computing, 7(1):9, 2018.
- [3] A. Abusitta, M. Bellaiche, and M. Dagenais. A trust-based game theoretical model for cooperative intrusion detection in multi-cloud environments. In 2018 21st Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN), pages 1–8. IEEE, 2018.
- [4] A. Abusitta, M. Bellaiche, and M. Dagenais. On trustworthy federated clouds: A coalitional game approach. Computer Networks, 145:52–63, 2018.
- [5] A. Abusitta, M. Bellaiche, M. Dagenais, and T. Halabi. A deep learning approach for proactive multi-cloud cooperative intrusion detection system. Future Generation Computer Systems, 2019.
- [6] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach. A reductions approach to fair classification. arXiv preprint arXiv:1803.02453, 2018.
- [7] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems, pages 153–160, 2007.
- [8] F. Calmon, D. Wei, B. Vinzamuri, K. N. Ramamurthy, and K. R. Varshney. Optimized pre-processing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3992–4001, 2017.
