How optimal transport can tackle gender biases in multi-class neural-network classifiers for job recommendations?
Fanny Jourdan, Titon Tshiongo Kaninku, Nicholas Asher, Jean-Michel Loubes, Laurent Risser

TL;DR
This paper introduces a model-agnostic optimal transport method to reduce gender biases in multi-class neural network classifiers for job recommendation systems, ensuring fairness and compliance with AI regulations.
Contribution
It presents a novel optimal transport strategy that mitigates gender biases in neural network classifiers, applicable across different models and datasets.
Findings
Reduces gender bias more effectively than standard methods
Applicable to textual data in job recommendation systems
Improves fairness metrics in the Bios dataset
Fanny Jourdan1,2, Titon Tshiongo Kaninku1,3, Nicholas Asher2, Jean-Michel Loubes1, Laurent Risser1
1 Institut de Mathématiques de Toulouse (UMR 5219), CNRS, Université de Toulouse, F-31062 Toulouse, France
2 Institut de Recherche en Informatique de Toulouse (UMR 5505), CNRS, Université de Toulouse, F-31062 Toulouse, France
3 AKKODIS group, France
Abstract
Automatic recommendation systems based on deep neural networks have become extremely popular during the last decade. Some of these systems can however be used for applications which are ranked as High Risk by the European Commission in the A.I. act, for instance online job candidate recommendation. When used in the European Union, commercial AI systems for this purpose will then be required to have proper statistical properties with regard to the potential discrimination they could engender. This motivated our contribution, where we present a novel optimal transport strategy to mitigate undesirable algorithmic biases in multi-class neural-network classification. Our strategy is model agnostic and can be used on any multi-class classification neural-network model. To anticipate the certification of recommendation systems using textual data, we then used it on the Bios dataset, for which the learning task consists in predicting the occupation of female and male individuals, based on their LinkedIn biography. Results show that it can reduce undesired algorithmic biases in this context to lower levels than a standard strategy.
Keywords: fairness; algorithmic bias; neural-networks; NLP; recommender systems; multi-class classification; certification
1 Introduction
The field of Artificial Intelligence (AI) has experienced remarkable growth over the past decade, particularly in Natural Language Processing (NLP). Current state-of-the-art NLP applications, such as translation or text-based recommendations, rely heavily on Deep Neural Networks (DNNs), which use transformer block layers [1]. The two most widely-used transformer neural network architectures for these tasks are BERT [2] and GPT [3], and there are numerous variants of these models. Compared with their predecessors, such as LSTM models [4], they exhibit significantly higher performance in NLP applications. However, due to the large number of parameters and non-linearities involved, they are even less interpretable than more classic models and are typically just treated as black-box decision systems. We developed the methodology in this paper with the aim of controlling undesirable algorithmic biases in recommendation systems that are built on textual information from personal profiles on social networks, such as systems recommending job or housing offers. Throughout this paper, we emphasize the importance of ethical considerations in the development and deployment of these applications, as they can significantly impact users’ lives.
For a long time, many believed that machine learning algorithms couldn’t be discriminatory since they lack human emotions. This view is however outdated now, as different studies have shown that an algorithm can learn and even amplify biases from a biased dataset [5]. In this paper, we use the term algorithmic biases to refer to automatic decisions made by machine learning algorithms that are not neutral, fair, or equitable for a particular subgroup of people (or statistical observations in general). This group is distinguished by a sensitive variable, such as gender, age, or ethnic origin. The field of study and prevention of these specific algorithmic biases is called Fair learning. Fairness is essential to an ethical application of algorithms in society. Ethical concerns have become increasingly important in recent years, and the deployment of a discriminatory algorithm is no longer acceptable. Many regulators are already addressing ethical issues related to AI. In the area of privacy, the General Data Protection Regulation (GDPR), which was adopted by the European Parliament in 2016, allows for instance the French Commission on Informatics and Liberty (NCIL) and other independent administrative authorities in France to impose severe penalties on companies that do not manage customer data transparently [6]. The GDPR is an example of how public authorities are progressively developing legal frameworks and taking actions to mitigate threats. More recently, the so-called AI act (https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52021PC0206) of the European Commission defined a list of High Risk applications of A.I., most of them having a strong impact on human life. For instance, job candidate recommendation systems are ranked as High Risk. Importantly, when sold in or from the European Union, such A.I. systems will need to have appropriate statistical properties with respect to any potential discrimination they may cause (see articles 9.7, 10.2, 10.3, and 71.3).
Motivated by the future certification of A.I. systems based on black-box neural-networks against discrimination, our article expands on the work of [7] to address algorithmic biases observed in NLP-based multi-class classification. The main methodological novelties of this paper are the extension of [7] to multi-class classification, and to showcase how to apply it to NLP data in an application ranked as High Risk by the AI act. The bias mitigation model proposed in this paper involves incorporating a regularization term, in addition to a standard loss, when optimizing the parameters of a neural network. This regularization term mitigates algorithmic bias by enforcing the similarity of prediction or error distributions for two groups distinguished by a predefined binary sensitive variable (e.g. males and females), measured using the 2-Wasserstein distance between the distributions. Note that [7] is the first paper that demonstrated how to calculate pseudo-gradients of this distance in a mini-batch context, enabling the use of this method to train deep neural-networks with reduced algorithmic bias.
To extend [7] to multi-class classification with deep neural-networks, we need to address a key problem: estimating the Wasserstein-2 distance between multidimensional distributions (where the dimension equals the number of output classes) requires numerous neural-network predictions, leading to slow training. In order to avoid this problem, we will redefine the regularization term to apply to predicted classes of interest, making the bias mitigation problem numerically feasible. Our second main contribution, from an end-user perspective, is demonstrating how to mitigate algorithmic bias in a text classification problem using modern transformer-based neural network architectures like RoBERTa small [8]. It is important to note that our regularization strategy is model-agnostic and could be applied to other text classification models, like those based on LSTM architectures. We evaluate our method using the Bios dataset [9], which includes over 400,000 LinkedIn biographies, each with an occupation and gender label. This dataset is commonly used to train automatic recommendation models for employers to select suitable candidates for a job and to quantify algorithmic biases in the trained models. The Bios dataset is a key resource for the scientific community studying algorithmic bias in NLP.
2 Definitions and related work
Measuring algorithmic biases in machine learning
Different popular metrics exist to measure algorithmic biases in the machine learning literature. In this paper, we will use the True Positive Rate gap (TPRg) [9], which is one of the classic fairness metrics for NLP. Other metrics such as the Statistical Parity [10] or Equalized Odds [11] are also very popular. Over 20 different fairness metrics are compared in [12]. Very importantly for us, each metric shows specific algorithmic bias properties and not all of them are compatible with each other [13, 14, 15]. For instance, the True Positive Rate gap quantifies the difference between the portion of positive predictions ($\hat{Y}=1$ using common M.L. notations) in two groups, by only considering the observations which should be classified as positive ($Y=1$ using common M.L. notations). Another popular metric such as the disparate impact will also quantify this difference, but for all observations. This makes their practical interpretation different.
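To make the distinction between the two families of metrics concrete, the following minimal sketch computes a True Positive Rate gap and a parity-style gap on toy data. The function names, the use of an absolute difference, and the toy labels are illustrative assumptions and are not taken from the paper; the parity gap is a difference-based stand-in for metrics computed on all observations.

```python
def true_positive_rate(y_true, y_pred, group, g):
    """P(y_pred = 1 | y_true = 1, group = g), estimated on the given sample."""
    positives = [p for t, p, s in zip(y_true, y_pred, group) if t == 1 and s == g]
    if not positives:
        raise ValueError(f"no positive observation in group {g}")
    return sum(positives) / len(positives)


def tpr_gap(y_true, y_pred, group):
    """Absolute TPR difference between groups 0 and 1 (true positives only)."""
    return abs(true_positive_rate(y_true, y_pred, group, 0)
               - true_positive_rate(y_true, y_pred, group, 1))


def parity_gap(y_pred, group):
    """Absolute difference of positive-prediction rates over ALL observations."""
    def rate(g):
        preds = [p for p, s in zip(y_pred, group) if s == g]
        return sum(preds) / len(preds)
    return abs(rate(0) - rate(1))


# Toy data, made up for illustration only.
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

print("TPR gap:", tpr_gap(y_true, y_pred, group))
print("Parity gap:", parity_gap(y_pred, group))
```

On this toy sample both groups receive the same share of positive predictions, so the parity gap vanishes, while the TPR gap does not: the two metrics can disagree on the same predictions.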
Impact of AI biases in society
The use of artificial intelligence (AI) in decision-making systems has become increasingly widespread in recent years, and with it, concerns over the potential for discriminant bias to affect the outcomes of these decisions. We review below different key studies that have explored the impacts of such bias in AI on society. One such study [16] focused on the criminal justice system, and found that AI algorithms can produce biased outcomes, particularly when trained on non-representative data sets. This can result in higher incarceration rates for certain groups, such as racial minorities, and perpetuate systemic racism in the criminal justice system. Another study by [17] explored how gender differences and biases can affect the development and use of artificial intelligence in the field of bio-medicine and healthcare. The paper discussed the potential consequences of these differences and biases, including unequal access to healthcare and inaccurate medical diagnoses. Another important impact of algorithmic biases in society is the case of online advertisements. Ad targeting based on demographic factors such as race, gender, and age rather than interests or behaviours can perpetuate negative stereotypes and result in discrimination by limiting access to job or housing announcements for certain groups. For example, Facebook’s ad delivery algorithms can lead to biased outcomes by optimizing for maximum engagement, which can result in the amplification of certain groups or messages over others. This can lead to discrimination against certain groups, as advertisers may target their ads to specific demographics or exclude certain groups from seeing their housing and employment advertising, as highlighted by studies such as [18] and [19].
Bias mitigation in NLP
Bias in NLP systems has received a significant attention in recent years, with researchers and practitioners exploring various methods for mitigating bias in NLP models. In this subsection, we review some of the existing work on bias mitigation in NLP. The first approach to mitigate bias is to apply pre-processing techniques to the data used to train the model. Some researchers have proposed methods for removing or neutralizing sensitive attributes from the training data, such as gender or race, in order to reduce the likelihood that the model will learn to make decisions based on these attributes. We can reduce the bias directly in the text of the training dataset. For example, in the case of gender bias like the study in this paper, the most classic technique is to remove explicit gender indicators [9]. This technique is the one we will use to compare our proposed strategy to another one commonly used in industry. This technique is indeed simple to implement and makes it possible to reduce the bias, but in a partial and not very localized manner. Other classical techniques can be used, like identifying biased data in word embeddings, which represent words in a vector space. [20] demonstrated that these embeddings reflect societal biases. There are also methods to show how these embeddings can be unbiased by aligning them with a set of neutral reference vectors [21, 22]. These de-biasing methods have however strong limitations, as explained in [23], where the authors show that although the de-biasing embedding methods can reduce the visibility of gender bias, they do not completely eliminate it.
A second approach is to use post-processing de-biasing methods. These methods are model-agnostic and therefore not specific to NLP since they modify the results of previously trained classifiers in order to achieve fairer results. [11, 24] investigate this for binary classification, and [25] propose a method for multiclass classification.
The last approach to mitigate biases in AI is to use fairness-aware algorithms, which are specifically designed to take into account the potential for bias and to learn from the data in a way that reduces the risk of making biased decisions. These are the in-processing methods, which generally do not depend on the type of data input, either. The method we propose in this paper is one of them. To achieve this, we can use adversarial learning by adjusting the discriminator. Adversarial learning involves training a model to make predictions while also training a second model to identify and correct any biases in the first model’s predictions. By incorporating this technique into the training process, [26, 27] demonstrate that it is possible to reduce the amount of bias present in machine learning models. Another technique is to constrain the predictions with a regularization technique like [28], but this technique was only used on a logistic regression classifier. On the other hand, [29] mitigate unfairness specifically in neural networks. Finally, [30, 31] use fairness metrics constraints and solve the training problem subject to those constraints. All these in-processing methods apply to binary classification. There is an in-processing paper that proposes a method for multiclass classification for a computer vision task [32], but this paper focuses on the regularization of the mean bias amplification and therefore does not deal with the classic fairness metrics.
We want to emphasize that the pre-processing and post-processing methods do not reduce the bias to the same degree across the whole machine-learning training procedure as in-processing methods. Our paper hence focuses on an in-processing method. In this context, our methodology tackles an issue which was still not addressed in the fair learning literature, as far as the authors know: we tackle algorithmic biases on multi-class neural-network classifiers and not on binary classifiers or on non neural-network classifiers. We believe that the potential of such a strategy is high for the future certification of commercial A.I. systems.
3 Methodology
The bias mitigation technique proposed in this paper extends the regularization strategy of [7] to multi-class classification. In this section, we first introduce our notation, then describe the regularization strategy of [7] for binary classifiers, and then extend it to multi-class classifiers. This extension is the methodological contribution of our manuscript.
3.1 General notations
Input and output observations
Let $(x_i, y_i)_{i=1,\ldots,n}$ be the training observations, where $x_i \in \mathbb{R}^p$ and $y_i \in \{0,1\}^K$ are the input and output observations, respectively. The value $p$ represents the input dimension or, equivalently, the number of input variables. It can for instance represent a number of pixels if $x_i$ is an image or a number of words in a text if $x_i$ is a word embedding. The value $K$ represents the output dimension. In a binary classification context, i.e. if $K=1$, the fact that $y_i=0$ or $y_i=1$ specifies the class of the observation $i$. In a multi-class classification context, i.e. if $K>1$, a common strategy consists in using one-hot vectors to encode the class of observation $i$: all values $y_i^k$, $k \in \{1,\ldots,K\}$, are equal to $0$, except the value $y_i^c$ of the true class $c$, which is equal to $1$. We will use this convention throughout this manuscript.
Prediction model
A classifier $f_\theta$ with parameters $\theta$ is trained so that the predictions $\hat{y}_i$ it indirectly makes based on the outputs $f_\theta(x_i)$ are on average as close as possible to the true output observations $y_i$ in the training set. The link between the model outputs $f_\theta(x_i)$ and the prediction $\hat{y}_i$ depends on the classification context: in binary classification, $f_\theta(x_i)$ is the predicted probability that $y_i=1$, so it is common to use $\hat{y}_i = \mathbb{1}_{f_\theta(x_i) > 0.5}$. Now, by using one-hot-encoded output vectors in multi-class classification, an output $f_\theta^k(x_i)$ represents the predicted probability that the observation $i$ is in the class $k \in \{1,\ldots,K\}$. As a consequence, $\sum_{k=1}^{K} f_\theta^k(x_i) = 1$. More interestingly for us, the predicted class is the one having the highest probability, so $\hat{y}_i$ is a vector of size $K$ with null values everywhere, except at the index $\arg\max_k f_\theta^k(x_i)$, where its value is $1$.
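The link between model outputs and predictions described above can be sketched in a few lines. The helper names `binary_prediction`, `one_hot` and `predict` are hypothetical, the 0.5 threshold is the usual convention rather than something imposed by the method, and class indices start at 0 here (the text counts classes from 1).

```python
def binary_prediction(probability, threshold=0.5):
    """Binarize a predicted probability in the binary classification case."""
    return int(probability > threshold)


def one_hot(c, n_classes):
    """One-hot vector of size n_classes with a 1 at index c and 0 elsewhere."""
    return [1 if k == c else 0 for k in range(n_classes)]


def predict(outputs):
    """Turn a vector of class probabilities into a one-hot predicted class."""
    n_classes = len(outputs)
    best = max(range(n_classes), key=lambda k: outputs[k])  # argmax over classes
    return one_hot(best, n_classes)


# Toy model output for one observation and K = 3 classes (illustrative values).
probs = [0.1, 0.7, 0.2]
print(predict(probs))
```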
Loss and empirical risk
In order to train the classification model, a risk is minimized with respect to the model parameters $\theta$:
$$R(\theta) := \mathbb{E}\left[\ell\left(\hat{Y} := f_\theta(X), Y\right)\right] \tag{1}$$
or empirically $R(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell\left(f_\theta(x_i), y_i\right)$, where the loss function $\ell$ represents the price paid for inaccuracy of predictions. This optimization problem is almost systematically solved by using variants of the stochastic (or mini-batch) gradient descent [33] in the machine learning literature.
Sensitive variable
An important variable in the field of fair learning is the so-called sensitive variable, which we will denote $S$. This variable is often binary and distinguishes two groups of observations, $S \in \{0,1\}$. For instance, $s_i=0$ or $s_i=1$ can indicate that the person represented in observation $i$ is either a male or a female. A widely used strategy to quantify that a prediction model is fair with respect to the variable $S$ is to compare the predictions it makes on observations in the groups $S=0$ and $S=1$, using a pertinent fairness metric (see references of Section 2). From a mathematical point of view, this means that the difference between the distributions of the predictions $\hat{y}_i$ for $s_i=0$ and for $s_i=1$, quantified by the fairness metric, should be below a given threshold. Consider for instance a binary prediction case where $\hat{y}_i=1$ means that the individual has access to a bank loan, $\hat{y}_i=0$ means that the bank loan is refused, and $s_i$ equal to $0$ or $1$ refers to the fact that the individual is a male or a female. In this case, one can use the difference between the empirical probabilities of obtaining the bank loan for males and females as a fairness metric, i.e. $\mathbb{P}(\hat{y}_i=1 \mid s_i=0) - \mathbb{P}(\hat{y}_i=1 \mid s_i=1)$. More advanced metrics may also take into account the input observations $x_i$, the true outputs $y_i$, or the prediction model outputs $f_\theta(x_i)$ instead of their binarized version $\hat{y}_i$.
3.2 W2reg approach for binary classification
3.2.1 Regularisation strategy
We now give an overview of the W2reg approach, described in [7], to temper algorithmic biases of binary neural-network classifiers. The goal of W2reg is to ensure that the treated binary classifier generates predictions for which the distributions in groups $S=0$ and $S=1$ do not deviate too much from pure equality. To achieve this, the similarity metric used in [7] is the Wasserstein-2 distance between the distribution of the predictions in the two groups:
$$W_2^2(\mu_{\theta,0}, \mu_{\theta,1}) = \int_0^1 \left( H_0^{-1}(\tau) - H_1^{-1}(\tau) \right)^2 d\tau \tag{2}$$
where $\mu_{\theta,g}$ is the probability distribution of the predictions made by $f_\theta$ in group $S=g$, and $H_g^{-1}$ is the inverse of the corresponding cumulative distribution function. Note that $\mu_{\theta,g}$ is mathematically equivalent to the histogram of the model outputs $f_\theta(x_i)$ for an infinity of observations in the group $S=g$, after normalization so that the histogram integral is $1$. Note also that this metric is based on the model outputs $f_\theta(x_i)$ and not the discrete predictions $\hat{y}_i$ (see Section 3.1-Prediction model for the formal relation), so the probability distributions are continuous. Ensuring that this metric remains low makes it possible to control the level of fairness of the neural-network model with respect to $S$. As specifically modelled by Eq. (2), this is done by penalizing the average squared difference between the quantiles of the predictions in the two groups. In order to train neural-networks which simultaneously make accurate and fair decisions, the strategy of [7] then consists in optimizing the parameters of the model such that:
$$\hat{\theta} = \arg\min_{\theta \in \Theta} \left\{ R(\theta) + \lambda W_2^2(\mu_{\theta,0}, \mu_{\theta,1}) \right\} \tag{3}$$
where $\Theta$ is the space of the neural-network parameters (e.g. the values of the weights, the bias terms and the convolution filters in a CNN) and $\lambda$ is the weight given to the regularization term. As usual when training a neural-network, the parameters are optimized using a gradient-descent approach, where the gradient is approximated at each gradient-descent step by using a mini-batch of observations.
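As an illustration of the quantile-based penalty described above, the sketch below approximates the squared Wasserstein-2 distance between two one-dimensional samples of model outputs and adds it to a risk value. It is a plain numerical approximation under stated assumptions (a regular grid of quantile levels, linear interpolation of empirical quantiles) and does not implement the pseudo-gradient of [7]; the names `quantile`, `w2_squared`, `regularized_objective`, `n_quantiles` and `lam` are hypothetical.

```python
def quantile(sorted_values, tau):
    """Empirical quantile (inverse CDF) at level tau in [0, 1], linear interpolation."""
    pos = tau * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return (1 - frac) * sorted_values[lo] + frac * sorted_values[hi]


def w2_squared(outputs_0, outputs_1, n_quantiles=100):
    """Average squared difference between the quantiles of two 1D samples."""
    a, b = sorted(outputs_0), sorted(outputs_1)
    taus = [(j + 0.5) / n_quantiles for j in range(n_quantiles)]  # midpoint grid
    return sum((quantile(a, t) - quantile(b, t)) ** 2 for t in taus) / n_quantiles


def regularized_objective(risk, outputs_0, outputs_1, lam):
    """Risk plus lam times the approximated squared Wasserstein-2 distance."""
    return risk + lam * w2_squared(outputs_0, outputs_1)


# Toy model outputs in groups 0 and 1 (illustrative values): group 1 is shifted by 0.2.
group_0 = [0.1, 0.2, 0.3, 0.4]
group_1 = [0.3, 0.4, 0.5, 0.6]
print(w2_squared(group_0, group_1))
print(regularized_objective(0.5, group_0, group_1, lam=2.0))
```

For two samples that differ by a constant shift, every quantile differs by that shift, so the penalty equals the squared shift; identical samples give a penalty of zero.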
3.2.2 Gradient estimation
We compute the gradient of Eq. (3) using the standard back-propagation strategy [34]. For the empirical risk part of Eq. (3), this requires computing the derivatives of the losses with respect to the neural-network outputs $f_\theta(x_i)$, which is routinely done by frameworks like PyTorch, TensorFlow or Keras, for all mainstream losses. For the Wasserstein-2 part of Eq. (3), the authors of [7] proposed a mathematical strategy to compute pseudo-derivatives of the Wasserstein-2 distance of Eq. (2) with respect to the neural-network outputs $f_\theta(x_i)$. Specifically, to compute the pseudo-derivative of a discrete and empirical approximation of this distance with respect to a mini-batch output $f_\theta(x_i)$, the following equation was used:
[TABLE]
where is the number of observations in class , the are discrete versions of the cumulative distribution functions defined on a discrete grid of possible output values:
[TABLE]
where , and is the number of discretization steps. We denote and is defined such that . Finally .
3.2.3 Distinction between mini-batch observations and the observations for and
As shown in Eq. (4), computing the pseudo-derivatives of Wasserstein-2 distance with respect to model predictions requires computing the discrete cumulative distribution functions , with . Computing the would ideally require computing for all observations of the training set, which would be computationally bottleneck. To solve this issue, [7] proposed to approximate the at each mini-batch iteration, where Eq. (4) is computed, using a subset of all training observations. This observation subset is composed of randomly drawn observations in group , other randomly drawn observations in group , and the mini-batch observations. This guaranties that there are at least observations to compute either or , and that the impact of each mini-batch observation is represented in and . Note that these additional predictions do not require to backpropagate any gradient information, so their computational burden is limited in terms of memory resources. Although it is also reasonable in terms of computational resources, the amount of additional predictions should remain relatively small to avoid slowing down significantly the gradient descent. In previous experiences on images, or often appeared as reasonable, as it allowed to mitigate undesirable algorithmic biases and slowed down the whole training procedure by a factor of less than . We finally want to emphasize that preserving the amount of such additional predictions reasonable at each gradient descent step will be at the heart of our methodological contribution when extending W2reg to multi-class classification.
3.3 Extended W2reg for multi-class classification
As discussed in Section 1, our work is motivated by the need for bias mitigation strategies in NLP applications where the neural-network predicts that an input text belongs to a class among $K$ output classes, where $K>2$. We then show in this section how to take advantage of the properties of [7] to address this practical problem. We recall that the regularization strategy of [7] is model agnostic, so the fact that we treat NLP data will only be discussed in the results section. In terms of methodology, the main issue to tackle is that the model outputs are in dimension $K$ and not one dimensional, which would require comparing multi-variate point clouds following the optimal transport principles which were modelled by Eq. (2) for 1D outputs. As we will see below, this issue raises algorithmic problems to keep the computational burden reasonable and to preserve the representativity of the pertinent information. Solving them requires extending [7] with strong algorithmic constraints.
3.3.1 Reformulating the bias mitigation procedure for multi-class classification
The strategy proposed by [7] to mitigate undesired biases is to train optimal decision rules by optimizing Eq. (3), where the Wasserstein-2 distance between the prediction distributions in the two groups (i.e. the distribution of the predictions for observations in group $S=0$ and in group $S=1$) is given by Eq. (2). As described in Section 3.1, the predictions $f_\theta(x_i)$ are now a vector of dimension $K$ in a multi-class classification context (specifically $f_\theta(x_i) \in [0,1]^K$). Their distributions in the two groups are then multivariate. In this context, Eq. (2) does not hold, and another optimal transport metric such as the multivariate Wasserstein-2 Distance or the Sinkhorn Divergence should be used [35]. Note that different implementations of these metrics exist and are compatible with our problem, e.g. those of [36, 37]. This however raises a critical issue related to the number of observations needed to reasonably penalize the differences between two multivariate point clouds, representing the observations in groups $S=0$ and $S=1$. If the dimension $K$ of the compared data gets large, the number of observations required to reasonably compare the point clouds at each gradient descent step explodes. This problem is very similar to the well known curse of dimensionality phenomenon in machine learning, where the amount of data needed to generalize accurately the predictions grows exponentially as the number of dimensions grows.
This issue therefore leads us to think about which problem we truly need to solve when tackling undesired algorithmic bias in multi-class classification. From our application perspective, discrimination appears when the prediction model is significantly more accurate at predicting a specific output in one of the two groups represented by $S$. For instance, suppose that someone looks for Software Engineer jobs and that an automatic prediction model $f_\theta$ is used to recommend job candidates to an employer. For a given job candidate $i$, the prediction model will return a set of probabilities $f_\theta^k(x_i)$, each of them indicating whether $i$ is recommended for the job class $k$. Now $k$ will denote the class of jobs $i$ is looking for, i.e. Software Engineer. The prediction model will be considered as unfair if male profiles are on average clearly more often recommended by $f_\theta$ than female profiles, when an unbiased oracle would lead to equal opportunities, i.e.
$$\left| \mathbb{P}\left(\hat{y}_i^k = 1 \mid y_i^k = 1, s_i = 0\right) - \mathbb{P}\left(\hat{y}_i^k = 1 \mid y_i^k = 1, s_i = 1\right) \right| > \tau \tag{6}$$
where the left-hand term is denoted the True Positive Rate gap (TPRg), and $\tau$ is a threshold above which the TPRg is considered as discriminatory. As shown in Section 5, such situations can occur in automatic job profile recommendation systems using modern neural-networks. Now that the problem to tackle is clarified, we can reformulate the regularized multi-class model training procedure as follows:
- We first train and test a non-regularized multi-class classifier $f_\theta$. We will call it the baseline classifier.
- We define a threshold $\tau$ under which the TPRg of every occupation to predict should remain (see Eq. (6)). We denote $\{k_c\}_{c \in \{1,\ldots,C\}}$ the classes for which this condition is broken, where each of these classes $k_c$ takes its values in $\{1,\ldots,K\}$.
- We then retrain the multi-class classifier with regularization constraints on the classes $\{k_c\}_{c \in \{1,\ldots,C\}}$ only. The regularization strategy will be developed below in Section 3.3.3.
By using this procedure, the number of observations required at each mini-batch step will first be limited to observations in the classes for which the TPRg threshold is exceeded, which is a first step towards an algorithmically reasonable regularized training procedure. We also believe that this avoids over-constraining the training procedure, which often penalizes its convergence.
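The class-selection step of this procedure can be sketched as follows: compute a per-class TPR gap from the baseline predictions, then keep the classes whose gap exceeds the threshold. The function names, the use of an absolute gap, the 0-based class indices and the toy data are assumptions made for illustration.

```python
def class_tpr_gaps(true_classes, pred_classes, group, n_classes):
    """Per-class absolute TPR gap between groups 0 and 1.

    Classes that are unobserved in one of the two groups are left out,
    since their gap cannot be estimated.
    """
    gaps = {}
    for k in range(n_classes):
        rates = []
        for g in (0, 1):
            hits = [int(p == k)
                    for t, p, s in zip(true_classes, pred_classes, group)
                    if t == k and s == g]
            if not hits:
                break
            rates.append(sum(hits) / len(hits))
        if len(rates) == 2:
            gaps[k] = abs(rates[0] - rates[1])
    return gaps


def classes_to_regularize(gaps, tau):
    """Classes whose TPR gap exceeds the threshold tau."""
    return sorted(k for k, gap in gaps.items() if gap > tau)


# Toy baseline predictions for 3 classes (illustrative values).
true_classes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
pred_classes = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2]
group = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]

gaps = class_tpr_gaps(true_classes, pred_classes, group, n_classes=3)
print(gaps)
print(classes_to_regularize(gaps, tau=0.1))
```

Only class 0 is retained here, so a retraining step would apply the regularization to that class alone.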
3.3.2 Regularization strategy
We now push further the algorithmic simplification of the regularization procedure by focusing on the properties of the mini-batch observations. In this subsection, we suppose that is an input mini-batch observation, and recall that we want to penalize large TPRg for specific classes only. In this mini-batch step, the observations related to true output predictions for which are not concerned by the regularization, when computing the multivariate cumulative distribution function or . At each mini-batch step, it therefore appears as appealing to only consider the dimensions out of , for which at least one true output observation respects , with . This would indeed allow to further reduce the amount of additional predictions made in the mini-batch. The dimension of or would however vary at each mini-batch step, making potentially the distance estimation unstable if fully considering -dimensional distributions.
To take into account the fact that not all output dimensions should be considered at each gradient descent step, we then make a simplification hypothesis: we neglect the relations between the different dimensions when comparing the output predictions in groups $S=0$ and $S=1$. This hypothesis is the same as the one made when using Naive Bayes classifiers [38, 39]. We believe that this hypothesis is particularly suited for one-hot-encoded outputs, as they are constructed to ideally have a single value close to 1 and all other values close to 0. We then split the multi-variate regularization strategy into multiple one-dimensional strategies and optimize:
$$\hat{\theta} = \arg\min_{\theta \in \Theta} \left\{ R(\theta) + \sum_{c=1}^{C} \lambda_{k_c} W_2^2\left(\mu_{\theta,0}^{k_c}, \mu_{\theta,1}^{k_c}\right) \right\} \tag{7}$$
where the sum runs over the $C$ regularized classes $k_c$ (those for which the TPRg threshold is exceeded), $W_2^2$ is the metric of Eq. (2), $\lambda_{k_c}$ is the weight given to regularize the TPR gaps in class $k_c$, and the $\mu_{\theta,g}^{k_c}$ are the distributions of the output predictions on dimension $k_c$ in group $S=g$, i.e. the distribution of the $f_\theta^{k_c}(x_i)$, when the true prediction is $k_c$, i.e. when $y_i^{k_c}=1$. For a mini-batch observation $x_i$ related to an output prediction in a regularized class $k_c$, the impact of a mini-batch output $f_\theta^{k_c}(x_i)$ on the empirical approximation of $W_2^2(\mu_{\theta,0}^{k_c}, \mu_{\theta,1}^{k_c})$ can then be estimated by following the same principles as in [7]. We can then extend Eq. (4) with:
[TABLE]
where the are discrete and empirical versions of the cumulative distribution functions of the prediction outputs on dimension , i.e. the , when class should be predicted and the observations are in the group . Note too that Eq. (4) contains , which is the number of observations in class . In order to manage unbalanced output classes in the multi-class classification context, we also use a normalizing term in Eq. (8). It quantifies the number of training observations in group and class . Other notations are the same as in Eq. (4).
In a mini-batch step, suppose finally that only need to take into account the classes among the . These selected classes are those for which at least a , with and, is an observation of the mini-batch . We will then have to only sample two times predictions, for each of the selected classes, to compute the and required in Eq. (8). This makes the computational burden to regularize the neural-network training procedure reasonable, as the number of additional predictions to make only increases linearly with the number of treated classes at each mini-batch iteration. Note that no additional prediction will also be needed when a mini-batch contains no observation related to a regularized output class. This will naturally be often the case, when the number of classes gets large and/or the mini-batch size is small.
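The decomposition into one-dimensional problems can be illustrated by the sketch below, which sums, over a set of regularized classes, a weighted one-dimensional squared Wasserstein-2 distance computed on the corresponding output coordinate. This only evaluates the penalty value on a small sample; it is not the mini-batch pseudo-gradient of the paper. The names `w2_squared`, `multiclass_penalty` and `weights`, the nearest-rank quantile rule and the toy data are assumptions.

```python
def w2_squared(a, b, n_quantiles=50):
    """Average squared quantile difference between two 1D samples (nearest rank)."""
    a, b = sorted(a), sorted(b)

    def q(values, tau):
        return values[min(int(tau * len(values)), len(values) - 1)]

    taus = [(j + 0.5) / n_quantiles for j in range(n_quantiles)]
    return sum((q(a, t) - q(b, t)) ** 2 for t in taus) / n_quantiles


def multiclass_penalty(outputs, true_classes, group, weights):
    """Sum over regularized classes k of weights[k] * W2^2 on output coordinate k.

    For each class k, only observations whose true class is k are used, and
    they are split according to the sensitive group (0 or 1).
    """
    total = 0.0
    for k, lam in weights.items():
        per_group = {0: [], 1: []}
        for out, t, s in zip(outputs, true_classes, group):
            if t == k:
                per_group[s].append(out[k])
        if per_group[0] and per_group[1]:  # skip classes absent from one group
            total += lam * w2_squared(per_group[0], per_group[1])
    return total


# Toy outputs for 3 classes (illustrative values); only class 0 is regularized.
outputs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.6, 0.3, 0.1],
           [0.5, 0.4, 0.1], [0.2, 0.7, 0.1]]
true_classes = [0, 0, 0, 0, 1]
group = [0, 0, 1, 1, 0]

print(multiclass_penalty(outputs, true_classes, group, weights={0: 1.0}))
```

Because each regularized class only contributes a one-dimensional comparison, the cost of the penalty grows linearly with the number of regularized classes present in the sample.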
3.3.3 Proposed training procedure
The proposed strategy to train multi-class classifiers with mitigated algorithmic biases on the predictions of specific classes was motivated by the future need to certify that automatic decision models are not discriminatory. To make our strategy absolutely clear, we detail it in Algorithm 1.
4 Experimental protocol to assess W2reg on multi-class classification with NLP data
4.1 Data
We assess our methodology using the Bios [9] dataset, which contains about 400K biographies (textual data). For each biography, Bios specifies the gender (M or F) of its author as well as its occupation (among 28 occupations, categorical data). As shown in Fig. 1, the occupations are heterogeneously represented in this dataset. Although the representation of some occupations is relatively well-balanced between males and females (e.g. professor, journalist, …), other occupations are particularly unbalanced between males and females (e.g. nurse, software engineer, …). This dataset is particularly interesting for the fair learning community, as it makes it possible to evaluate how different machine learning strategies can predict the true occupations of potential job candidates as accurately as possible, based on their biography, while being as fair as possible when distinguishing males and females. Note that to build this dataset, its authors used Common Crawl and identified online biographies written in English. Then, they filtered the biographies starting with a name-like pattern followed by the string “is a(n) (xxx) title,” where title is an occupation out of the BLS Standard Occupation Classification system. Having identified the twenty-eight most frequent occupations, they processed WET files from sixteen distinct crawls from 2014 to 2018, extracting online biographies corresponding to those occupations only. This resulted in about 400K biographies labelled with their corresponding occupations.
4.2 Neural-network model and baseline training strategy
Our task is to predict the occupation using only the textual data of the biography. We do so with a RoBERTa model [8], which is based on the transformer architecture and is pretrained with the masked language modelling (MLM) objective. We specifically used a RoBERTa base model pretrained by Hugging Face. All information related to how it was trained can be found in [40]. Note that a very large training dataset was used to pretrain the model, as it was composed of five datasets: BookCorpus [41], a dataset containing 11,038 unpublished books; English Wikipedia (excluding lists, tables and headers); CC-News [42], which contains 63 million English news articles crawled between September 2016 and February 2019; OpenWebText [43], an open-source recreation of the WebText dataset used to train GPT-2; and Stories [44], a dataset containing a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas. Pre-training was performed on these data by randomly masking 15% of the words in each of the input sentences and then trying to predict the masked words.
After pre-training the RoBERTa parameters on this huge dataset, we trained the model on the 400,000 biographies of the Bios dataset. Training was performed with PyTorch on 2 GPUs (Nvidia Quadro RTX6000 24GB RAM) for 5 epochs with a batch size of 32 observations and a sequence length of 512 words. The optimizer was Adam with a learning rate of 1e-5, , , and . Each run took about 36 hours. We emphasize that 5 runs of the training procedure were performed to evaluate the stability of the accuracy and of the algorithmic biases. For each of these runs, we split the dataset into 70% for training, 10% for validation and 20% for testing. We will refer to the neural networks trained using this procedure as baseline models.
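The 70% / 10% / 20% split used for each run can be sketched as follows. This is a minimal illustration, assuming a simple random shuffle of the observation indices; the function name and the use of one seed per run are assumptions, as the text does not specify how the splits were drawn.

```python
import random


def split_indices(n, train=0.7, val=0.1, seed=0):
    """Shuffle n observation indices and split them into train/val/test parts.

    The test part receives the remaining observations (20% by default).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train * n))
    n_val = int(round(val * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# One split per run, here with a different (hypothetical) seed for each run.
splits = [split_indices(1000, seed=run) for run in range(5)]
```

Repeating the split and the training over several runs is what makes it possible to report the variability of both the accuracy and the TPR gaps.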
4.3 Evaluating the impact of a gender-neutral dataset
In order to evaluate the impact of a classic gender unbiasing strategy, we reproduced the baseline training protocol of Section 4.2 on two apparently unbiased versions of the Bios dataset. This classic debiasing method consists of removing explicit gender indicators (i.e. ’he’, ’she’, ’her’, ’his’, ’him’, ’hers’, ’himself’, ’herself’, ’mr’, ’mrs’, ’ms’, ’miss’ and first names). For a BERT-type model, however, we could not simply remove words, because the model is sensitive to sentence structure and not just to lexical information. We therefore adjusted the method by replacing all first names by a neutral first name (Camille) and by keeping only one gender in each dataset (i.e. for all individuals of gender g, we did nothing; for the others, we replaced explicit gender indicators with those of g). We thus created two datasets, one with only female and one with only male gender indicators, both using Camille as the only first name.
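The neutralization step can be sketched as below for the female-gendered version of the dataset. This is a simplified illustration, not the preprocessing code used for the experiments: the male-to-female mapping is an assumption (in particular, ’his’ is always mapped to ’her’), and the set of first names to replace is a hypothetical input that would have to be built separately.

```python
import re

# Assumed mapping from male to female gender indicators.
TO_FEMALE = {"he": "she", "his": "her", "him": "her", "himself": "herself", "mr": "ms"}
INDICATORS = re.compile(r"\b(" + "|".join(TO_FEMALE) + r")\b", re.IGNORECASE)


def _match_case(replacement, original):
    """Capitalize the replacement when the original word was capitalized."""
    return replacement.capitalize() if original[0].isupper() else replacement


def neutralize_to_female(text, first_names, neutral_name="Camille"):
    """Replace male gender indicators by female ones and first names by Camille."""
    text = INDICATORS.sub(
        lambda m: _match_case(TO_FEMALE[m.group(0).lower()], m.group(0)), text
    )
    if first_names:
        names = re.compile(
            r"\b(" + "|".join(re.escape(n) for n in sorted(first_names)) + r")\b"
        )
        text = names.sub(neutral_name, text)
    return text
```

Because words are replaced rather than removed, the sentence structure seen by the model is left unchanged.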
Note that with a model fully trained on our dataset, setting all gender indicators to either feminine or masculine should in principle not change anything, since the model would only "know" one gender (which would therefore be neutral). We however used a model pre-trained on gendered datasets. It is therefore important to verify that fine-tuning this model on a male-gendered dataset is equivalent to fine-tuning it on a female-gendered dataset. To assess this, we carried out several Student's t-tests: one comparing the accuracy of the model trained on the female-gendered dataset with the accuracy of the model trained on the male-gendered one, and one per profession comparing the TPR gender gaps of the two models. None of these tests showed a statistically significant difference. We will therefore only present in Section 5 the results obtained with the model trained on the female-gendered dataset.
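Such a comparison can be sketched with a two-sample Student's t statistic computed over the runs of each model. This is only an illustration: the text does not state whether the tests were paired or assumed equal variances, so the pooled-variance form below is an assumption, and no measured value from the experiments is used.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t distribution with 8 degrees
# of freedom, i.e. for two samples of 5 runs each.
T_CRIT_DF8 = 2.306


def student_t_statistic(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err


def is_significant(sample_a, sample_b, critical_value=T_CRIT_DF8):
    """True when the absolute t statistic exceeds the critical value."""
    return abs(student_t_statistic(sample_a, sample_b)) > critical_value
```

The same function can be applied to the accuracies of the two models or, class by class, to their TPR gaps.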
4.4 Training procedure for the regularized model
We now follow the procedure summarized in Algorithm 1 to train bias-mitigated multi-class neural-network classifiers on the textual data of the Bios dataset. We first consider the 5 baseline models of Section 4.2, which were trained on the original Bios dataset (and not on one of the unbiased datasets of Section 4.3). As shown in Fig. 4-(left), where the diagonal of the difference between the confusion matrices of males and females represents the TPR gaps of all output classes, two classes have TPR gaps above or under : Surgeon (in favor of males) and Model (in favor of females). We therefore chose to regularize the predictions for these two occupations. Note that other occupations could have been considered (see Fig. 6), but they did not contain enough statistical information to be properly treated. For instance, although the whole training set contains about 400,000 observations, it contains fewer than 100 female DJs and fewer than 100 male paralegals. After having selected these two occupations, we trained 5 regularized models by minimizing Eq. (7). We chose a single parameter for the regularization (the same for both classes, although we could have taken one per class) by cross-validation, with the goal of effectively reducing the TPR gaps on the regularized classes without harming the accuracy too much. The best performance/debiasing compromise we found was . An amount of additional observations was used at each mini-batch step to compute each of the discrete cumulative histograms of the regularization term pseudo-derivatives of Eq. (8). The rest of the training procedure was the same as in Section 4.2. Each run took about 70 hours.
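The selection of the classes to regularize can be sketched as follows. It assumes that the TPR gap of a class is the difference between the true positive rates of the two groups on that class. The threshold on the gap and the minimum number of observations per group are left as hypothetical parameters, and the sign convention (females minus males) is also an assumption.

```python
from collections import defaultdict


def tpr_by_class_and_group(y_true, y_pred, groups):
    """True positive rate and number of observations for each (class, group)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[(t, g)] += 1
        hits[(t, g)] += int(t == p)
    tpr = {key: hits[key] / totals[key] for key in totals}
    return tpr, dict(totals)


def tpr_gaps(y_true, y_pred, groups, group_a="F", group_b="M"):
    """TPR of group_a minus TPR of group_b, for classes observed in both groups."""
    tpr, totals = tpr_by_class_and_group(y_true, y_pred, groups)
    classes = {c for c, _ in totals}
    return {
        c: tpr[(c, group_a)] - tpr[(c, group_b)]
        for c in classes
        if (c, group_a) in tpr and (c, group_b) in tpr
    }


def classes_to_regularize(gaps, totals, threshold, min_count, group_names=("F", "M")):
    """Classes with a large absolute gap and enough observations in each group."""
    return sorted(
        c for c, gap in gaps.items()
        if abs(gap) > threshold
        and all(totals.get((c, g), 0) >= min_count for g in group_names)
    )
```

The minimum-count filter reflects the choice made above of leaving aside classes that have too few observations in one of the groups.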
5 Results
In commercial applications, fair prediction algorithms are obviously much more likely to be adopted if they remain accurate. We therefore made sure that our regularization technique did not have a strongly negative impact on the prediction accuracy. We quantified three accuracy metrics: first the average accuracy, and then two variants of the F1-score, which is well suited to a multi-class classification problem like ours. These two variants are the so-called “macro” F1-score, where the metric is computed for each class and then averaged without taking into account the number of individuals per class, and the “weighted” F1-score, where the average is weighted by the representativeness of each class. Similar conclusions can be drawn for these three metrics, as shown in Fig. 2: our regularization method is slightly below the baseline in terms of accuracy, but it is more stable. In addition, it is clearly more accurate than the gender-neutralizing technique of Section 4.3.
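The difference between the macro and weighted F1-scores can be made concrete with a minimal sketch that uses the standard definitions of these metrics. The function name is hypothetical and this is not the evaluation code used for the experiments.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, macro F1, weighted F1) for multi-class predictions."""
    classes = sorted(set(y_true))
    f1, support = {}, {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1[c] = 2 * tp / denom if denom else 0.0
        support[c] = tp + fn  # number of observations whose true class is c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = sum(f1.values()) / len(classes)  # every class counts equally
    weighted = sum(f1[c] * support[c] for c in classes) / len(y_true)
    return accuracy, macro, weighted
```

On unbalanced data such as Bios, the macro variant gives as much importance to rare occupations as to frequent ones, whereas the weighted variant is dominated by the frequent ones.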
We then specifically observed the impact of our regularization strategy in terms of TPR gap on the two regularized classes Surgeon and Model. Boxplots of the TPR gap for these output classes are shown in Fig. 3. They confirm that the algorithmic bias has been reduced for these two classes. For the class Surgeon, removing gender indicators had a strong effect, but the regularization strategy further reduced the biases. For the class Model, removing gender indicators had little effect, and the regularization strategy reduced the biases by almost a factor of two.
We finally wanted to make sure that reducing the unacceptable biases on these two classes would not be at the expense of newly generated biases. We therefore measured the difference between the average (over the 5 models) confusion matrices computed for females only and for males only. Fig. 4 shows how the biases evolve depending on the selected method. Note first that the diagonal of these matrix differences corresponds to the TPR gaps. Note also that we only represent the results obtained on the 16 most frequent occupations for readability, but the complete matrices are shown in the appendix. On our two regularized classes, we get closer to white (i.e. no bias), and for the other classes, we also observe a general decrease in bias and no outlier. For a finer analysis and more clarity, we represent in Fig. 5 the difference between the absolute values of the baseline matrix of Fig. 4 and those of each of the compared matrices (i.e. with neutralized genders and with regularization). This clearly shows the “gains” of these two bias reduction methods and makes it possible to compare them. Fig. 5 confirms the intuition given by Fig. 4: when the gender indicators are removed, the gain is rather slight and depends on the class. With our regularization, the two regularized classes obtain a very clear positive gain, and there is no marked negative gain on the rest of the matrix.
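The quantities displayed in Figs. 4 and 5 can be sketched as follows. The sketch assumes row-normalized confusion matrices, so that the diagonal of their difference is the TPR gap. The order of the subtraction (females minus males) and the function names are assumptions.

```python
def normalized_confusion(y_true, y_pred, classes):
    """Row-normalized confusion matrix: entry [i][j] is P(pred = j | true = i)."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total if total else 0.0 for c in row])
    return rows


def matrix_difference(mat_a, mat_b):
    """Element-wise difference mat_a - mat_b (e.g. females minus males)."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(mat_a, mat_b)]


def bias_gain(baseline_diff, method_diff):
    """Element-wise |baseline| - |method|: positive where the bias was reduced."""
    return [
        [abs(b) - abs(m) for b, m in zip(rb, rm)]
        for rb, rm in zip(baseline_diff, method_diff)
    ]
```

A positive entry of the gain matrix means that the compared method is closer to zero difference between groups than the baseline on that entry.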
6 Discussion
In this paper, we have defined a strategy to address the critical need for certifying that commercial prediction models present moderate discrimination biases. We specifically defined a new algorithm to mitigate undesirable algorithmic biases in multi-class neural-network classifiers, and applied it to an NLP application that is ranked as High Risk by E.U. regulations. Our method was shown to successfully temper algorithmic biases in this application, and outperformed a classic strategy both in terms of prediction accuracy and of bias mitigation. In addition, computational times only increased reasonably compared with the baseline training method. The state of the art of in-processing unbiasing methods mainly focuses on binary models, whereas our approach addresses the multi-class problem. The possibility of choosing which classes to regularize and of applying a different regularization weight to each class gives the method a wide range of applications.
We finally want to emphasize that although our method was applied to NLP data, it can be easily applied to any multi-class neural-network classifier. We also believe that it could be simply adapted to other fairness metrics. Our regularization method is implemented to work as a loss in PyTorch and is compatible with PyTorch-GPU. It is freely available on GitHub (https://github.com/lrisser/W2reg): the binary classification regularizer for images and tabular data is currently distributed, and the multi-class extension for NLP data, images, and tabular data will also be distributed, subject to paper acceptance.
Acknowledgments
This research was funded by the AI (Artificial Intelligence) Interdisciplinary Institute ANITI (Artificial and Natural InTelligence Institute), which is funded by the French ‘Investing for the Future – PIA3’ program under the Grant agreement ANR-19-PI3A-0004.
Appendix A Results obtained on all output classes
The results shown in Figs. 4 and 5 are restricted to the most largely represented output classes for readability purposes. We show in this appendix their extensions, Figs. 6 and 7, to all output classes of the Bios dataset [9]. It can be observed in these figures that output classes other than Model and Surgeon presented high gender biases when using the baseline strategy: Paralegal, DJ and Dietician. Although we used these output classes when training the prediction model to make the classification task complex, we deliberately decided not to regularize them for statistical reasons: these occupations are first poorly represented in the Bios dataset and are additionally strongly unbalanced between males and females. Although the whole training set contains more than 400,000 biographies, there are fewer than 100 biographies for female DJ, male Dietician and male Paralegal. This makes it unreliable to treat them with a statistically sound strategy. When applied to statistically poorly represented observations, a constrained neural network will indeed not learn to use generalizable features of the input biographies, but will instead overfit the specificities of each observation that is strongly highlighted by the constraint. We can however see that the bias mitigation strategies tested on the classes Model and Surgeon did not amplify the biases on the Paralegal, DJ and Dietician classes.
From a certification perspective in the E.U., the AI Act will require clearly informing end-users of the cases for which the predictions may be unreliable or potentially biased. In this context, our strategy makes it possible to certify that multi-class neural-network classifiers make unbiased decisions on output classes that would be biased using standard training, provided that the training data offer sufficient representativity and variability of the characteristics in these classes. If a company wishes to certify that classes which are poorly represented in the training set are free of biases, the certification procedure will naturally require acquiring more observations.
References
- [1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- [2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- [3] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. 2018.
- [4] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems, 27, 2014.
- [5] P. Besse, E. Del Barrio, P. Gordaliza, J.-M. Loubes, and L. Risser. A survey of bias in machine learning through the prism of statistical parity for the Adult data set. The American Statistician, 2021.
- [6] Cécile De Terwangne. Titre 2 - Définitions clés et champ d’application du RGPD. Le règlement général sur la protection des données (RGPD/GDPR): analyse approfondie, pages 59–84, 2018.
- [7] Laurent Risser, Alberto Gonzalez Sanz, Quentin Vincenot, and Jean-Michel Loubes. Tackling algorithmic bias in neural-network classifiers using Wasserstein-2 regularization. Journal of Mathematical Imaging and Vision, pages 1–18, 2022.
- [8] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
