Adversarial training approach for local data debiasing

Ulrich Aïvodji; François Bidet; Sébastien Gambs; Rosin Claude Ngueveu; Alain Tapp

arXiv:1906.07858·cs.LG·September 2, 2022


TL;DR

This paper introduces GANsan, a novel adversarial training method that removes sensitive attributes from data to prevent discrimination, while maintaining data interpretability and utility, demonstrated through real dataset experiments.

Contribution

The paper presents GANsan, a new local data debiasing approach using generative adversarial networks that preserves data interpretability and can be applied locally before data release.

Findings

01

Effective removal of sensitive attributes demonstrated

02

Trade-off between fairness and data utility shown

03

Applicable to real-world datasets with promising results

Abstract

The widespread use of automated decision processes in many areas of our society raises serious ethical issues concerning the fairness of the process and the possible resulting discriminations. In this work, we propose a novel approach called GANsan whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our sanitization algorithm GANsan is partially inspired by the powerful framework of generative adversarial networks (in particular the Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data by only…

Tables

Table 1. Distribution of the different groups with respect to the protected attribute and the decision one for both the Adult Census Income and the German Credit datasets.

Dataset	Adult Census		German Credit
Group	Protected ($S_{x} = S_{0}$, Female)	Default ($S_{x} = S_{1}$, Male)	Protected ($S_{x} = S_{0}$, Young)	Default ($S_{x} = S_{1}$, Old)
$Pr(S = S_{x})$	$36.21\%$	$63.79\%$	$19\%$	$81\%$
$Pr(Y = 1 \mid S = S_{x})$	$11.35\%$	$31.24\%$	$57.89\%$	$72.83\%$
$Pr(Y = 1)$	$24.78\%$		$70\%$

Table 2. Scenarios envisioned for the evaluation of GANSan. Each set is composed of either the original attributes or their sanitized versions, coupled with either the original or sanitized decision.

Scenario	Train set composition		Test set composition
Scenario	A	Y	A	Y
Baseline	Original	Original	Original	Original
Scenario 1	Sanitized	Sanitized	Sanitized	Sanitized
Scenario 2	Sanitized	Original	Sanitized	Original
Scenario 3	Sanitized	Sanitized	Original	Original
Scenario 4	Original	Original	Sanitized	Original

Table 3. Comparison with other works on the basis of accuracy and demographic parity on Adult.

Authors	yAcc	DemoParity
LFR (Zemel et al., 2013)	$0.78$	$\approx 0.02$
ALFR (Edwards and Storkey, 2015)	$0.825$	$\approx 0.02$
MUBAL (Zhang et al., 2018)	$0.82$	$0.01$
LATR (Madras et al., 2018)	$0.84$	$0.10$
GANSan (S2) - MLP, $\alpha = 0.9875$	$0.91 \pm 0.01$	$0.050 \pm 0.02$
GANSan (S2) - SVM, $\alpha = 0.9875$	$0.85 \pm 0.04$	$0.048 \pm 0.02$

Table 4. Hyperparameter tuning for the Adult dataset.

	Sanitizer	Discriminator
Layers	3x Linear	5x Linear
Learning Rate (LR)	$2 \times 10^{-4}$	$2 \times 10^{-4}$
Hidden Activation	ReLU	ReLU
Output Activation	LeakyReLU	LeakyReLU
Losses	VectorLoss	MSE
Training rates	1	50
Batch size	$100$	$100$
Optimizers	Adam	Adam

Table 5. Equalized odds and demographic parity on Adult.

Clfs.	$EqOddGap_{1}$
Clfs.	Baseline	S1	S2	S3	S4
GB	$0.0830 \pm 0.0374$	$0.0286 \pm 0.0253$	$0.1466 \pm 0.0647$	$0.0966 \pm 0.1044$	$0.1509 \pm 0.0578$
SVM	$0.1809 \pm 0.0323$	$0.0195 \pm 0.0198$	$0.1249 \pm 0.0668$	$0.1208 \pm 0.0754$	$0.0854 \pm 0.0525$
MLP	$0.0782 \pm 0.0356$	$0.0266 \pm 0.0176$	$0.1473 \pm 0.0664$	$0.0487 \pm 0.0383$	$0.1165 \pm 0.0680$

Table 6. Evaluation of GANSan's sensitive attribute protection on Adult.

Clfs.	BER		sAcc
Clfs.	Baseline	Sanitized	Baseline	Sanitized
GB	$0.1637 \pm 0.0094$	$0.4803 \pm 0.0173$	$0.8530 \pm 0.0074$	$0.6841 \pm 0.0105$
MLP	$0.1818 \pm 0.0096$	$0.4756 \pm 0.0224$	$0.8423 \pm 0.0034$	$0.6803 \pm 0.0105$
SVM	$0.1431 \pm 0.0047$	$0.4654 \pm 0.0115$	$0.8255 \pm 0.0052$	$0.5494 \pm 0.0386$

Table 7. Evaluation of GANSan's utility on Adult Census.

Clfs.	yAcc
Clfs.	Baseline	S1	S2	S3	S4
GB	$0.8631 \pm 0.0039$	$0.9650 \pm 0.0129$	$0.9119 \pm 0.0116$	$0.7244 \pm 0.0380$	$0.8313 \pm 0.0397$
SVM	$0.7758 \pm 0.0061$	$0.8895 \pm 0.0502$	$0.8489 \pm 0.0476$	$0.7368 \pm 0.0249$	$0.6605 \pm 0.0649$
MLP	$0.8384 \pm 0.0030$	$0.9685 \pm 0.0107$	$0.9143 \pm 0.0136$	$0.6008 \pm 0.0464$	$0.7724 \pm 0.0638$

Table 8. Most damaged profiles for $\alpha=0.9875$ on the first and the fourth folds. Only the perturbed attributes are shown.

Attrs	Original	Fold 1	Original	Fold 4
age	42	49.58	29	49.01
workclass	State	Federal	Self-emp-not-inc	Without-pay
fnlwgt	218948	192102.77	341672	357523.5
education	Doctorate	Bachelors	HS-grad	Doctorate
education-num	16	9.393	9	7.674
marital-status	Divorced	Married-civ-spouse	Married-spouse-absent	Married-civ-spouse
occupation	Prof-specialty	Adm-Clerical	Transport-moving	Protective-serv
relationship	Unmarried	Husband	Other-relative	Husband
race	Black	White	Asian-Pac-Islander	Black
hours-per-week	36	47.04	50	40.37
native-country	Jamaica	Peru	India	Thailand
damage value	$-$	$3.7706$

Table 9. Minimally damaged profile and profile with damage at $50\%$ of the max, at $\alpha=0.9875$, for the first fold.

Attrs	Original	$Damage = 0.0291$	Original	$Damage = 1.8845$
age	49	49.4	35	29.768
workclass	Federal-gov	Federal-gov	Private	Private
fnlwgt	157569	193388	241998	179164
education	HS-grad	HS-grad	HS-grad	HS-grad
education-num	9	9.102	9	8.2765
marital-status	Married-civ-spouse	Married-civ-spouse	Never-married	Never-married
occupation	Adm-Clerical	Adm-Clerical	Sales	Farming-fishing
relationship	Husband	Husband	Not-in-Family	Not-in-Family
race	White	White	White	White
capital-gain	0	0	8.474	0
capital-loss	0	0	0	0
hours-per-week	46	44.67	40	42.434
native-country	United-States	United-States	United-States	United-States
income	0	0	1	0

Table 10. Distribution of the different groups with respect to the sensitive attribute and the decision one on German Credit.

Dataset	German Credit
Group	Sensitive ($S_{x} = S_{0}$, Young)	Default ($S_{x} = S_{1}$, Old)
$Pr(S = S_{x})$	$19\%$	$81\%$
$Pr(Y = 1 \mid S = S_{x})$	$57.89\%$	$72.83\%$
$Pr(Y = 1)$	$70\%$

Table 11. Evaluation of GANSan's protection on the test set. Values for reference points A, B and C.

Classifier	Original	A: $\alpha = 0.6$	B, C: $\alpha = 0.9968$
GB	$0.3652 \pm 0.0402$	$0.4160 \pm 0.0590$	$0.4549 \pm 0.0411$
MLP	$0.3723 \pm 0.0352$	$0.3981 \pm 0.0537$	$0.4428 \pm 0.0547$
SVM	$0.2521 \pm 0.0434$	$0.2868 \pm 0.0760$	$0.3243 \pm 0.0469$

Table 12. GANSan quantitative results on the German Credit dataset.

$\alpha$	Classifier	$EqOddGap_{1}$
$\alpha$	Classifier	Baseline	S1	S2	S3	S4
$0.6$	GB	$0.0681 \pm 0.0977$	$0.0394 \pm 0.1271$	$0.0345 \pm 0.1209$	$0.0283 \pm 0.1209$	$0.0258 \pm 0.0784$
	MLP	$0.0910 \pm 0.1323$	$0.0316 \pm 0.1208$	$0.0207 \pm 0.1190$	$0.0952 \pm 0.1448$	$0.1249 \pm 0.1666$
	SVM	$0.1415 \pm 0.1391$	$0.1421 \pm 0.1488$	$0.1207 \pm 0.1558$	$0.1133 \pm 0.1103$	$0.1556 \pm 0.1175$
$0.99688$	GB	$0.0681 \pm 0.0977$	$0.0904 \pm 0.0960$	$0.0871 \pm 0.0877$	$0.0131 \pm 0.1875$	$0.0607 \pm 0.0781$
	MLP	$0.0910 \pm 0.1323$	$0.0598 \pm 0.0968$	$0.0898 \pm 0.0795$	$0.1404 \pm 0.2049$	$0.1425 \pm 0.0898$
	SVM	$0.1415 \pm 0.1391$	$0.1184 \pm 0.1593$	$0.1012 \pm 0.1465$	$0.1520 \pm 0.1500$	$0.1625 \pm 0.1186$

Equations

$$DemoParity = \left| P(Y \mid S = 0) - P(Y \mid S = 1) \right|$$

$$EqOddGap_{y} = \left| Pr(\hat{Y} = 1 \mid S = 0, Y = y) - Pr(\hat{Y} = 1 \mid S = 1, Y = y) \right|$$

$$BER(Adv(A, Y), s) = \frac{1}{2} \left( \sum_{s=0}^{1} P(Adv(A, Y) \neq s \mid S = s) \right)$$

$$J^{S_{an}}(D, S_{an}, D_{isc}) = \alpha * d_{s}(S, \bar{S}) + (1 - \alpha) * d_{r}(D, S_{an}(D))$$

$$diversity = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{d} (\hat{r}_{i,k} - \hat{r}_{j,k})^{2}}{N \times (N - 1) \times d}$$

$RC$

$f(original, sanitized)$


Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning

Methods: Interpretability

Full text

Adversarial training approach for local data debiasing

Ulrich Aïvodji

Université du Québec à Montréal, 405 Sainte-Catherine Street East, Montréal, Québec, Canada, H2L 2C4

François Bidet

Ecole Polytechnique, Route de Saclay, Palaiseau Cedex, France, 91128

Sébastien Gambs

Université du Québec à Montréal, 405 Sainte-Catherine Street East, Montréal, Québec, Canada, H2L 2C4

Rosin Claude Ngueveu

Université du Québec à Montréal, 405 Sainte-Catherine Street East, Montréal, Québec, Canada, H2L 2C4

Alain Tapp

Université de Montréal, succursale Centre-ville, Montréal, Québec, Canada, H3C 3J7

Abstract.

The widespread use of automated decision processes in many areas of our society raises serious ethical issues with respect to the fairness of the process and the possible resulting discriminations. To solve this issue, we propose a novel adversarial training approach called GANSan for learning a sanitizer whose objective is to prevent the possibility of any discrimination (i.e., direct and indirect) based on a sensitive attribute by removing the attribute itself as well as the existing correlations with the remaining attributes. Our method GANSan is partially inspired by the powerful framework of generative adversarial networks (in particular Cycle-GANs), which offers a flexible way to learn a distribution empirically or to translate between two different distributions. In contrast to prior work, one of the strengths of our approach is that the sanitization is performed in the same space as the original data by only modifying the other attributes as little as possible, thus preserving the interpretability of the sanitized data. Consequently, once the sanitizer is trained, it can be applied to new data locally by an individual on his profile before releasing it. Finally, experiments on real datasets demonstrate the effectiveness of the approach as well as the achievable trade-off between fairness and utility.


1. Introduction

In recent years, the availability and the diversity of large-scale datasets, the algorithmic advancements in machine learning and the increase in computational power have led to the development of personalized services and prediction systems to such an extent that their use is now ubiquitous in our society. For instance, machine learning-based systems are now used in banking for assessing the risk associated with loan applications (Mahmoud et al., 2008), in hiring systems (Faliagka et al., 2012) and in predictive justice to quantify the recidivism risk of an inmate (Center, 2016). Despite their usefulness, the predictions performed by these algorithms are not exempt from biases, and numerous cases of discriminatory decisions have been reported in recent years.

For example, returning to the case of predictive justice, a study conducted by ProPublica showed that the recidivism prediction tool COMPAS, which is currently used in Broward County (Florida), is strongly biased against black defendants, displaying a false positive rate twice as high for black persons as for white persons (Julia Angwin and Kirchner, 2016). If the dataset exhibits strong detectable biases towards a particular sensitive group (e.g., an ethnic or minority group), the naïve solution of removing the attribute identifying the sensitive group prevents only direct discrimination. Indeed, indirect discrimination can still occur due to correlations between the sensitive attribute and other attributes.

In this paper, we propose a novel approach called GANSan (for Generative Adversarial Network Sanitizer) to address the problem of discrimination due to the biased underlying data.

In a nutshell, our approach learns a sanitizer (in our case a neural network) transforming the input data in a way that maximizes the following two metrics: (1) fidelity, in the sense that the transformation should modify the data as little as possible, and (2) non-discrimination, which means that the sensitive attribute should be difficult to predict from the sanitized data.

A typical use case might be one in which a company during its recruitment process offers to job applicants a tool to remove racial correlation in their data before submitting their sanitized profile on the job application platform. If built appropriately, this tool would make the recruitment process of the company free from racial discrimination as it never had access to the original profile.

Overall, our contributions can be summarized as follows.

• We propose a novel adversarial approach, inspired by Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), in which a sanitizer is learned from data representing the population. The sanitizer can then be applied to a profile in such a way that the sensitive attribute is removed, as well as existing correlations with other attributes, while ensuring that the sanitized profile is modified as little as possible, preventing both direct and indirect discrimination.

• Our objective is more generic than simply building a non-discriminating classifier, in the sense that we aim at debiasing the data with respect to the sensitive attribute. Thus, one of the main benefits of our approach is that the sanitization can be performed without having any knowledge regarding the tasks that are going to be conducted in the future on the sanitized data.

• Another strength of our approach is that once the sanitizer has been learned, it can be used locally by an individual (e.g., on a device under his control) to generate a modified version of his profile that still lives in the same representation space, but from which it is very difficult to infer the sensitive attribute. In this sense, our method can be considered to fall under the category of randomized response techniques (Warner, 1965) as it can be distributed before being used locally by a user to sanitize his data. Thus, it does not require his true profile to be sent to a trusted third party. Of all the approaches that currently exist in the literature to reach algorithmic fairness (Friedler et al., 2018), we are not aware of any other work that has considered local sanitization, with the exception of (Romanelli et al., 2019), which focuses on the protection of privacy but could also be applied to enhance fairness.

• To demonstrate its usefulness, we have proposed and discussed four different evaluation scenarios and assessed our approach on real datasets for these four scenarios. In particular, we have analyzed the achievable trade-off between fairness and utility, measured both in terms of the perturbations introduced by the sanitization framework and with respect to the accuracy of a classifier learned on the sanitized data.

The outline of the paper is as follows. First, in Section 2, we introduce the system model before reviewing the background notions on fairness metrics. Afterwards, in Section 3, we review the related work on methods for enhancing fairness that, like ours, belong to the preprocessing approach, before describing GANSan in Section 4. Finally, we experimentally evaluate our approach in Section 5 before concluding in Section 6.

2. Preliminaries

In this section, we first present the system model used in this paper before reviewing the background notions on fairness metrics.

2.1. System model

In this paper, we consider the generic setting of a dataset $D$ composed of $N$ records. Each record $r_{i}$ typically corresponds to the profile of the individual $i$ and is made of $d$ attributes, which can be categorical, discrete or continuous. Amongst those, the sensitive attribute S (e.g., gender, ethnic origin, religious belief, …) should remain hidden to prevent discrimination. In addition, the decision attribute $Y$ is typically used for a classification task (e.g., accept or reject an individual for a job interview). The other attributes of the profile, which are neither S nor $Y$, will be referred to hereafter as A.

For simplicity, in this work we restrict ourselves to the situations in which these two attributes are binary (i.e., $S\in\{0,1\}$ and $Y\in\{0,1\}$). However, our approach can also be generalized to multivalued attributes, although quantifying fairness for multivalued attributes is much more challenging than for binary ones (Kearns et al., 2017). Our main objective is to prevent the possibility of inferring the sensitive attribute from the sanitized data.

This objective is similar to the protection against group membership inference, which in our context amounts to distinguishing between the two groups generated by the values of S, which we will refer to as the sensitive group (for which $S=0$) and the default group (for which $S=1$).

2.2. Fairness metrics

First, we would like to point out that there are many different definitions of fairness existing in the literature (Dwork et al., 2012; Joseph et al., 2016; Corbett-Davies et al., 2017; Arvind, 2018; Friedler et al., 2018; Verma and Rubin, 2018) and that the choice of the appropriate definition is highly dependent on the context considered.

For instance, one natural approach for defining fairness is the concept of individual fairness (Dwork et al., 2012), which states that individuals that are similar except for the sensitive attribute should be treated similarly (i.e., receive similar decisions). This notion relates to the legal concept of disparate treatment (Barocas and Selbst, 2016), which occurs if the decision process was made based on sensitive attributes. This definition is relevant when discrimination is caused by the decision process. Therefore, it cannot be used in the situation in which the objective is to directly redress biases in the data.

In contrast to individual fairness, group fairness relies on statistics of the outcomes of the subgroups indexed by S and can be quantified in several ways, such as demographic parity (Berk et al., [n.d.]) and equalized odds (Hardt et al., 2016). More precisely, the demographic parity corresponds to the absolute difference of rates of positive outcomes in the sensitive and default groups (for which respectively $S=0$ and $S=1$):

$$DemoParity = \left| P(Y \mid S = 0) - P(Y \mid S = 1) \right|, \qquad (1)$$

while equalized odds is the absolute difference of odds in each subgroup:

$$EqOddGap_{y} = \left| Pr(\hat{Y} = 1 \mid S = 0, Y = y) - Pr(\hat{Y} = 1 \mid S = 1, Y = y) \right| \qquad (2)$$

Compared to demographic parity, equalized odds is more suitable when the base rates in both groups differ ($P(Y=1|S=0)\neq P(Y=1|S=1)$). Note that these definitions are agnostic to the cause of the discrimination and are based solely on the assumption that statistics of outcomes should be similar between subgroups.
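As an illustration of these two group-fairness quantities, the following minimal sketch computes them on toy binary lists. It assumes that "positive outcome" means $Y=1$, as the text suggests; the function names and the toy data are hypothetical and are not taken from the paper.

```python
def positive_rate(values, mask):
    """Fraction of entries equal to 1 among the entries selected by mask."""
    selected = [v for v, keep in zip(values, mask) if keep]
    if not selected:
        raise ValueError("empty subgroup")
    return sum(selected) / len(selected)


def demo_parity(y, s):
    """|P(Y=1 | S=0) - P(Y=1 | S=1)| for binary lists y and s."""
    rate_sensitive = positive_rate(y, [si == 0 for si in s])
    rate_default = positive_rate(y, [si == 1 for si in s])
    return abs(rate_sensitive - rate_default)


def eq_odd_gap(y_true, y_pred, s, y_value):
    """|Pr(Yhat=1 | S=0, Y=y) - Pr(Yhat=1 | S=1, Y=y)| for y = y_value."""
    mask0 = [si == 0 and yi == y_value for si, yi in zip(s, y_true)]
    mask1 = [si == 1 and yi == y_value for si, yi in zip(s, y_true)]
    return abs(positive_rate(y_pred, mask0) - positive_rate(y_pred, mask1))


# Toy data (hypothetical): 4 profiles in each group.
s = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 0, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

print("DemoParity:", demo_parity(y_true, s))
print("EqOddGap_1:", eq_odd_gap(y_true, y_pred, s, 1))
```

In this toy example the base rates differ between the two groups (0.25 versus 0.75), which is the situation in which the text recommends equalized odds over demographic parity.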

In our work, we follow a different line of research by defining fairness in terms of the inability to infer S from other attributes (Feldman et al., 2015; Xu et al., 2018). This approach stems from the observation that it is impossible to discriminate based on the sensitive attribute if the latter is unknown and cannot be predicted from other attributes. Thus, our approach aims at sanitizing the data in such a way that no classifier should be able to infer the sensitive attribute from the sanitized data.

The inability to infer the attribute S can be measured by the accuracy of a predictor Adv trained to recover the hidden S (sAcc), as well as the balanced error rate (BER) introduced in (Feldman et al., 2015):

$$BER(Adv(A, Y), s) = \frac{1}{2} \left( \sum_{s=0}^{1} P(Adv(A, Y) \neq s \mid S = s) \right).$$

The BER captures the predictability of both classes and a value of $\frac{1}{2}$ can be considered optimal for protecting against inference in the sense that it means that the inferences made by the predictor are not better than a random guess. In addition, the BER is more relevant than the accuracy $sAcc$ of a classifier predicting the sensitive attribute for datasets with imbalanced proportions of sensitive and default groups. Thus, a successful sanitization would lead to a significant drop in the accuracy while raising the BER close to its optimal value of $0.5$.
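To make the difference between sAcc and the BER concrete, here is a small sketch on an imbalanced toy dataset. The BER is computed as half the sum of the per-group error rates, following the definition above; the function names and the data are hypothetical.

```python
def balanced_error_rate(s_true, s_pred):
    """0.5 * sum over s in {0, 1} of P(prediction != s | S = s)."""
    total = 0.0
    for group in (0, 1):
        preds = [p for t, p in zip(s_true, s_pred) if t == group]
        if not preds:
            raise ValueError(f"no record with S = {group}")
        total += sum(1 for p in preds if p != group) / len(preds)
    return 0.5 * total


def accuracy(s_true, s_pred):
    """Fraction of records whose sensitive attribute is predicted correctly."""
    return sum(1 for t, p in zip(s_true, s_pred) if t == p) / len(s_true)


# Imbalanced toy data: 2 profiles in the sensitive group, 8 in the default one.
s_true = [0, 0] + [1] * 8
# A trivial adversary that always predicts the default group.
always_default = [1] * 10

print("sAcc:", accuracy(s_true, always_default))
print("BER: ", balanced_error_rate(s_true, always_default))
```

The trivial adversary reaches an accuracy of 0.8 simply by exploiting the imbalance, while its BER is 0.5, i.e., no better than a random guess. This is why the BER is the more informative of the two on imbalanced data.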

3. Related work

In recent years, many approaches have been developed to enhance the fairness of machine learning algorithms. Most of these techniques can be classified into three families of approaches, namely (1) the preprocessing approach (Edwards and Storkey, 2015; Feldman et al., 2015; Louizos et al., 2015; Zemel et al., 2013) in which fairness is achieved by changing the characteristics of the input data (e.g. by suppressing undesired correlations with the sensitive attribute), (2) the algorithmic modification approach (also sometimes called constrained optimization) in which the learning algorithm is adapted to ensure that it is fair by design (Zafar et al., 2017; Kamishima et al., 2012) and (3) the postprocessing approach that modifies the output of the learning algorithm to increase the level of fairness (Kamiran et al., 2010; Hardt et al., 2016). We refer the interested reader to (Friedler et al., 2018) for a recent survey comparing the different fairness-enhancing methods. Due to the limited space and as our approach falls within the preprocessing approach, we will review afterwards only methods of this category that make use of adversarial training.

Several approaches have been explored to enhance fairness based on adversarial learning. For instance, Edwards and Storkey (Edwards and Storkey, 2015) have trained an encoder to output a representation from which an adversary is unable to predict the group membership, from which a decoder can reconstruct the data and on which a decision predictor still performs well. Madras, Creager, Pitassi and Zemel (Madras et al., 2018) extended this framework to satisfy the equality of opportunities constraint (Hardt et al., 2016) and explored the theoretical guarantees for fairness provided by the learned representation as well as the ability of the representation to be used for different classification tasks. Beutel, Chen, Zhao and Chi (Beutel et al., 2017) have studied the impact of data quality on fairness in the context of adversarial learning, and demonstrated for instance that learning a representation independent of the sensitive attribute with a balanced dataset ensures statistical parity. Zhang, Lemoine and Mitchell (Zhang et al., 2018) have designed a decision predictor satisfying group fairness by ensuring that an adversary is unable to infer the sensitive attribute from the predicted outcome. McNamara, Ong and Williamson (McNamara et al., 2019) have investigated the benefits and drawbacks of fair representation learning, demonstrating that techniques building fair representations restrict the space of possible decisions, limiting the usages of resulting data while providing fairness.

None of these previous approaches preserves the interpretability of the data, in the sense that the modified profile lives in a different space than the original one. One notable exception is FairGan (Xu et al., 2018), which maintains the interpretability of the profile. Their objective is to learn a fair classifier on a dataset that has been generated such that it is discrimination-free and whose distribution on attributes is close to the original one. While FairGan generates a synthetic dataset close to the original data while being discrimination-free, one key difference with GANSan is that FairGan cannot be used to sanitize directly a particular profile. Following a similar line of work, there is a growing body of research investigating the use of adversarial training to protect the privacy of individuals during the collection or disclosure of data. Feutry, Piantanida, Bengio and Duhamel (Feutry et al., 2018) have proposed an anonymization procedure based on the learning of an encoder, an adversary and a label predictor. The authors have ensured the convergence of these three networks during training by proposing an efficient optimization procedure with bounds on the probability of misclassification. Pittaluga, Koppal and Chakrabarti (Pittaluga et al., 2019) have designed a procedure based on adversarial training to hide a private attribute of a dataset. Romanelli, Palamidessi and Chatzikokolakis (Romanelli et al., 2019) have designed a mechanism to create a dataset preserving the original representation. They have developed a method for learning an optimal privacy protection mechanism also inspired by GAN (Tripathy et al., 2017), which they have applied to location privacy. The objective is to minimize the amount of information (measured by the mutual information) preserved between S and the prediction made on the decision attribute by a classifier while respecting a bound on utility.
With respect to local sanitization and randomized response techniques, most of them are applied in the context of privacy protection (Wang et al., 2016). Our approach is among the first that places the protection of information at the individual level, as the user can locally sanitize his data before publishing it.

4. Adversarial training for data debiasing

As previously explained, removing the sensitive attribute is rarely sufficient to guarantee non-discrimination as correlations are likely to exist between other attributes and the sensitive one.

In general, detecting and suppressing complex correlations between attributes is a difficult task.

To address this challenge, our approach GANSan relies on the modelling power of GANs to build a sanitizer that can cancel out correlations with the sensitive attribute without requiring an explicit model of those correlations. In particular, it exploits the capacity of the discriminator to distinguish the subgroups indexed by the sensitive attribute. Once the sanitizer has been trained, any individual can apply it locally on his profile before disclosing it. The sanitized data can then be safely used for any subsequent task.

4.1. Generative adversarial network sanitization

High level overview.

Formally, given a dataset $D$, the objective of GANSan is to learn a function $S_{an}$, called the sanitizer, that perturbs individual profiles of the dataset $D$ such that a distance measure called the fidelity $fid$ (in our case we will use the $L_{2}$ norm) between the original and the sanitized datasets ($\bar{D}=S_{an}(D)=\{\bar{A},\bar{Y}\}$) is minimal, while ensuring that S cannot be recovered from $\bar{D}$. Our approach differs from classical conditional GAN (Mirza and Osindero, 2014) by the fact that the objective of our discriminator is to reconstruct the hidden sensitive attribute from the generator output, whereas the discriminator in classical conditional GAN has to discriminate between the generator output and samples from the true distribution.

Figure 1 presents the high-level overview of the training procedure, while Algorithm 1 describes it in detail.

The first step corresponds to the training of the sanitizer $S_{an}$ (Algorithm 1, Lines 7-17). The sanitizer can be seen as the generator, similarly to standard GAN but with a different purpose. In a nutshell, it learns the empirical distribution of the sensitive attribute and generates a new distribution that concurrently respects two objectives: (1) finding a perturbation that will fool the discriminator in predicting S while (2) minimizing the damage introduced by the sanitization. More precisely, the sanitizer takes as input the original dataset $D$ (including S and Y) plus some noise $P_{z}$. The noise introduced is used to prevent the over-specialization of the sanitizer on the training set while making the reverse mapping of sanitized profiles to their original versions more difficult, as the mapping will be probabilistic and not deterministic. As a result, even if the sanitizer is applied twice on the same profile, it can produce two different modified profiles.

The second step consists in training the discriminator $D_{isc}$ for predicting the sensitive attribute from the data produced by the sanitizer $S_{an}$ (Algorithm 1, Lines 18-24). The rationale of our approach is that the better the discriminator is at predicting the sensitive attribute S, the worse the sanitizer is at hiding it and thus the higher the potential risk of discrimination.

These two steps are run iteratively until convergence of the training.
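The alternation between the two steps can be sketched as a plain control loop. This is only a skeleton under stated assumptions: Algorithm 1 is not reproduced in this section, so the callbacks `sanitizer_step` and `discriminator_step`, the `disc_updates` parameter and the fixed number of epochs (used here in place of a convergence test) are all hypothetical.

```python
def train_alternating(batches, sanitizer_step, discriminator_step,
                      epochs=1, disc_updates=1):
    """Alternate sanitizer and discriminator updates.

    sanitizer_step(batch) and discriminator_step(batch) are hypothetical
    callbacks: each performs one parameter update and returns a loss value.
    """
    history = []
    for _ in range(epochs):
        # Step 1: update the sanitizer against the current discriminator.
        san_losses = [sanitizer_step(batch) for batch in batches]
        # Step 2: update the discriminator on the sanitizer's current output.
        disc_losses = []
        for _ in range(disc_updates):
            disc_losses.extend(discriminator_step(batch) for batch in batches)
        history.append((sum(san_losses) / len(san_losses),
                        sum(disc_losses) / len(disc_losses)))
    return history


# Dummy callbacks that only record the order in which they are called.
calls = []


def fake_sanitizer_step(batch):
    calls.append(("san", batch))
    return 0.0


def fake_discriminator_step(batch):
    calls.append(("disc", batch))
    return 0.0


history = train_alternating([0, 1], fake_sanitizer_step,
                            fake_discriminator_step, epochs=1, disc_updates=2)
print(calls)
```

In an actual implementation the callbacks would wrap the optimizer updates of the two networks, and the loop would stop on a convergence criterion rather than after a fixed number of epochs.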

Training objective of GANSan.

Let $\bar{S}$ be the prediction of S by the discriminator ($\bar{S}=D_{isc}(S_{an}(D))$). Its objective is to accurately predict S, thus it aims at minimizing the loss $J^{D_{isc}}(S,\bar{S})=d_{disc}(S,\bar{S})$. In practice in our work, we instantiate $d_{disc}$ as the Mean Squared Error (MSE).

Given a hyperparameter $\alpha$ representing the desired trade-off between the fairness and the fidelity, the sanitizer minimizes a loss combining two objectives:

$$J^{S_{an}}(D, S_{an}, D_{isc}) = \alpha * d_{s}(S, \bar{S}) + (1 - \alpha) * d_{r}(D, S_{an}(D))$$

in which $d_{s}$ is $\frac{1}{2}-BER(D_{isc}(A,Y),s)$ on the sensitive attribute. The term $\frac{1}{2}$ is due to the objective of maximizing the error of the discriminator (i.e., recall that the optimal value of the BER is $0.5$).
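As a purely illustrative reading of the combined loss $J^{S_{an}} = \alpha * d_{s} + (1-\alpha) * d_{r}$, the arithmetic below plugs in hypothetical values: a discriminator BER of $0.48$ and a reconstruction loss $d_{r}$ of $0.4$ (neither is reported in the text), together with $\alpha = 0.9875$, the value used in Table 3.

```latex
\begin{align*}
d_{s} &= \tfrac{1}{2} - BER = 0.5 - 0.48 = 0.02 \\
J^{S_{an}} &= \alpha * d_{s} + (1 - \alpha) * d_{r} \\
           &= 0.9875 \times 0.02 + 0.0125 \times 0.4 \\
           &= 0.01975 + 0.005 = 0.02475
\end{align*}
```

With $\alpha$ this close to 1, the reconstruction term is scaled by only $0.0125$, so under these assumed values the fairness term still dominates the loss even when the discriminator is already close to a random guess.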

Concerning the reconstruction loss $d_{r}$, we have first tried the classical Mean Absolute Error (MAE) and MSE losses. However, our initial experiments have shown that these losses produce datasets that are highly problematic in the sense that the sanitizer always outputs the same profile whatever the input profile, which protects against attribute inference but renders the profile unusable. Therefore, we had to design a slightly more complex loss function. More precisely, we chose not to merge the respective losses of these attributes ($e_{A_{i}}=(1-\alpha)*|A_{i}-\bar{A}_{i}|;\quad\bar{A}_{i}\in\bar{A},i\in[1,d]$), yielding a vector of attribute losses whose components are iteratively used in the gradient descent.

Hence, each node of the output layer of the generator is optimized to reconstruct a single attribute from the representation obtained from the intermediate layers. The vector formulation of the loss is as follows: $\vec{J}^{San}=(e_{A_{1}},e_{A_{2}},e_{A_{3}},...,e_{A_{d}},e_{Y},\alpha*d_{s}(S,\bar{S}))^{T}$ and the objective is to minimize all its components.
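The vector loss can be sketched for a single numeric profile as follows. This is a minimal illustration, not the authors' implementation: it assumes that $e_{Y}$ is defined analogously to $e_{A_{i}}$ (the text only spells out $e_{A_{i}}$), that the BER of the discriminator is supplied as a plain number, and all names and values are hypothetical.

```python
def sanitizer_loss_vector(attrs, attrs_sanitized, y, y_sanitized, ber, alpha):
    """Return (e_A1, ..., e_Ad, e_Y, alpha * d_s) as a list.

    e_Ai = (1 - alpha) * |A_i - A_i_bar|
    e_Y  = (1 - alpha) * |Y - Y_bar|      (assumed analogous to e_Ai)
    d_s  = 1/2 - BER
    """
    if len(attrs) != len(attrs_sanitized):
        raise ValueError("original and sanitized profiles must have equal length")
    attr_losses = [(1 - alpha) * abs(a - a_bar)
                   for a, a_bar in zip(attrs, attrs_sanitized)]
    e_y = (1 - alpha) * abs(y - y_sanitized)
    d_s = 0.5 - ber
    return attr_losses + [e_y, alpha * d_s]


# Toy profile with d = 2 attributes; alpha = 0.5 keeps the arithmetic simple.
loss_vec = sanitizer_loss_vector(
    attrs=[1.0, 0.0],
    attrs_sanitized=[0.5, 0.0],
    y=1,
    y_sanitized=1,
    ber=0.25,
    alpha=0.5,
)
print(loss_vec)
```

Keeping one component per attribute, instead of averaging them as MAE or MSE would, is what lets each output node receive a gradient signal for its own attribute only.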

The details of the parameters used for the training are given in Appendices A and B.

4.2. Performance metrics

The performance of GANSan will be evaluated by taking into account the fairness enhancement and the fidelity to the original data. With respect to fairness, we will quantify it primarily with the inability of a predictor $Adv$ , hereafter referred to as the adversary, in inferring the sensitive attribute (cf. Section 2) using its Balanced Error Rate (BER) (Feldman et al., 2015) and its accuracy sAcc (cf., Section 2.2). We will also assess the fairness using metrics (cf. Section 2) such as demographic parity (Equation 1) and equalized odds (Equation 2).
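As an illustration of these metrics, the sketch below computes the BER, the demographic parity gap and the equalized odds gap for a binary sensitive attribute and a binary decision. It uses the standard textbook definitions, which are assumed (not verified here) to match Equations 1 and 2; the function names are hypothetical, and each group is assumed to be non-empty.

```python
from typing import Iterable, Sequence


def _rate(flags: Iterable[bool]) -> float:
    """Fraction of True values in a non-empty collection."""
    flags = list(flags)
    return sum(flags) / len(flags)


def balanced_error_rate(s_true: Sequence[int], s_pred: Sequence[int]) -> float:
    """Mean of the error rates measured separately on each group."""
    per_group = [
        _rate(p != g for t, p in zip(s_true, s_pred) if t == g)
        for g in (0, 1)
    ]
    return sum(per_group) / 2


def demographic_parity_gap(y_pred: Sequence[int], s: Sequence[int]) -> float:
    """|P(y_pred = 1 | S = 0) - P(y_pred = 1 | S = 1)|."""
    r0 = _rate(y == 1 for y, g in zip(y_pred, s) if g == 0)
    r1 = _rate(y == 1 for y, g in zip(y_pred, s) if g == 1)
    return abs(r0 - r1)


def equalized_odds_gap(y_true, y_pred, s, label: int) -> float:
    """Gap in P(y_pred = 1 | S, Y = label) between the two groups."""
    r0 = _rate(p == 1 for t, p, g in zip(y_true, y_pred, s) if g == 0 and t == label)
    r1 = _rate(p == 1 for t, p, g in zip(y_true, y_pred, s) if g == 1 and t == label)
    return abs(r0 - r1)
```

A predictor that always outputs the same group is wrong on one group and right on the other, so its BER is $0.5$, the value targeted by the sanitization.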

To measure the fidelity $fid$ between the original and the sanitized data, we have to rely on a notion of distance. More precisely, our approach does not require any specific assumption on the distance used, although it is conceivable that it may work better with some than others. For the rest of this work, we will instantiate $fid$ by the $L_{2}$ -norm as it does not differentiate between attributes.

Note however that a high fidelity is a necessary but not a sufficient condition to imply a good reconstruction of the dataset. In fact, as mentioned previously, early experiments showed that the sanitizer might find a “median” profile to which it will map all input profiles. Thus, to quantify the ability of the sanitizer to preserve the diversity of the dataset, we introduce the diversity measure, which is defined in the following way:

[TABLE]

While $fid{}$ quantifies how different the original and the sanitized datasets are, the diversity measures how diverse the profiles are in each dataset. We will also provide a qualitative discussion of the amount of damage for a given fidelity and fairness to provide a better understanding of the qualitative meaning of the fidelity.
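The difference between the two measures can be made concrete with a small sketch. The exact expressions are not reproduced in this text, so the code below uses one plausible instantiation that should be read as an assumption: fidelity is taken as one minus the mean squared difference between original and sanitized values scaled to $[0,1]$, and diversity as the mean pairwise Euclidean distance between records, normalized by $\sqrt{d}$. Both function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def fidelity(original: Sequence[Sequence[float]],
             sanitized: Sequence[Sequence[float]]) -> float:
    """1 - mean squared difference (assumed form, data scaled to [0, 1])."""
    n, d = len(original), len(original[0])
    total = sum(
        (a - b) ** 2
        for row_o, row_s in zip(original, sanitized)
        for a, b in zip(row_o, row_s)
    )
    return 1.0 - total / (n * d)


def diversity(records: Sequence[Sequence[float]]) -> float:
    """Mean pairwise Euclidean distance, normalized by sqrt(d) (assumed form)."""
    n, d = len(records), len(records[0])
    total = sum(math.dist(r1, r2) for r1, r2 in combinations(records, 2))
    n_pairs = n * (n - 1) / 2
    return total / (n_pairs * math.sqrt(d))
```

Under these assumptions, mapping the two records `[0, 0]` and `[1, 1]` to the same "median" profile `[0.5, 0.5]` still gives a fidelity of $0.75$, while the diversity of the sanitized set drops from $1$ to $0$, which is the failure mode that the diversity measure is meant to expose.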

Finally, we evaluate the loss of utility induced by the sanitization by relying on the accuracy $yAcc{}$ of prediction on a classification task. More precisely, the difference in $yAcc{}$ between a classifier trained on the original data and one trained on the sanitized data can be used as a measure of the loss of utility introduced by the sanitization with respect to the classification task.

5. Experimental evaluation

In this section, we describe the experimental setting used to evaluate GANSan as well as the results obtained.

5.1. Experimental setting

Dataset description.

We have evaluated our approach on two datasets that are classical in the fairness literature, namely Adult Census Income and German Credit. Both are available on the UCI repository (https://archive.ics.uci.edu/ml/index.php). Adult Census reports the financial situation of individuals, with 45222 records after the removal of rows with empty values. Each record is characterized by 15 attributes among which we selected the gender (i.e., male or female) as the sensitive one and the income level (i.e., over or below \$50K) as the decision. German Credit is composed of 1000 applicants to a credit loan, described by 21 of their banking characteristics. Previous work (Kamiran and Calders, 2009) has found that using the *age* as the sensitive attribute, binarized with a threshold of $25$ years to differentiate between old and young, yields the maximum discrimination based on $DemoParity$. In this dataset, the decision attribute is the quality of the customer with respect to his credit score (i.e., good or bad). Due to lack of space, we will mostly discuss the results on the Adult dataset in this section. However, the results obtained on German Credit were quite similar.

Training process.

We will evaluate GANSan using metrics among which the fidelity $fid{}$ , the $BER{}$ as well as the demographic parity $DemoParity{}$ (cf. Section 4.2). For this, we have conducted a $10$ -fold cross-validation during which the dataset is divided into ten blocks. During each fold, 8 blocks are used for the training, while another one is retained as the validation set and the last one as the test set.
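The splitting procedure can be sketched as follows. Which block serves as the validation set in a given fold is not stated, so taking the block that follows the test block is an assumption of this sketch, as are the function name `ten_fold_splits` and the shuffling seed.

```python
import random
from typing import Iterator, List, Tuple


def ten_fold_splits(
    n_records: int, n_blocks: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[int]]]:
    """Yield (train, validation, test) index lists, one triple per fold."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_blocks blocks of near-equal size.
    blocks = [indices[b::n_blocks] for b in range(n_blocks)]
    for k in range(n_blocks):
        test_block = k
        val_block = (k + 1) % n_blocks  # assumed choice of validation block
        train = [
            i
            for b in range(n_blocks)
            if b not in (test_block, val_block)
            for i in blocks[b]
        ]
        yield train, blocks[val_block], blocks[test_block]
```

For German Credit (1000 records), each fold then contains 800 training, 100 validation and 100 test records.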

We computed the $BER{}$ and $sAcc{}$ using the internal discriminator of GANSan and three external classifiers independent of the GANSan framework, namely Support Vector Machines (SVM) (Cortes and Vapnik, 1995), Multilayer Perceptron (MLP) (Popescu et al., 2009) and Gradient Boosting (GB) (Friedman, 2002). For all these external classifiers and all epochs, we report the space of achievable points with respect to the fidelity/fairness trade-off. Note that most approaches described in the related work (cf. Section 3) do not validate their results with independent external classifiers trained outside of the sanitization procedure.

Relying on three different families of classifiers is not foolproof, in the sense that there might exist other classifiers that we have not tested and that can do better, but it provides a higher confidence in the strength of the sanitization than simply relying on the internal discriminator.

For each fold and each value of $\alpha$, we train the sanitizer for $40$ epochs. At the end of each epoch, we save the state of the sanitizer and generate a sanitized dataset on which we compute the $BER{}$, $sAcc{}$ and $fid{}$. Afterwards, $HeuristicA{}$ is used to select the sanitized dataset that is closest to the “ideal point” ( $BER{}=0.5,fid{}=1$ ). More precisely, $HeuristicA{}$ is defined as follows: $Best_{Epoch}=\min\{(BER_{min}-\dfrac{1}{2})^{2}+fid{}_{e},\ \text{for}\ e\in\{1,\ldots,MaxEpoch\}\}$ with $BER_{min}$ referring to the minimum value of $BER{}$ obtained with the external classifiers. For each value of $\alpha\in[0,1]$, $HeuristicA{}$ selects, among the sanitizers saved at the end of each epoch, the one achieving the highest fairness in terms of $BER{}$ for the lowest damage. We will use the three families of external classifiers for computing $yAcc{}$, $DemoParity{}$ and $EqOddGap{}$. We also used the same chosen test set to conduct a detailed analysis of its reconstruction quality ( $diversity{}$ and quantitative damage on attributes).
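The epoch selection can be sketched as below. The sketch follows the prose description (closest to the ideal point $BER=0.5$, $fid=1$) and therefore scores each epoch by its squared Euclidean distance to that point; this reading of the fidelity term is an assumption and differs from a literal transcription of the formula as printed. The function name `select_best_epoch` is hypothetical.

```python
from typing import Sequence


def select_best_epoch(
    ber_min_per_epoch: Sequence[float], fid_per_epoch: Sequence[float]
) -> int:
    """Return the 1-based epoch closest to the ideal point (BER=0.5, fid=1).

    ber_min_per_epoch[e] is the lowest BER obtained by the external
    classifiers on the dataset sanitized at the end of epoch e + 1.
    """
    def squared_distance(e: int) -> float:
        # Assumed scoring: squared distance to (0.5, 1) in (BER, fid) space.
        return (ber_min_per_epoch[e] - 0.5) ** 2 + (fid_per_epoch[e] - 1.0) ** 2

    best = min(range(len(ber_min_per_epoch)), key=squared_distance)
    return best + 1
```

For instance, with BER values `[0.2, 0.45, 0.5]` and fidelities `[0.98, 0.95, 0.6]`, the second epoch is selected: the third reaches the optimal BER but at a much larger cost in fidelity.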

5.2. Evaluation scenarios

Recall that GANSan takes as input the whole original dataset (including the sensitive and the decision attributes) and outputs a sanitized dataset (without the sensitive attribute) in the same space as the original one, but from which it is impossible to infer the sensitive attribute. In this context, the overall performance of GANSan can be evaluated by analyzing the reachable space of points characterizing the trade-off between the fidelity $fid$ to the original dataset and the fairness enhancement. More precisely, during our experimental evaluation, we will measure the fidelity between the original and the sanitized data, as well as the $diversity{}$ , both in relation with the $BER{}$ and $sAcc{}$ , computed on this dataset.

However, in practice, our approach can be used in several situations that differ slightly from one another. In the following, we detail four scenarios that we believe represent most of the possible use cases of GANSan. To ease the understanding, we will use the following notation: the subscript $tr$ (respectively $ts$ ) will denote the data in the training set (respectively test set). For instance, $\{Z\}_{tr}$ in which $Z$ can either be $A$, $Y$, $\bar{A}$ or $\bar{Y}$, represents respectively the attributes of the original training set (not including the sensitive and the decision attributes), the decision in the original training set, the attributes of the sanitized training set and the decision attribute in the sanitized training set. Table 2 describes the composition of the training and the test sets for these four scenarios.

Scenario 1 : complete data debiasing.

This setting corresponds to the typical use of the sanitized dataset, which is the prediction of a decision attribute through a classifier. The decision attribute is also sanitized as we assumed that the original decision holds information about the sensitive attribute. Here, we quantify the accuracy of prediction of $\{\bar{Y}\}_{ts}$ as well as the discrimination represented by the demographic parity (Equation 1) and equalized odds (Equation 2).

Scenario 2 : partial data debiasing.

In this scenario, similarly to the previous one, the training and the test sets are sanitized, with the exception that the sanitized decision in both these datasets $\{\bar{A},\bar{Y}\}$ is replaced with the original one $\{\bar{A},Y\}$. This scenario is generally the one considered in the majority of papers on fairness enhancement (Zemel et al., 2013; Edwards and Storkey, 2015; Madras et al., 2018). The accuracy loss in the prediction of the original decision $\{Y\}_{ts}$ between this classifier and another trained on the original dataset without modifications $\{A\}_{tr}$ is a straightforward way to quantify the utility loss due to the sanitization.

Scenario 3 : building a fair classifier.

This scenario was considered in (Xu et al., 2018) and is motivated by the fact that the sanitized dataset might introduce some undesired perturbations (e.g., changing the education level from Bachelor to PhD).

Thus, a third party might build a fair classifier but still apply it directly on the unperturbed data to avoid the data sanitization process and the associated risks. More precisely in this scenario, a fair classifier is obtained by training it on the sanitized dataset $\{\bar{A}\}_{tr}$ to predict the sanitized decision $\{\bar{Y}\}_{tr}$ . Afterwards, this classifier is tested on the original data ( $\{A\}_{ts}$ ) by measuring its fairness through the demographic parity (Equation 1, Section 2). We also compute the accuracy of the fair classifier with respect to the original decision of the test set $\{Y\}_{ts}$ .

Scenario 4 : local sanitization.

The local sanitization scenario corresponds to local use of the sanitizer by the individual himself. For instance, the sanitizer could be used as part of a mobile phone application providing individuals with a means to remove some sensitive attributes from their profile before disclosing it to an external entity. In this scenario, we assume the existence of a biased classifier, trained to predict the original decision $\{Y\}_{tr}$ on the original dataset $\{A\}_{tr}$. The user has no control over this classifier, but he is allowed nonetheless to perform the sanitization locally on his profile before submitting it to the existing classifier, similarly to the recruitment scenario discussed in the introduction. This classifier is applied on the sanitized test set $\{\bar{A}\}_{ts}$ and its accuracy is measured with respect to the original decision $\{Y\}_{ts}$ as well as its fairness quantified by $DemoParity{}$.
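The four scenarios differ only in which version of the attributes and of the decision is used for training and for testing. The sketch below encodes this composition as described in the text; the dictionary keys (`"A"`, `"Y"`, `"A_san"`, `"Y_san"`) and the helper `build_split` are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Tuple

# (attributes, decision) used for training and testing in each scenario.
# "A"/"Y" are the original data, "A_san"/"Y_san" the sanitized versions.
SCENARIOS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "S1": {"train": ("A_san", "Y_san"), "test": ("A_san", "Y_san")},  # complete debiasing
    "S2": {"train": ("A_san", "Y"), "test": ("A_san", "Y")},          # partial debiasing
    "S3": {"train": ("A_san", "Y_san"), "test": ("A", "Y")},          # fair classifier
    "S4": {"train": ("A", "Y"), "test": ("A_san", "Y")},              # local sanitization
}


def build_split(
    scenario: str, data_tr: Dict[str, Any], data_ts: Dict[str, Any]
) -> Tuple[Any, Any, Any, Any]:
    """Return (X_train, y_train, X_test, y_test) for the given scenario."""
    spec = SCENARIOS[scenario]
    x_tr, y_tr = (data_tr[key] for key in spec["train"])
    x_ts, y_ts = (data_ts[key] for key in spec["test"])
    return x_tr, y_tr, x_ts, y_ts
```

Any classifier can then be fitted on the first two elements and evaluated on the last two, so the same evaluation code serves all four scenarios.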

5.3. Experimental results

General results on Adult.

Figure 2 describes the achievable trade-off between fairness and fidelity obtained on Adult. First, we can observe that fairness improves when $\alpha$ increases, as expected. Even with $\alpha=0$ (i.e., maximum utility with no focus on the fairness), we cannot reach a perfect fidelity to the original data as we get at most $fid{}_{\alpha=0}\approx 0.982$ (cf. Figure 2). Increasing the value of $\alpha$ from $0$ to a low value such as $0.2$ provides a fidelity close to the highest possible ( $fid{}_{\alpha=0.2}=0.98$ ), but leads to a BER that is poor (i.e., not higher than $0.2$ ). Nonetheless, we still have a fairness enhancement, compared to the original data ( $fid{}_{orig}=1$ , $BER\leq 0.15$ ).

At the other extreme in which $\alpha=1$ , the data is sanitized without any consideration on the fidelity. In this case, the $BER{}$ is optimal as expected and the fidelity is $10\%$ lower than the maximum achievable ( $fid{}_{\alpha=1}\approx 0.88$ ). However, slightly decreasing the value of $\alpha$ , such as setting $\alpha=0.96$ , allows the sanitizer to significantly remove the unwarranted correlations ( $BER{}\approx 0.45$ ) with a cost of $2.24\%$ on fidelity ( $fid{}_{\alpha=0.96}\approx 0.95$ ).

With respect to $sAcc{}$, the accuracy drops significantly when the value of $\alpha$ increases (cf. Figure 3). Here, the optimal value is the proportion of the majority class, and GANSan brings the accuracy of predicting S from the sanitized set closer to that value. However, even at the extreme $\alpha=1$, it is nearly impossible to reach this optimal value. Similarly to BER, slightly decreasing $\alpha$ from this extreme value by setting $\alpha=0.85$ improves the sanitization while preserving a fidelity closer to the maximum achievable.

The quantitative analysis with respect to the diversity is shown in Figure 4. More precisely, the smallest drop of diversity obtained is $3.57\%$, which is achieved when we set $\alpha\leq 0.2$. Among all values of $\alpha$, the biggest drop observed is $36\%$. The application of GANSan therefore introduces an irreversible perturbation, as observed with the fidelity. This loss of diversity implies that the sanitization reinforces the similarity between sanitized profiles as $\alpha$ increases, rendering them almost identical or mapping the input profiles to a small number of stereotypes. When $\alpha$ is in the range $[0.98,1[$ (i.e., complete sanitization), $75\%$ of categorical attributes have a proportion of modified records between $10$ and $40\%$ (cf. Figure 4).

For numerical attributes, we compute the relative change (RC) normalized by the mean of the original and sanitized values:

[TABLE]

We normalize the RC using the mean (since all values are positive) as it allows us to handle situations in which the original values are equal to $0$. With the exception of the extreme sanitization ( $\alpha=1$ ), at least $70\%$ of records in the dataset have a relative change lower than $0.25$ for most of the numerical attributes. Selecting $\alpha=0.9875\geq 0.98$ leads to $80\%$ of records being modified with a relative change less than $0.5$ (cf. Figure 9 in Appendix C).
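A minimal sketch of this mean-normalized relative change is given below. It follows the verbal description (absolute difference divided by the mean of the original and sanitized values); returning $0$ when both values are zero is an assumption of the sketch, and the function name `relative_change` is hypothetical.

```python
def relative_change(original: float, sanitized: float) -> float:
    """|original - sanitized| divided by the mean of the two magnitudes."""
    mean = (abs(original) + abs(sanitized)) / 2.0
    if mean == 0.0:
        # Both values are zero: nothing changed (assumed convention).
        return 0.0
    return abs(original - sanitized) / mean
```

With this normalization, an original value of `0` sanitized to `10` gives a finite relative change of `2.0`, whereas dividing by the original value alone would be undefined.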

General results on German.

Similarly to Adult, the protection increases with $\alpha$ . More precisely $\alpha=0$ (maximum reconstruction) achieves a fidelity of almost $0.96$ . The maximum protection of $BER{}=0.5$ corresponds to a fidelity of $0.81$ and a sensitive accuracy value of $sAcc{}=0.76$ .

We can observe on Figure 6 that most values are concentrated on the $0.76$ plateau, regardless of the fidelity and the value of $\alpha$. We believe this is due to the high disparity of the dataset. The fairness on German credit is initially quite high, being close to $0.33$. Nonetheless, we can observe three interesting trade-offs on Figure 5, each located at a different shoulder of the Pareto front. These trade-offs are A ( $BER{}\approx 0.43,fid{}\approx 0.94$ ), B ( $BER{}\approx 0.45,fid{}\approx 0.84$ ) and C ( $BER{}\approx 0.5,fid{}\approx 0.81$ ); A is achievable with $\alpha=0.6$, while B and C are achievable with $\alpha=0.9968$.

We review the diversity and the sanitization-induced damage on categorical attributes in Figure 7. As expected, the diversity decreases with $\alpha$, rendering most profiles identical with $\alpha=1$. We can also observe some instabilities: higher values of $\alpha$ (i.e., $\alpha\geq 0.9$ ) produce a narrow range of diversities, while smaller values produce a wider range. Such instability is mainly explained by the size and the imbalance of the dataset, which do not allow the sanitizer to correctly learn the distribution (such a phenomenon is common when training GANs with a small dataset). Nonetheless, most of the diversity results are close to the original one, that is $0.51$. The same trend is observed on the categorical attribute damage. For most values of $\alpha$, the median damage is below or equal to $20\%$, meaning that we have to modify only two categorical columns in a record to remove unwanted correlations. For the numerical damage, most columns have a relative change lower than $0.5$ for more than $70\%$ of the dataset, regardless of the value of $\alpha$. Only the columns Duration in month and Credit amount have a higher damage. This is due to the fact that these columns have a very large range of possible values compared to the other columns ( $33$ and $921$ ), especially the column Credit amount, which also exhibits a nearly uniform distribution. Our reference points A, B and C have a median damage close to $10\%$ for A and $20\%$ for both B and C. The damage on categorical columns is also acceptable.

To summarize our results, GANSan is able to maintain an important part of the dataset structure despite sanitization, making it usable for other analysis tasks. However, at the individual level, some perturbation might be more important on some profiles than on others. Future work will investigate the relationship between the position of the profile in the distributions and the damage introduced. For the different scenarios investigated hereafter, we fixed the value of $\alpha$ to $0.9875$, which provides nearly a perfect level of sensitive attribute protection while leading to an acceptable damage on Adult. Due to space limitations, we will not discuss the results obtained on German here; the scenario analysis is available in Appendix D.2.

Scenario 1 : complete data debiasing.

In this scenario, we observe that GANSan preserves the accuracy of the dataset. More precisely, it increases the accuracy of the decision prediction on the sanitized dataset for all classifiers (cf. Figure 8, Scenario S1), compared to the original one which is $0.86$, $0.84$ and $0.78$ respectively for GB, MLP and SVM. This increase can be explained by the fact that GANSan modifies the profiles to make them more coherent with the associated decision, by removing correlations between the sensitive attribute and the decision one. As a consequence, this sets the same decision to similar profiles in both the protected and the default groups. In fact, nearly the same distributions of the decision attribute are observed before and after the sanitization, but some records' decisions are shifted ( $7.56\%\pm 1.23\%$ of decisions shifted in the whole sanitized set, $11.44\%\pm 2.74\%$ of decisions shifted in the sanitized sensitive group for $\alpha=0.9875$ ). Such a decision shift could be explained by the similarity between those profiles and others with the opposite decision in the original dataset.

We also believe that the increase of accuracy is correlated with the drop of diversity. More precisely, if profiles become similar to each other, the decision boundary might be easier to find.

The discrimination is reduced as observed through $DemoParity{}$, $EqOddGap_{1}$ and $EqOddGap_{0}$, which all exhibit a negative slope. When correlations with the sensitive attribute are significantly removed ( $\alpha\geqslant 0.6$ ), those metrics also significantly decrease. For instance, at $\alpha=0.9875$, $BER{}\geq 0.48$, $yAcc{}=0.96$, $DemoParity{}=0.0453$, $EqOddGap_{1}=0.0286$ and $EqOddGap_{0}=0.0062$ for GB, whereas the original demographic parity gap and equalized odds gaps are respectively $DemoParity{}=0.16$, $EqOddGap_{1}=0.083$ and $EqOddGap_{0}=0.060$ (cf. Tables 5 and 6 in the appendices for more details). In this setup, FairGan (Xu et al., 2018) achieves a BER of $0.3862\pm 0.0036$, an accuracy of $0.82\pm 0.01$ and a demographic parity of $0.04\pm 0.02$.

Scenario 2 : partial data debiasing.

Somewhat surprisingly, we observe an increase in accuracy for most values of $\alpha$. The demographic parity also decreases while the equalized odds remains nearly constant ( $EqOddGap_{1}$ , green line on Figure 8). Table 3 compares the results obtained with other existing state-of-the-art work. We include the classifier with the highest accuracy (MLP) and the one with the lowest one (SVM). From these results, we can observe that our method outperforms the others in terms of accuracy, but that the lowest demographic parity is achieved with the method proposed in (Zhang et al., 2018) ( $DemoParity{}=0.01$ ), which is not surprising as this method is precisely tailored to reduce this metric.

Even though our method is not specifically constrained to mitigate the demographic parity, we can observe that it significantly improves it. Thus, while partial data debiasing is not the best application scenario for our approach as the original decision might be correlated with the sensitive attribute, it still mitigates its effect to some extent.

Scenario 3 : building a fair classifier.

The sanitizer helps to reduce discrimination based on the sensitive attribute, even when using the original data on a classifier trained on the sanitized one. As presented in the third row of Figure 8, as we force the system to completely remove the unwarranted correlations, the discrimination observed when classifying the original unperturbed data is reduced. On the other hand, the accuracy exhibits here the highest negative slope with respect to all the scenarios investigated. More precisely, we observe a drop of $16\%$ for the best classifier in terms of accuracy on the original set, which can be explained by the difference of correlations between $A$ and $Y$ and between $\bar{A}$ and $\bar{Y}$ . As the fair classifiers are trained on the sanitized set ( $\bar{A}$ and $\bar{Y}$ ), the decision boundary obtained is not relevant for $A$ and $Y$ .

FairGan (Xu et al., 2018), which also investigated this scenario, achieved $yAcc{}=0.82$ and $DemoParity{}=0.05\pm 0.01$, whereas our best classifier in accuracy (GB) achieves $yAcc{}=0.72\pm 0.033$ and $DemoParity{}=0.12\pm 0.06$ for $\alpha=0.9875$.

Scenario 4 : local sanitization.

In this setup, we observe that the discrimination is lowered as the $\alpha$ coefficient increases. Similarly to other scenarios, the more the correlations with the sensitive attribute are removed, the higher the drop of discrimination as quantified by the $DemoParity{}$, $EqOddGap_{1}$ as well as $EqOddGap_{0}$, and the lower the accuracy on the original decision attribute. For instance, with GB, we obtain $yAcc{}=0.83\pm 0.039$, $DemoParity{}=0.035\pm 0.022$ at $\alpha=0.9875$ (the original values were $yAcc{}=0.86$ and $DemoParity{}=0.16$ ). With MLP, which has the best DemoParity, we observe $yAcc{}=0.77\pm 0.060$ and $DemoParity{}=0.025\pm 0.017$. This shows that GANSan can be used locally, thus allowing users to contribute to large datasets by sanitizing and sharing their information without relying on any third party, with the guarantee that the sensitive attribute GANSan has been trained for is removed.

The drop of accuracy due to the local sanitization is $3.68\%$ on GB ( $8\%$ with MLP). Thus, for applications requiring a time-consuming training phase, using GANSan to sanitize profiles without retraining the classifier seems to be a good compromise.

6. Conclusion

In this work, we have introduced GANSan , a novel sanitization method inspired by GANs achieving fairness by removing the correlations between the sensitive attribute and the other attributes of the profile. Our experiments demonstrate that GANSan can prevent the inference of the sensitive attribute while limiting the loss of utility as measured in terms of the accuracy of a classifier learned on the sanitized data as well as by the damage on the numerical and categorical attributes. In addition, one of the strengths of our approach is that it offers the possibility of local sanitization, by only modifying the attributes as little as possible while preserving the space of the original data (thus preserving interpretability). As a consequence, GANSan is agnostic to subsequent use of data as the sanitized data is not tied to a particular task.

While we have relied on three different types of external classifiers for capturing the difficulty to infer the sensitive attribute from the sanitized data, it is still possible that a more powerful classifier exists that could infer the sensitive attribute with higher accuracy. Note that this is an inherent limitation of all the preprocessing techniques and not only our approach. Nonetheless, as future work we would like to investigate other families of learning algorithms to complete the range of external classifiers.

Finally, much work still needs to be done to assess the relationship between the different fairness notions, namely the impossibility of inference and the individual and group fairness.

Appendices

Appendix A Preprocessing of datasets

The preprocessing step consists first in one-hot encoding the categorical attributes and the numerical attributes with fewer than 5 values, followed by a scaling between $0$ and $1$.

In addition, on the Adult dataset, we need to apply a logarithm on the columns $capital-gain$ and $capital-loss$ prior to any other step, because those attributes exhibit a distribution close to a Dirac delta (Dirac, 1981), with the maximal values being respectively $9999$ and $4356$, and a median of $0$ for both (respectively $91\%$ and $95\%$ of records have a value of $0$). Since most values are equal to $0$, the sanitizer would otherwise always nullify both attributes and the approach would not converge. Afterwards, a postprocessing step consisting of reversing the preprocessing ones is performed in order to remap the generated data to the original shape.
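The logarithm, the scaling and their reversal can be sketched for a single numerical column as follows. Since the columns contain many zeros, the sketch uses $\log(1+x)$ rather than a plain logarithm; this choice, like the function names, is an assumption made for illustration and is not stated in the paper.

```python
import math
from typing import Sequence, Tuple


def fit_log_minmax(values: Sequence[float]) -> Tuple[float, float]:
    """Return the (min, max) of log(1 + x) over the training column."""
    logged = [math.log1p(v) for v in values]
    return min(logged), max(logged)


def transform(value: float, lo: float, hi: float) -> float:
    """Log-compress a raw value, then scale it to [0, 1]."""
    return (math.log1p(value) - lo) / (hi - lo)


def inverse_transform(scaled: float, lo: float, hi: float) -> float:
    """Postprocessing: undo the scaling, then undo the logarithm."""
    return math.expm1(scaled * (hi - lo) + lo)
```

Applying `inverse_transform` to the sanitizer's output for such a column maps it back to the original unit, which corresponds to the postprocessing step described above.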

Appendix B Hyper-parameters tuning

Table 4 details the parameters of the classifiers that have yielded the best results respectively on the Adult and German credit datasets. The training rate represents the number of times an instance is trained during a single iteration. For instance, for an iteration $i$, the discriminator is trained with $100*50=5000$ records while the sanitizer is trained with $1*100=100$ records. The number of iterations is equal to: $iterations=\dfrac{datasetsize}{batchsize}$. Our experiments were run for a total of 40 epochs. We varied the $\alpha$ value using a geometric progression: $\alpha_{i}=0.2+0.4\dfrac{2^{i}-1}{2^{i-1}};$ $i\in\{1,..,10\}$
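The progression of $\alpha$ values can be evaluated directly from the formula above. The following short sketch does so; the function name `alpha_schedule` is hypothetical.

```python
from typing import List


def alpha_schedule(n_values: int = 10) -> List[float]:
    """alpha_i = 0.2 + 0.4 * (2^i - 1) / 2^(i - 1) for i = 1..n_values."""
    return [
        0.2 + 0.4 * (2 ** i - 1) / 2 ** (i - 1)
        for i in range(1, n_values + 1)
    ]
```

The first values are $0.6$, $0.8$, $0.9$, $0.95$, $0.975$ and $0.9875$. The sixth term is the value of $\alpha$ fixed for the scenario analysis on Adult, and the eighth term ($0.996875$) corresponds to the value $0.9968$ reported for points B and C on German credit.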

Appendix C Evaluation of Adult

This appendix is composed of supplementary results of the evaluation of the Adult dataset.

C.1. Numerical attribute Damage

Figure 9 summarizes the numerical damage on Adult, computed with the relative change (RC) formula introduced in Section 5.3.

C.2. Evaluation of group-based discrimination

Table 5 summarizes the results obtained in terms of discrimination, and Table 6 presents the sensitive attribute level for all classifiers. These results are computed with $\alpha=0.9875$.

C.3. Utility of GANSan

We present in Table 7 the utility of GANSan as measured in terms of the accuracy on the decision prediction, the fidelity and the diversity on Adult.

C.4. Qualitative observation of GANSan output on Adult

In Tables 8 and 9, we present the records that have been maximally and minimally damaged due to the sanitization.

Appendix D Evaluation of German credit

In this appendix, we discuss the results obtained on the German credit dataset. First, we present the dataset distribution (Table 10).

D.1. Damage and qualitative analysis

Looking at the damage on numerical columns (Figure 10), we can observe that most columns have a relative change lower than $0.5$ for more than $70\%$ of the dataset, regardless of the value of $\alpha$. Only the columns Duration in month and Credit amount have a higher damage. This is due to the fact that these columns have a very large range of possible values compared to the other columns ( $33$ and $921$ ), especially the column Credit amount, which also exhibits a nearly uniform distribution.

D.2. Evaluation scenario, other fairness metrics and utilities

In Figure 11, we present the results on the different scenarios investigated (S1: complete debiasing, S2: partial debiasing, S3: building a fair classifier, S4: local sanitization).

First of all, we can observe that for all scenarios, the accuracy is mostly stable as $\alpha$ increases, for all classifiers. The sanitization does not significantly affect the quality of prediction, which is mostly around $75\%$, $7.143\%$ greater than the proportion of positive outcomes in the dataset. On scenarios S3 and S4, this observation contrasts with Adult, where the accuracy decreases as the protection coefficient increases.

If we take a closer look at the fairness metrics as provided in Figure 12, we observe that the DemoParity and $EqOddGap_{1}$ have a negative slope, which increases with $\alpha$. In contrast, $EqOddGap_{0}$ is rather unstable, especially when $\alpha>0.8$.

S1: complete data debiasing

In this scenario, we observe that the sanitization renders the profiles in each decision group easily separable, which in turn improves the accuracy. The sanitization also reduces the risk of discrimination, just as we have seen on the Adult dataset.

S2: partial data debiasing

Even though the sanitized and original decisions do not share the same distribution, the sanitization is able to transform the dataset in such a way that it improves the classification performance of all classifiers. The discrimination, on the other hand, is almost constant, meaning that the original decision still preserves a certain amount of discrimination that is harder to remove by acting on the non-sensitive attributes alone, especially on a small dataset.

S3: building a fair classifier

Just as observed on the Adult dataset, building a fair classifier by training it on sanitized data and testing/using it on unprocessed data proves to be less conservative of the accuracy. We observe a slight drop, from $0.75$ to almost $0.65$ for the first value of $\alpha$, after which the accuracy stays stable across all values of $\alpha$. The decision boundaries learned by the fair classifier cannot be directly transferred to another type of data as they do not share the same distribution. Concerning the fairness metrics, we observe two behaviours: for $\alpha\leq 0.9$ the fairness metrics are nearly constant, in contrast to Adult where they all seem to increase; the discrimination is reduced when we push the system close to the maximum ( $\alpha>0.9$ ), but not at the extreme, where the discrimination increases. The increase for extreme values is due to the fact that the sanitization has almost completely perturbed the dataset, losing all of its structure. Thus, as this set is highly imbalanced both on the sensitive attribute distribution and on the decision one, all classifiers tend to predict the majority labels, which are in favor of the default group (cf. Table 10).

S4: local sanitization

On this dataset, this scenario provides the most significant results. As a matter of fact, the accuracy of the classifier is almost not affected by the sanitization, while the discrimination is reduced. MLP provides the most unstable $EqOddGap_{0}$, and for all classifiers, we observe a reduction of $DemoParity{}$ and $EqOddGap_{0}$, which becomes significant with higher values of $\alpha$ ( $\alpha>0.8$ ). This result is different from Adult, where we observed a negative slope. A deeper analysis of such behaviour is left as future research.

Table 12 provides the quantitative results for these four scenarios for the values of $\alpha$ that correspond to points A and B.

Bibliography

Narayanan Arvind. 2018. 21 Fairness Definitions and Their Politics. Tutorial presented at the Conference on Fairness, Accountability, and Transparency (2018).
Solon Barocas and Andrew D Selbst. 2016. Big data’s disparate impact. Cal. L. Rev. 104 (2016), 671.
Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. [n.d.]. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research ([n. d.]), 0049124118782533.
Alex Beutel, Jilin Chen, Zhe Zhao, and Ed H Chi. 2017. Data decisions and theoretical implications when adversarially learning fair representations. Fairness, Accountability, and Transparency in Machine Learning (2017).
Electronic Privacy Information Center. 2016. EPIC - Algorithms in the Criminal Justice System. https://epic.org/algorithmic-transparency/crim-justice/
Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. 2017. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 797–806.
Corinna Cortes and Vladimir Vapnik. 1995. Support-vector networks. Machine learning 20, 3 (1995), 273–297.