An Algorithmic Framework for Fairness Elicitation

Christopher Jung; Michael Kearns; Seth Neel; Aaron Roth; Logan Stapleton; Zhiwei Steven Wu

arXiv:1905.10660·cs.LG·October 15, 2020


TL;DR

This paper introduces a flexible algorithmic framework for eliciting complex fairness constraints from stakeholders and integrating them into model training, with theoretical guarantees and preliminary behavioral study results.

Contribution

It presents a provably convergent algorithm for learning models under elicited fairness constraints, accommodating nuanced fairness notions beyond traditional definitions.

Findings

1. Algorithm is provably convergent and oracle efficient.

2. Framework can incorporate traditional and elicited fairness constraints.

3. Preliminary behavioral study on COMPAS dataset supports feasibility.


Equations

$$\min_{D\in\Delta\mathcal{H},\ \alpha_{ij}\geq 0}\ err(D,S)$$

$$\text{such that }\forall(i,j)\in[n]^{2}:\quad\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]\leq\alpha_{ij}+\gamma,$$

$$\sum_{(i,j)\in[n]^{2}}\frac{\hat{w}_{ij}\alpha_{ij}}{|A|}\leq\eta.$$

$$\sum_{(i,j)\in[n]^{2}}\frac{\hat{w}_{ij}\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]}{|A|}>\eta,$$

$$\frac{\hat{w}_{bs}\mathop{\mathbb{E}}[h(b)-h(s)]+\hat{w}_{ms}\mathop{\mathbb{E}}[h(m)-h(s)]}{|A|}=\frac{0.5(0.8-0.5)+0.7(0.9-0.5)}{6}\approx 0.072>0.07=\eta.$$

$$\Pi_{D,w,\gamma}((x,x^{\prime}))=w_{x,x^{\prime}}\max\left(0,\ \mathop{\mathbb{E}}_{h\sim D}[h(x)-h(x^{\prime})]-\gamma\right)$$

$$\Pi_{D,w,\gamma}(M)=\frac{1}{|M|}\sum_{(x,x^{\prime})\in M}\Pi_{D,w,\gamma}((x,x^{\prime}))$$

$$err(\hat{D},S)\leq\min_{(D,\alpha)\in\Omega(S,\hat{w},\gamma,\eta)}err(D,S)+2\nu.$$

$$\mathcal{L}(D,\alpha,\lambda,\tau)=err(D,S)+\sum_{(i,j)\in[n]^{2}}\lambda_{ij}\left(\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]-\alpha_{ij}-\gamma\right)+\tau\left(\frac{1}{|A|}\sum_{(i,j)\in[n]^{2}}w_{ij}\alpha_{ij}-\eta\right)$$

$$\operatorname*{argmin}_{(D,\alpha)\in\Omega(S,\hat{w},\gamma,\eta)}err(D,S)=\operatorname*{argmin}_{D\in\Delta\mathcal{H},\ \alpha\in[0,1]^{n^{2}}}\ \max_{\lambda\in\mathbb{R}^{n^{2}},\ \tau\in\mathbb{R}}\mathcal{L}(D,\alpha,\lambda,\tau)$$

$$\min_{D\in\Delta\mathcal{H},\ \alpha\in[0,1]^{n^{2}}}\ \max_{\lambda\in\mathbb{R}^{n^{2}},\ \tau\in\mathbb{R}}\mathcal{L}(D,\alpha,\lambda,\tau)=\max_{\lambda\in\mathbb{R}^{n^{2}},\ \tau\in\mathbb{R}}\ \min_{D\in\Delta\mathcal{H},\ \alpha\in[0,1]^{n^{2}}}\mathcal{L}(D,\alpha,\lambda,\tau).$$

$$c_{i}^{0}=\frac{1}{n}\mathop{\mathbb{E}}_{h\sim D}[\mathbbm{1}(y_{i}\neq 0)]$$

$$\operatorname*{argmin}_{D,\alpha}\mathcal{L}(D,\alpha,\lambda,\tau)=\operatorname*{argmin}_{D}\mathcal{L}_{\lambda,\tau}^{\rho_{1}}(D)\times\operatorname*{argmin}_{\alpha}\mathcal{L}_{\lambda,\tau}^{\rho_{2}}(\alpha),$$

$$\mathcal{L}_{\lambda,\tau}^{\rho_{1}}(D)=err(D,S)+\sum_{(i,j)\in[n]^{2}}\lambda_{ij}\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]$$

$$\mathcal{L}_{\lambda,\tau}^{\rho_{2}}(\alpha)=\sum_{(i,j)\in[n]^{2}}\lambda_{ij}(-\alpha_{ij})+\tau\frac{1}{|A|}\sum_{(i,j)\in[n]^{2}}w_{ij}\alpha_{ij}$$

$$\begin{aligned}
\operatorname*{argmin}_{\alpha}\mathcal{L}_{\lambda,\tau}^{\rho_{2}}
&=\operatorname*{argmin}_{\alpha}\sum_{(i,j)\in[n]^{2}}-\lambda_{ij}\alpha_{ij}+\sum_{(i,j)\in[n]^{2}}\tau\frac{w_{ij}}{|A|}\alpha_{ij}\\
&=\operatorname*{argmin}_{\alpha}\sum_{(i,j)\in[n]^{2}}\alpha_{ij}\left(\tau\frac{w_{ij}}{|A|}-\lambda_{ij}\right).
\end{aligned}$$

$$\begin{aligned}
\operatorname*{argmin}_{D}\mathcal{L}_{\lambda,\tau}^{\rho_{1}}
&=\operatorname*{argmin}_{D}\ err(D,S)+\sum_{(i,j)\in[n]^{2}}\lambda_{ij}\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]\\
&=\operatorname*{argmin}_{D}\ \frac{1}{n}\sum_{i=1}^{n}\mathop{\mathbb{E}}_{h\sim D}[\mathbbm{1}(h(x_{i})\neq y_{i})]+\sum_{(i,j)\in[n]^{2}}\lambda_{ij}\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]\\
&=\operatorname*{argmin}_{D}\ \sum_{i=1}^{n}\Big(\frac{1}{n}\mathop{\mathbb{E}}_{h\sim D}[\mathbbm{1}(h(x_{i})\neq y_{i})]+\sum_{j\neq i}\lambda_{ij}h(x_{i})-\sum_{j\neq i}\lambda_{ji}h(x_{i})\Big)\\
&=\operatorname*{argmin}_{D}\ \sum_{i=1}^{n}\Big(\frac{1}{n}\mathop{\mathbb{E}}_{h\sim D}[\mathbbm{1}(h(x_{i})\neq y_{i})]+\sum_{j\neq i}h(x_{i})(\lambda_{ij}-\lambda_{ji})\Big).
\end{aligned}$$

$$c_{i}^{h(x_{i})}=\frac{1}{n}\mathop{\mathbb{E}}_{h\sim D}[\mathbbm{1}(h(x_{i})\neq y_{i})]+\sum_{j\neq i}h(x_{i})(\lambda_{ij}-\lambda_{ji}).$$

$$c_{i}^{h(x_{i})}=c_{i}^{0}=\frac{1}{n}\cdot 1+\sum_{j\neq i}0\cdot(\lambda_{ij}-\lambda_{ji})=\frac{1}{n}$$

$$\Lambda=\{\lambda\in\mathbb{R}_{+}^{n^{2}}:\|\lambda\|_{1}\leq C_{\lambda}\},\qquad\mathcal{T}=\{\tau\in\mathbb{R}_{+}:\|\tau\|_{1}\leq C_{\tau}\}.$$

$$r_{\lambda}(\lambda^{t})=\sum_{(i,j)\in[n]^{2}}\lambda_{ij}^{t}\left(\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]-\alpha_{ij}-\gamma\right)$$

$$r_{\tau}(\tau^{t})=\tau^{t}\left(\frac{1}{|A|}\sum_{(i,j)\in[n]^{2}}w_{ij}\alpha_{ij}-\eta\right).$$

$$\operatorname*{argmax}_{\lambda\in\Lambda,\ \tau\in\mathcal{T}}\mathcal{L}(D,\alpha,\lambda,\tau)=\operatorname*{argmax}_{\lambda\in\Lambda}\mathcal{L}_{D,\alpha}^{\psi_{1}}(\lambda)\times\operatorname*{argmax}_{\tau\in\mathcal{T}}\mathcal{L}_{D,\alpha}^{\psi_{2}}(\tau),$$

$$\mathcal{L}_{D,\alpha}^{\psi_{1}}(\lambda)=\sum_{(i,j)\in[n]^{2}}\lambda_{ij}\left(\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]-\alpha_{ij}-\gamma\right)$$

$$\mathcal{L}_{D,\alpha}^{\psi_{2}}(\tau)=\tau\left(\frac{1}{|A|}\sum_{(i,j)\in[n]^{2}}w_{ij}\alpha_{ij}-\eta\right).$$

$$\max_{\tau\in\mathcal{T}}\sum_{t=1}^{T}\mathcal{L}_{D^{t},\alpha^{t}}^{\psi_{2}}(\tau)-\sum_{t=1}^{T}\mathcal{L}_{D^{t},\alpha^{t}}^{\psi_{2}}(\tau^{t})\leq C_{\tau}T.$$

$$\tau^{t}=\mathrm{proj}_{[0,C_{\tau}]}\left(\tau^{t-1}+\mu_{\tau}^{t}\left(\frac{1}{W}\sum_{ij}w_{ij}\alpha_{ij}^{t-1}-\eta\right)\right).$$


Code & Models

Repositories

jjgrimm/FairnessReferenceSheet

Full text

An Algorithmic Framework for Fairness Elicitation

Christopher Jung, Michael Kearns, Seth Neel, Aaron Roth, Logan Stapleton, Zhiwei Steven Wu

University of Pennsylvania, University of Minnesota, Carnegie Mellon University

Abstract

We consider settings in which the right notion of fairness is not captured by simple mathematical definitions (such as equality of error rates across groups), but might be more complex and nuanced and thus require elicitation from individual or collective stakeholders. We introduce a framework in which pairs of individuals can be identified as requiring (approximately) equal treatment under a learned model, or requiring ordered treatment such as “applicant Alice should be at least as likely to receive a loan as applicant Bob”. We provide a provably convergent and oracle efficient algorithm for learning the most accurate model subject to the elicited fairness constraints, and prove generalization bounds for both accuracy and fairness. This algorithm can also combine the elicited constraints with traditional statistical fairness notions, thus “correcting” or modifying the latter by the former. We report preliminary findings of a behavioral study of our framework using human-subject fairness constraints elicited on the COMPAS criminal recidivism dataset.

1 Introduction

The literature on algorithmic fairness has consisted largely of researchers proposing and showing how to impose technical definitions of fairness [8, 18, 39, 2, 3, 21, 36, 29, 16, 14, 3, 6]. Because these imposed notions of fairness are described analytically, they are typically simplistic, and often have the form of equalizing simple error statistics across groups. Our starting point is the observation that:

1. This process cannot result in notions of fairness that do not have any simple, analytic description; and

2. This process also overlooks a more precursory problem: namely, who gets to define what is fair?

It’s unlikely that researchers alone are best fit for defining algorithmic fairness. Recent work identifies undue power imbalances [25] and biases [24] that arise when algorithm designers and researchers are the only voices in conversations around ethical design. Veale et al. [35] find that many machine learning practitioners are disconnected from the “organisational and institutional realities, constraints and needs” specific to the contexts in which their algorithms are applied. Researchers may not be able to propose a concise technical definition, e.g. statistical parity, to capture the nuances of fairness in any given context. Furthermore, many philosophers hold that stakeholders who are affected by moral decisions and experts who understand the context in which moral decisions are made will have the best judgment about which decisions are fair in that context [37, 24].

To this end, we aim to allow stakeholders and experts to play a central role in the process of defining algorithmic fairness. This is aligned with recent work on virtual democracy, which proposes and enacts participatory methods to automate moral decision-making [5, 31, 17, 28, 11].

The way we involve stakeholders is motivated by two concerns:

1. We want stakeholders to have free rein over how they may define fairness, e.g. we don’t want to simply have them vote on whether existing, simple constraints like statistical parity or equalized odds are best; and

2. We want non-technical stakeholders to be able to contribute, even if they may not understand the inner workings of a learning algorithm.

We hold that people often cannot elucidate their conceptions of fairness; yet, they can identify specific scenarios where fairness or unfairness occurs. (Footnote 1: This is philosophically akin to a theory of moral epistemology called moral perception, which claims that we know moral facts, e.g. goodness or fairness, via perception, as opposed to knowing them via rules of morality; see [4].) Drawing from individual notions of fairness like Dwork et al. [7] and Joseph et al. [16] that are defined in terms of pairwise comparisons, we therefore aim to elicit stakeholders’ conceptions of fairness by asking them to compare pairs of individuals in specific scenarios. Specifically, we ask whether it’s fair that one particular individual should receive an outcome that is as desirable as, or better than, that of the other.

When pointing out fairness or unfairness, this kind of pairwise ranking is natural. For example, after Serena Williams was penalized for a verbal interaction with an umpire in the 2018 U.S. Open Finals, tennis player James Blake tweeted, “I have said worse and not gotten penalized. And I’ve also been given a ‘soft warning’ by the ump where they tell you knock it off or I will have to give you a violation. [The umpire] should have at least given [Williams] that courtesy” [38]. Here, Blake thinks that: 1) Williams should have been judged as or less severely than he would have been in a similar situation; and 2) the umpire’s decision was unfair, because Williams was judged more severely.

Thus, we ask a set of stakeholders about a fixed set of pairs of individuals subject to a classification problem. For each pair of individuals $(A,B)$ , we ask the stakeholder to choose from amongst a set of four options:

1. Fair outcomes must classify $A$ and $B$ the same way (i.e. they must either both get a favorable classification or both get an unfavorable classification).

2. Fair outcomes must give $A$ an outcome that is equal to or preferable to the outcome of $B$.

3. Fair outcomes must give $B$ an outcome that is equal to or preferable to the outcome of $A$.

4. Fair outcomes may treat $A$ and $B$ differently without any constraints.

These constraints, a data distribution, and a hypothesis class define a learning problem: minimize classification error subject to the constraint that the rate of violation of the elicited pairwise constraints is held below some fixed threshold. Crucially and intentionally, we elicit relative pairwise orderings of outcomes (e.g. $A$ and $B$ should be treated equally), but do not elicit preferences for absolute outcomes (e.g. $A$ should receive a positive outcome). This is because fairness, in contrast to justice, is often conceptualized as a measure of equality of outcomes, rather than correctness of outcomes. (Footnote 2: Sidney Morgenbesser, following the Columbia University campus protests in the 1960s, reportedly said that the police had treated him unjustly, but not unfairly. He said that he was treated unjustly because the police hit him without provocation, but not unfairly, because the police were doing the same to everyone else as well.) In particular, it remains the job of the learning algorithm to optimize for correctness subject to elicited fairness constraints.

We remark that the premise (and the foundation for the enormous success) of machine learning is that accurate decision making rules in complex scenarios cannot be defined with simple analytic rules, and instead are best derived directly from data. Our work can be viewed similarly, as deriving fairness constraints from data elicited from experts and stakeholders. In this paper, we solve the computational, statistical, and conceptual issues necessary to do this, and demonstrate the effectiveness of our approach via a small behavioral study.

1.1 Results

Our Model

We model individuals as having features in $\mathcal{X}$ and binary labels, drawn from some distribution $\mathcal{P}$. A committee of stakeholders $u\in\mathcal{U}$ has preferences about whether one individual should be judged better than another individual. (Footnote 3: Though we develop our formalism for a committee of stakeholders, note that it permits the special case of a single subjective stakeholder, which we make use of in our behavioral study.) We imagine presenting each stakeholder with a set of pairs of individuals and asking them to choose one of four options for each pair, e.g. given the features of Serena Williams and James Blake:

1. No constraint;

2. Williams should be treated as well as Blake or better;

3. Blake should be treated as well as Williams or better; or

4. Williams and Blake should be treated similarly.

Here, when we refer to how an individual should be treated, we mean the probability that an individual is given a positive label by the classifier. This may be a bit of a relaxation of these judgments, since they are not about actualized classifications, but rather the probabilities of positive classification. For example, we may not consider it a violation of fairness preference (2) if Williams is judged worse than Blake in a specific scenario; yet, if an ump is more likely to judge Williams worse than Blake in general, then this would violate this fairness preference.

We represent these preferences abstractly as a set of ordered pairs $C_{u}\subseteq\mathcal{X}\times\mathcal{X}$ for each stakeholder $u$. If $(x,x^{\prime})\in C_{u}$, this means that stakeholder $u$ believes that individual $x^{\prime}$ must be treated as well as individual $x$ or better, i.e. ideally the classifier $h$ classifies such that $h(x^{\prime})\geq h(x)$. This captures all possible responses above. For example, for Serena Williams $(s)$ and James Blake $(b)$, if stakeholder $u$ responds:

1. No constraint $\Leftrightarrow\,(s,b)\not\in C_{u}$ and $(b,s)\not\in C_{u}$;

2. Williams as well as Blake $\Leftrightarrow\,(b,s)\in C_{u}$;

3. Blake as well as Williams $\Leftrightarrow\,(s,b)\in C_{u}$; or

4. Treated similarly $\Leftrightarrow\,(s,b)\in C_{u}$ and $(b,s)\in C_{u}$ (since if $h(b)\geq h(s)$ and $h(s)\geq h(b)$, then $h(s)=h(b)$).
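The mapping from survey responses to ordered pairs can be sketched in a few lines of Python. This is an illustrative sketch only: the `Response` enum and the `to_ordered_pairs` helper are hypothetical names that do not appear in the paper, and individuals are represented by plain string identifiers.

```python
from enum import Enum


class Response(Enum):
    """The four survey options for a pair (a, b); names are illustrative."""
    NO_CONSTRAINT = 1
    A_AT_LEAST_AS_WELL = 2   # a should be treated as well as b or better
    B_AT_LEAST_AS_WELL = 3   # b should be treated as well as a or better
    SIMILAR = 4


def to_ordered_pairs(a, b, response):
    """Return the ordered pairs (x, x') added to C_u for one response.

    Following the text, (x, x') in C_u means x' must be treated
    as well as x or better.
    """
    if response is Response.A_AT_LEAST_AS_WELL:
        return {(b, a)}
    if response is Response.B_AT_LEAST_AS_WELL:
        return {(a, b)}
    if response is Response.SIMILAR:
        return {(a, b), (b, a)}
    return set()


# "Williams (s) should be treated as well as Blake (b) or better"
c_u = to_ordered_pairs("s", "b", Response.A_AT_LEAST_AS_WELL)
print(c_u)  # {('b', 's')}
```

Encoding "treated similarly" as both orderings keeps a single constraint type throughout, which is the reason the formalism needs only ordered pairs.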

We impose no structure on how stakeholders form their views nor on the relationship between the views of different stakeholders — i.e. the sets $\{C_{u}\}_{u\in\mathcal{U}}$ are allowed to be arbitrary (for example, they need not satisfy a triangle inequality), and need not be mutually consistent. We write $C=\cup_{u}C_{u}$ .

We then formulate an optimization problem constrained by these pairwise fairness constraints. Since it is intractable to require that all constraints in $C$ be satisfied exactly, we formulate two different “knobs” with which we can quantitatively relax our fairness constraints.

For $\gamma>0$ (our first knob), we say that the classification of an ordered pair of individuals $(x,x^{\prime})\in C$ satisfies $\gamma$-fairness if the probability of positive classification for $x^{\prime}$ plus $\gamma$ is no smaller than the probability of positive classification for $x$, i.e. $\mathop{\mathbb{E}}[h(x^{\prime})]+\gamma\geq\mathop{\mathbb{E}}[h(x)]$. In this expression, the expectation is taken only over the randomness of the classifier $h$. Equivalently, a $\gamma$-fairness violation occurs for an ordered pair $(x,x^{\prime})\in C$ if the difference between these probabilities of positive classification is greater than $\gamma$, i.e. $\mathop{\mathbb{E}}[h(x)-h(x^{\prime})]>\gamma$. Thus, $\gamma$ acts as a buffer on how much less likely $x^{\prime}$ may be to receive a positive classification than $x$ before a fairness violation occurs. For example, if Blake $(b)$ receives a good label (i.e. no penalty) 80% of the time and Williams $(s)$ 50% of the time, then for $\gamma=0.1$ this constitutes a $\gamma$-fairness violation for the ordered pair $(b,s)\in C$, since $\mathop{\mathbb{E}}[h(b)-h(s)]=0.3>0.1=\gamma$.

We might ask that for no pair of individuals do we have a $\gamma$-fairness violation: $\max_{(x,x^{\prime})\in C}\mathop{\mathbb{E}}[h(x)-h(x^{\prime})]\leq\gamma$. On the other hand, we could ask for the weaker constraint that over a random draw of a pair of individuals, the expected fairness violation is at most $\eta$ (our second knob): $\mathop{\mathbb{E}}_{(x,x^{\prime})\sim\mathcal{P}^{2}}[(h(x)-h(x^{\prime}))\cdot\mathbf{1}[(x,x^{\prime})\in C]]\leq\eta$. We can also combine both relaxations to ask that, in expectation over random pairs, the “excess” fairness violation, on top of an allowed budget of $\gamma$, is at most $\eta$. For example, as above, if Blake receives a good label 80% of the time and Williams 50%, then for $\gamma=0.1$ the umpire classifier would pick up $0.3-0.1=0.2$ excess fairness violation for $(b,s)\in C$. In Section 2, we weight these excess fairness violations by the proportion of stakeholders who agree with the corresponding fairness constraint and mandate their sum be less than $\eta$. Subject to these constraints, we would like to find the distribution over classifiers that minimizes classification error: given a setting of the parameters $\gamma$ and $\eta$, this defines a benchmark with which we would like to compete.
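As a small illustration of the first knob, the following Python sketch checks a single ordered pair for a $\gamma$-fairness violation and computes its excess violation. The function names are hypothetical, and the inputs are assumed to be the probabilities of positive classification for $x$ and $x^{\prime}$ under the randomized classifier.

```python
def expected_gap(p_x, p_x_prime):
    """E[h(x) - h(x')], given the two positive-classification probabilities."""
    return p_x - p_x_prime


def is_gamma_violation(p_x, p_x_prime, gamma):
    """True if the ordered pair (x, x') violates gamma-fairness."""
    return expected_gap(p_x, p_x_prime) > gamma


def excess_violation(p_x, p_x_prime, gamma):
    """Amount by which the gap exceeds the budget gamma (0 if it does not)."""
    return max(0.0, expected_gap(p_x, p_x_prime) - gamma)


# Ordered pair (b, s): Blake gets a good label 80% of the time, Williams 50%.
print(is_gamma_violation(0.8, 0.5, gamma=0.1))           # True
print(round(excess_violation(0.8, 0.5, gamma=0.1), 3))   # 0.2
```

Note that the check is asymmetric: swapping the arguments (the pair $(s,b)$) yields no violation, since Williams is not favored over Blake.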

Our Theoretical Results

Even absent fairness constraints, learning to minimize 0/1 loss (even over linear classifiers) is computationally hard in the worst case (see e.g. [10, 9]). Despite this, learning seems to be empirically tractable in the real world. To capture the additional hardness of learning subject to fairness constraints, we follow several recent papers [1, 19] in aiming to develop oracle efficient learning algorithms. Oracle efficient algorithms are assumed to have access to an oracle (realized in experiments using a heuristic; see the next section) that can solve weighted classification problems. Given access to such an oracle, oracle efficient algorithms must run in polynomial time. We show that our fairness constrained learning problem is computationally no harder than unconstrained learning by giving such an oracle efficient algorithm (or reduction), and show moreover that its guarantees generalize from in-sample to out-of-sample in the usual way, with respect to both accuracy and the frequency and magnitude of fairness violations. Our algorithm is simple and amenable to implementation, and we use it in our experimental results.

Our Experimental Results

We implement our algorithm and run a set of experiments on the COMPAS recidivism prediction dataset, using fairness constraints elicited from 43 human subjects. We establish that our algorithm converges quickly (even when implemented with fast learning heuristics, rather than “oracles”). We also explore the Pareto curves trading off error and fairness violations for different human subjects, and find empirically that there is a great deal of variability across subjects in terms of their conception of fairness, and in terms of the degree to which their expressed preferences are in conflict with accurate prediction. We find that most of the difficulty in balancing accuracy with the elicited fairness constraints can be attributed to a small fraction of the constraints.

1.2 Related Work

Our work is related to existing notions of individual fairness like [7, 16] that conceptualize fairness as a set of constraints binding on pairs of individuals. The notion of metric fairness proposed in [7] is closely related to, but distinct from, the fairness notions we elicit in this work. In particular: 1) We allow for constraints that require that individual $A$ be treated better than or equal to individual $B$, whereas metric fairness constraints are symmetric, and only allow constraints of the form that $A$ and $B$ be treated similarly. In this sense our notion is more general. 2) We elicit binary judgements between pairs of individuals, whereas metric fairness is defined as a Lipschitz constraint on a real valued metric. In this sense our notion is more restrictive, although (we believe) easier to elicit.

The most technically related piece of work is Rothblum and Yona [33], who prove similar generalization guarantees to ours for a relaxation of metric fairness: our definition is slightly more general, and our generalization guarantee somewhat tighter, but technically the results are closely related. Our conceptual focus and main results are quite different, however: for general learning problems, they prove worst-case hardness results, whereas we derive practical algorithms in the oracle-efficient model, and empirically evaluate them on real user data. The concurrent work of Lahoti et al. [26] makes a similar observation about guaranteeing fairness with respect to an unknown metric, although their aim is the orthogonal goal of fair representation learning.

Dwork et al. [7] first proposed the notion of individual metric-fairness that we take inspiration from, imagining fairness as a Lipschitz constraint on a randomized algorithm, with respect to some “task-specific metric”. Since the original proposal, the question of where the metric should come from has been one of the primary obstacles to its adoption, and the focus of subsequent work. Zemel et al. [40] attempt to automatically learn a representation for the data (and hence, implicitly, a similarity metric) that causes a classifier to label an equal proportion of two protected groups as positive. Kim et al. [22] consider a group-fairness-like relaxation of individual metric-fairness, asking that on average, individuals in pre-specified groups are classified with probabilities proportional to the average distance between individuals in those groups. They show how to learn such classifiers given access to an oracle which can evaluate the distance between two individuals according to the metric. Compared to our work, they assume the existence of a fairness metric which can be accessed using a quantitative oracle, and they use this metric to define a statistical rather than individual notion of fairness. Gillen et al. [13] assume access to an oracle which simply identifies fairness violations across pairs of individuals. Under the assumption that the oracle is exactly consistent with a metric in a simple linear class, Gillen et al. [13] give a polynomial time algorithm to compete with the best fair policy in an online linear contextual bandits problem. In contrast to Gillen et al. [13], we make essentially no assumptions at all on the structure of the “fairness” constraints. Ilvento [15] studies the problem of metric learning with the goal of using only a small number of numeric valued queries, which are hard for human beings to answer, relying more on comparison queries. In contrast with Ilvento [15], we do not attempt to learn a metric, and instead directly learn a classifier consistent with the elicited pairwise fairness constraints.

2 Problem Formulation

Let $S$ denote a set of labeled examples $\{z_{i}=(x_{i},y_{i})\}_{i=1}^{n},$ where $x_{i}\in\mathcal{X}$ is a feature vector and $y_{i}\in\mathcal{Y}$ is a label. We will also write $S_{X}=\{x_{i}\}_{i=1}^{n}$ and $S_{Y}=\{y_{i}\}_{i=1}^{n}$ . Throughout the paper, we will restrict attention to binary labels, so let $\mathcal{Y}=\{0,1\}$ . Let $\mathcal{P}$ denote the unknown distribution over $\mathcal{X}\times\mathcal{Y}$ . Let $\mathcal{H}$ denote a hypothesis class containing binary classifiers $h:\mathcal{X}\to\mathcal{Y}$ . We assume that $\mathcal{H}$ contains a constant classifier (which will imply that the “fairness constrained” ERM problem that we define is always feasible). We’ll denote classification error of hypothesis $h$ by $\textit{err}(h,\mathcal{P}):=\Pr_{(x,y)\sim\mathcal{P}}(h(x)\neq y)$ and its empirical classification error by $\textit{err}(h,S):=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}(h(x_{i})\neq y_{i})$ .

We assume there is a set of one or more stakeholders $\mathcal{U}$, such that each stakeholder $u\in\mathcal{U}$ is identified with a set of ordered pairs $(x,x^{\prime})$ of individuals $C_{u}\subseteq\mathcal{X}^{2}$: for each $(x,x^{\prime})\in C_{u}$, stakeholder $u$ thinks that $x^{\prime}$ should be treated as well as $x$ or better, i.e. ideally that for the learned classifier $h$, the classification $h(x^{\prime})\geq h(x)$ (we will ask that this hold in expectation if the classifier is randomized, and will relax it in various ways). For each ordered pair $(x,x^{\prime})$, let $w_{x,x^{\prime}}$ be the fraction of stakeholders who would like individual $x^{\prime}$ to be treated as well as $x$ or better: that is, $w_{x,x^{\prime}}=\frac{|\{u|(x,x^{\prime})\in C_{u}\}|}{|\mathcal{U}|}$. Note that if $(x,x^{\prime})\in C_{u}$ and $(x^{\prime},x)\in C_{u}$, then the stakeholder wants $x$ and $x^{\prime}$ to be treated similarly in that ideally $h(x)=h(x^{\prime})$.

In practice, we will not have direct access to the sets of ordered pairs $C_{u}$ corresponding to the stakeholders $u$, but we may ask them whether particular ordered pairs are in this set (see Section 5 for details about how we actually query human subjects). We model this by imagining that we present each stakeholder with a random set of pairs $A\subseteq[n]^{2}$, and for each ordered pair $(x_{i},x_{j})$ with $(i,j)\in A$, ask if $x_{j}$ should not be treated worse than $x_{i}$; we learn the set of ordered pairs in $A\cap C_{u}$ for each $u$. Define the empirical constraint set $\hat{C}_{u}=\{(x_{i},x_{j})\in C_{u}:(i,j)\in A\}$ and $\hat{w}_{x_{i},x_{j}}=\frac{|\{u|(x_{i},x_{j})\in\hat{C}_{u}\}|}{|\mathcal{U}|}$ if $(i,j)\in A$, and 0 otherwise. We write $\hat{C}=\cup_{u}\hat{C}_{u}$. For brevity, we will sometimes write $w_{ij}$ instead of $w_{x_{i},x_{j}}$. Note that $\hat{w}_{ij}=w_{ij}$ for every $(i,j)\in A$.
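The empirical weights $\hat{w}_{ij}$ can be computed directly from the elicited sets. The sketch below is an assumption-laden illustration rather than the authors' implementation: `empirical_weights` is a hypothetical helper, each stakeholder's $C_{u}$ is assumed to be given as a Python set of index pairs, and pairs with weight 0 are simply omitted from the returned dictionary.

```python
from collections import Counter


def empirical_weights(constraint_sets, presented_pairs):
    """Compute w_hat for every presented pair with nonzero weight.

    constraint_sets: one set of ordered index pairs (i, j) per stakeholder,
        where (i, j) means x_j should be treated as well as x_i or better.
    presented_pairs: the set A of ordered index pairs shown to stakeholders.
    Returns a dict mapping (i, j) to the fraction of stakeholders who
    hold that constraint; pairs outside A are ignored.
    """
    counts = Counter()
    for c_u in constraint_sets:
        for pair in c_u & presented_pairs:
            counts[pair] += 1
    n_stakeholders = len(constraint_sets)
    return {pair: k / n_stakeholders for pair, k in counts.items()}


# Three hypothetical stakeholders; the pair (2, 0) was never presented.
A = {(0, 1), (1, 0), (0, 2)}
sets = [{(0, 1)}, {(0, 1), (1, 0)}, {(2, 0)}]
w_hat = empirical_weights(sets, A)
print(w_hat[(0, 1)], w_hat.get((0, 2), 0.0))
```

Here two of three stakeholders hold the constraint $(0,1)$, so its weight is $2/3$, while the presented pair $(0,2)$ has weight 0 because nobody selected it.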

Our goal will be to find the distribution over classifiers from $\mathcal{H}$ that minimizes classification error, while satisfying the stakeholders’ fairness preferences, captured by the constraints $C$ . To do so, we’ll try to find $D$ , a probability distribution over $\mathcal{H}$ , that minimizes the training error and satisfies the stakeholders’ empirical fairness constraints, $\hat{C}$ . For convenience, we denote the expected classification error of $D$ as $err(D,\mathcal{P}):=\mathop{\mathbb{E}}_{h\sim D}[err(h,\mathcal{P})]$ and likewise its expected empirical classification error as $err(D,S):=\mathop{\mathbb{E}}_{h\sim D}[err(h,S)]$ . We say that any distribution $D$ over classifiers satisfies $(\gamma,\eta)$ -approximate subjective fairness if it is a feasible solution to the following constrained empirical risk minimization problem:

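$$\min_{D\in\Delta\mathcal{H},\ \alpha_{ij}\geq 0}\ err(D,S)$$

$$\text{such that }\forall(i,j)\in[n]^{2}:\quad\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]\leq\alpha_{ij}+\gamma,$$

$$\sum_{(i,j)\in[n]^{2}}\frac{\hat{w}_{ij}\alpha_{ij}}{|A|}\leq\eta.$$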

This “Fair ERM” problem, whose feasible region we denote by $\Omega(S,\hat{w},\gamma,\eta)$ , has decision variables $D$ and $\{\alpha_{ij}\}$ , representing the distribution over classifiers and the “fairness violation” terms for each pair of training points, respectively. The parameters $\gamma$ and $\eta$ are constants which represent the two different “knobs” we have at our disposal to quantitatively relax the fairness constraint, in an $\ell_{\infty}$ and $\ell_{1}$ sense, respectively.
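To make the decision variables concrete, the following NumPy sketch evaluates a candidate distribution $D$ over a small finite set of classifiers. It assumes (this is not from the paper) that each classifier is given by its 0/1 predictions on the training points and that $D$ is a probability vector over them. Feasibility is checked with the smallest admissible slack, $\alpha_{ij}=\max(0,\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]-\gamma)$, since larger slacks can only increase the weighted sum.

```python
import numpy as np


def mixture_stats(predictions, mix, labels):
    """Return (err(D, S), p) where p[i] = E_{h~D}[h(x_i)].

    predictions: (k, n) 0/1 array; row m holds classifier h_m on the n points.
    mix: length-k probability vector representing D.
    labels: length-n 0/1 array of true labels.
    """
    predictions = np.asarray(predictions, dtype=float)
    mix = np.asarray(mix, dtype=float)
    labels = np.asarray(labels, dtype=float)
    p = mix @ predictions
    err = float(mix @ (predictions != labels).mean(axis=1))
    return err, p


def is_feasible(p, w_hat, num_presented, gamma, eta):
    """Check the Fair ERM constraints using the minimal slack alpha."""
    gaps = p[:, None] - p[None, :]            # gaps[i, j] = p_i - p_j
    alpha = np.maximum(0.0, gaps - gamma)     # excess violation per pair
    return float((w_hat * alpha).sum() / num_presented) <= eta


# Hypothetical toy instance: a perfect classifier mixed with a constant one.
preds = [[1, 0, 1], [1, 1, 1]]
err, p = mixture_stats(preds, mix=[0.5, 0.5], labels=[1, 0, 1])
w_hat = np.zeros((3, 3))
w_hat[0, 1] = 1.0     # everyone wants x_1 treated as well as x_0
print(round(err, 4))                                              # 0.1667
print(is_feasible(p, w_hat, num_presented=2, gamma=0.1, eta=0.25))  # True
print(is_feasible(p, w_hat, num_presented=2, gamma=0.1, eta=0.1))   # False
```

In this toy instance the only constrained pair has gap $0.5$, so its excess over $\gamma=0.1$ is $0.4$ and the normalized weighted sum is $0.2$.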

The parameter $\gamma$ defines, for any ordered pair $(x_{i},x_{j})$, the maximum difference between the probabilities that $x_{i}$ and $x_{j}$ receive positive labels without constituting a fairness violation. The variable $\alpha_{ij}$ captures the “excess fairness violation” beyond $\gamma$ for $(x_{i},x_{j})$. The parameter $\eta$ upper bounds the sum of these allotted excess fairness violation terms $\alpha_{ij}$, each weighted by $\hat{w}_{ij}$, the proportion of judges who think that $x_{j}$ ought to be treated as well as $x_{i}$, and normalized by the total number of pairs presented, $|A|$. Thus, $\eta$ bounds the expected degree of dissatisfaction of the panel of stakeholders $\mathcal{U}$, over the random choice of an ordered pair $(x_{i},x_{j})\in A$ and the randomness of their classification. We iterate over all $(i,j)\in[n]^{2}$ (not just those in $\hat{C}$) because $\hat{w}_{ij}=0$ if no judge thinks that $x_{j}$ should be classified as well as $x_{i}$.

To better understand $\gamma$ and $\eta$ , we consider them in isolation. First, suppose we set $\gamma=0$ . Then, any difference in probabilities of positive classification between pairs is deemed a fairness violation. So, if we choose $(D,\{\alpha_{ij}\})$ such that the sum of weighted differences in positive classification probabilities exceeds $\eta$ , i.e.

$$\sum_{(i,j)\in[n]^{2}}\frac{\hat{w}_{ij}\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]}{|A|}>\eta,$$

then this is an infeasible solution. For example, suppose 50% of stakeholders think that Serena Williams $(s)$ should be treated as well as James Blake $(b)$, 70% of stakeholders think Williams should be treated as well as John McEnroe $(m)$, and there are no other constraints ($|A|=6$). If Williams receives a good label 50% of the time, Blake 80%, and McEnroe 90%, and $\eta=0.07$, then this is an $\eta$-fairness violation, since

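$$\frac{1}{|A|}\sum_{(i,j)}\hat{w}_{ij}\alpha_{ij}\geq\frac{0.5\,(0.8-0.5)+0.7\,(0.9-0.5)}{6}=\frac{0.43}{6}\approx 0.0717>\eta=0.07.$$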
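The arithmetic of the tennis example can be checked directly. The short Python sketch below recomputes the weighted, normalized sum of excess violations with $\gamma=0$ ; the variable names are illustrative and are not notation from the paper.

```python
# Tennis example with gamma = 0: any positive gap in the probability of a
# good label counts as excess fairness violation.
# All names below are illustrative assumptions, not the paper's notation.
p_good = {"williams": 0.5, "blake": 0.8, "mcenroe": 0.9}

# (better-treated person, person who should be treated as well, weight w_hat)
constraints = [
    ("blake", "williams", 0.5),    # 50% say Williams should do as well as Blake
    ("mcenroe", "williams", 0.7),  # 70% say Williams should do as well as McEnroe
]
num_pairs = 6  # |A|
eta = 0.07

weighted_violation = sum(
    w * max(0.0, p_good[i] - p_good[j]) for i, j, w in constraints
) / num_pairs

print(round(weighted_violation, 4))  # 0.0717
print(weighted_violation > eta)      # True
```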

Second, suppose that $\eta=0$ . Then, for any $(x_{i},x_{j})\in C$ (for which $\hat{w}_{ij}>0$ ), if the expected difference in labels exceeds $\gamma$ , i.e. $\mathop{\mathbb{E}}_{h\sim\,D}[h(x_{i})-h(x_{j})]>\gamma$ , then this is an infeasible solution.

2.1 Fairness Loss

Our goal is to develop an algorithm that will minimize its empirical error $err(D,S)$ , while satisfying the empirical fairness constraints $\hat{C}$ . The standard VC dimension argument states that empirical classification error will concentrate around the true classification error: we hope to show the same kind of generalization for fairness as well. To do so, we first define fairness loss with respect to our elicited fairness preferences here.

For some fixed randomized hypothesis $D\in\Delta\mathcal{H}$ and $w$ , define $\gamma$ -fairness loss between an ordered pair as

[TABLE]

For a set of pairs $M\subset\mathcal{X}\times\mathcal{X}$ , the $\gamma$ -fairness loss of $M$ is defined to be:

[TABLE]

This is the expected degree to which the difference in classification probability for a randomly selected pair exceeds the allowable budget $\gamma$ , weighted by the fraction of stakeholders who think that $x^{\prime}$ should be treated as well as $x$ . By construction, the empirical fairness loss is bounded by $\eta$ (i.e. $\Pi_{D,w,\gamma}(M)\leq\sum_{ij}\frac{\hat{w}_{ij}\alpha_{ij}}{|A|}\leq\eta$ ), and, as we show in Section 4, the empirical fairness loss concentrates around the true fairness loss $\Pi_{D,w,\gamma}(\mathcal{P}):=\mathop{\mathbb{E}}_{x,x^{\prime}\sim\mathcal{P}^{2}}\left[\Pi_{D,w,\gamma}(x,x^{\prime})\right]$ .
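As a rough illustration of this quantity, the sketch below computes a fairness loss for a small set of pairs. It assumes the pairwise loss has the form $w(x,x^{\prime})\cdot\max\left(0,\mathbb{E}_{h\sim D}[h(x)-h(x^{\prime})]-\gamma\right)$ and that the loss of a set is the plain average over its pairs; this matches the verbal description above but is an assumption about the exact definition. The function names and the toy numbers are hypothetical.

```python
def pair_fairness_loss(p_x, p_xprime, w, gamma):
    """Assumed pairwise loss: weight times the gap in excess of gamma.

    p_x and p_xprime are the probabilities of a positive label under D.
    """
    return w * max(0.0, (p_x - p_xprime) - gamma)


def set_fairness_loss(pairs, gamma):
    """Average pairwise loss over a list of (p_x, p_xprime, w) triples."""
    return sum(pair_fairness_loss(px, pxp, w, gamma) for px, pxp, w in pairs) / len(pairs)


# Toy input (made up for illustration): three elicited pairs.
pairs = [(0.8, 0.5, 0.5), (0.9, 0.5, 0.7), (0.4, 0.6, 1.0)]
loss = set_fairness_loss(pairs, gamma=0.1)
print(round(loss, 4))  # 0.1033
```

The third pair contributes nothing because its gap is negative, so only pairs whose gap exceeds $\gamma$ in the constrained direction are penalized.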

2.2 Cost-sensitive Classification

In our algorithm, we will make use of a cost-sensitive classification (CSC) oracle. An instance of the CSC problem can be described by a set of costs $\{(x_{i},c^{0}_{i},c^{1}_{i})\}_{i=1}^{n}$ and a hypothesis class, $\mathcal{H}$ . Costs $c^{0}_{i}$ and $c^{1}_{i}$ correspond to the cost of labeling $x_{i}$ as 0 and 1 respectively. Invoking a CSC oracle on $\{(x_{i},c^{0}_{i},c^{1}_{i})\}_{i=1}^{n}$ returns a hypothesis $h^{*}$ such that $h^{*}\in\operatorname*{\mathrm{argmin}}_{h\in\mathcal{H}}\sum_{i=1}^{n}\left(h(x_{i})c_{i}^{1}+\left(1-h(x_{i})\right)c_{i}^{0}\right)$ . We say that an algorithm is oracle-efficient if it runs in polynomial time assuming access to a CSC oracle.
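To make the oracle's contract concrete, here is a minimal sketch that implements it by brute force over a small finite hypothesis class. Practical oracles are learning heuristics rather than exhaustive searches; the threshold class and the toy costs below are assumptions chosen only for illustration.

```python
def csc_oracle(hypotheses, xs, c0, c1):
    """Return the hypothesis minimizing sum_i h(x_i)*c1_i + (1 - h(x_i))*c0_i.

    Brute-force search over a finite class; a stand-in for a real CSC oracle.
    """
    def total_cost(h):
        return sum(
            h(x) * cost1 + (1 - h(x)) * cost0
            for x, cost0, cost1 in zip(xs, c0, c1)
        )
    return min(hypotheses, key=total_cost)


# Hypothetical class: one-dimensional threshold classifiers.
thresholds = [0.0, 0.5, 1.0, 1.5]
hypotheses = [lambda x, t=t: int(x >= t) for t in thresholds]

xs = [0.2, 0.7, 1.2]
c0 = [0.0, 1.0, 1.0]  # cost of labeling each point 0
c1 = [1.0, 0.0, 0.0]  # cost of labeling each point 1

best = csc_oracle(hypotheses, xs, c0, c1)
print([best(x) for x in xs])  # [0, 1, 1]
```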

3 Empirical Risk Minimization

In this section, we give an oracle-efficient algorithm 1 for approximately solving our (in-sample) constrained empirical risk minimization problem. Details are deferred to the supplement. We prove the following theorem:

Theorem 3.1.

Fix parameters $\nu,C_{\tau},C_{\lambda}$ that serve to trade off running time with approximation error. There is an efficient algorithm that makes $T=\left(\frac{2C_{\lambda}\sqrt{\log(n)}+C_{\tau}}{\nu}\right)^{2}$ CSC oracle calls and outputs a solution $(\hat{D},\hat{\alpha})$ with the following guarantee. The objective value is approximately optimal:

[TABLE]

And the constraints are approximately satisfied: $\mathop{\mathbb{E}}_{h\sim\hat{D}}[h(x_{i})-h(x_{j})]\leq\hat{\alpha}_{ij}+\gamma+\frac{1+2\nu}{C_{\lambda}},\forall(i,j)\in[n]^{2}$ and $\frac{1}{|A|}\sum_{(i,j)\in[n]^{2}}\hat{w}_{ij}\hat{\alpha}_{ij}\leq\eta+\frac{1+2\nu}{C_{\tau}}.$
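The trade-off in Theorem 3.1 can be explored numerically. The sketch below evaluates the number of oracle calls $T$ and the two additive constraint slacks for given $\nu$ , $C_{\lambda}$ , $C_{\tau}$ and $n$ . It assumes the logarithm is natural and rounds $T$ up to an integer; both choices, and the function names, are assumptions of this sketch.

```python
import math


def oracle_calls(nu, c_lambda, c_tau, n):
    """T = ((2*C_lambda*sqrt(log n) + C_tau) / nu)^2, rounded up."""
    return math.ceil(((2 * c_lambda * math.sqrt(math.log(n)) + c_tau) / nu) ** 2)


def constraint_slack(nu, c_lambda, c_tau):
    """Additive slack on the per-pair constraints and on the eta constraint."""
    return (1 + 2 * nu) / c_lambda, (1 + 2 * nu) / c_tau


# Larger C_lambda and C_tau tighten the constraints but require more oracle calls.
for c in (1.0, 10.0, 100.0):
    print(c, oracle_calls(0.1, c, c, n=1000), constraint_slack(0.1, c, c))
```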

3.1 Outline of the Solution

We frame the problem of solving our constrained ERM problem (equations (1) through (3)) as finding an approximate equilibrium of a zero-sum game between a primal player and a dual player, trying to minimize and maximize respectively the Lagrangian of the constrained optimization problem.

The Lagrangian for our optimization problem is

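$$\mathcal{L}(D,\alpha,\lambda,\tau)=err(D,S)+\sum_{(i,j)\in[n]^{2}}\lambda_{ij}\left(\mathop{\mathbb{E}}_{h\sim D}\left[h(x_{i})-h(x_{j})\right]-\alpha_{ij}-\gamma\right)+\tau\left(\frac{1}{|A|}\sum_{(i,j)\in[n]^{2}}\hat{w}_{ij}\alpha_{ij}-\eta\right).$$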

For the constraint in equation (2), corresponding to the $\gamma$ -fairness violation for each ordered pair of individuals $(x_{i},x_{j})$ , we introduce a dual variable $\lambda_{ij}$ . For the constraint (3), which corresponds to the $\eta$ -fairness violation over all pairs of individuals, we introduce a dual variable $\tau$ . For brevity, we define vectors $\lambda\in\Lambda$ and $\alpha$ which are made up of all the multipliers $\lambda_{ij}$ and the excess fairness violation allotments $\alpha_{ij}$ , respectively. The primal player’s action space is $(D,\alpha)\in(\Delta\mathcal{H},[0,1]^{n^{2}})$ , and the dual player’s action space is $(\lambda,\tau)\in(\mathbb{R}^{n^{2}},\mathbb{R})$ .

Solving our constrained ERM problem is equivalent to finding a minmax equilibrium of $\mathcal{L}$ :

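$$\min_{D\in\Delta\mathcal{H},\,\alpha\in[0,1]^{n^{2}}}\;\max_{\lambda\in\Lambda,\,\tau\in\mathcal{T}}\mathcal{L}(D,\alpha,\lambda,\tau).$$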

Because $\mathcal{L}$ is linear in terms of its parameters, Sion’s minimax theorem [34] gives us

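$$\min_{D\in\Delta\mathcal{H},\,\alpha\in[0,1]^{n^{2}}}\;\max_{\lambda\in\Lambda,\,\tau\in\mathcal{T}}\mathcal{L}(D,\alpha,\lambda,\tau)=\max_{\lambda\in\Lambda,\,\tau\in\mathcal{T}}\;\min_{D\in\Delta\mathcal{H},\,\alpha\in[0,1]^{n^{2}}}\mathcal{L}(D,\alpha,\lambda,\tau).$$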

By a classic result of Freund and Schapire [12], one can compute an approximate equilibrium by simulating “no-regret” dynamics between the primal and dual player. Here, “no-regret” means that the average regret (the difference between our algorithm’s plays and the single best play in hindsight) is bounded above by a term that converges to zero as the number of rounds increases.

In our case, we define a zero-sum game wherein the primal player plays from action space $(D,\alpha)\in(\Delta\mathcal{H},[0,1]^{n^{2}})$ , and the dual player plays from action space $(\lambda,\tau)\in(\mathbb{R}_{\geq 0}^{n^{2}},\mathbb{R}_{\geq 0})$ . In any given round $t$ , the dual player plays first and the primal player second. The primal player can simply best respond to the dual player (see Algorithm 1).

However, since the dual player plays first, they cannot simply best respond to the primal player’s action. The dual player has to anticipate the primal player’s best response in order to figure out what to play. Ideally, the dual player would enumerate every possible primal play and calculate the best dual response. However, this is intractable. So, the dual player updates dual variables $\{\lambda,\tau\}$ according to no-regret learning algorithms (exponentiated gradient descent [23] and online gradient descent [41], respectively).

The time-averaged play of both players converges to an approximate equilibrium of the zero-sum game, where the approximation is controlled by the regret of the dual player. This approximate equilibrium corresponds to an approximate saddle point for the Lagrangian $\mathcal{L}$ , which is equivalent to an approximate solution to the Fair ERM problem.

We organize the rest of this section as follows. First, for simplicity, we show how the primal player updates $\{D,\alpha\}$ (even though the dual player plays first). Second, we show how the dual player updates $\{\lambda,\tau\}$ . Finally, we prove that these updates are no-regret and relate the regret of the dual player to the approximation of the solution to the Fair ERM problem.

3.2 The Primal Player’s Best Response

In each round $t$ , given the actions chosen by the dual player $(\lambda^{t},\tau^{t})$ , the primal player needs to best respond by choosing $(D^{t},\alpha^{t})$ such that $(D^{t},\alpha^{t})\in\operatorname*{\mathrm{argmin}}_{D\in\Delta\mathcal{H},\alpha\in[0,1]^{n^{2}}}\mathcal{L}(D,\alpha,\lambda^{t},\tau^{t}).$ In Lemma 3.2, we separate the optimization problem into two: one optimization over hypothesis $D$ and one over violation factor $\alpha$ . In Lemma 3.4, the primal player updates the hypothesis $D$ by leveraging a CSC oracle. Given $\lambda^{t}$ , we can set the costs as follows

[TABLE]

Then, $D^{t}=h^{t}=CSC\left(\{(x_{i},c^{0}_{i},c^{1}_{i})\}_{i=1}^{n}\right)$ (we note that the best response is always a deterministic classifier $h^{t}$ ).

As for $\alpha^{t}$ , we show in Lemma 3.3 that the primal player sets $\alpha^{t}_{ij}=1$ if $\tau^{t}\frac{w_{ij}}{|A|}-\lambda^{t}_{ij}\leq 0$ and 0 otherwise. We provide the pseudo-code in Algorithm 1.
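The thresholding rule for $\alpha^{t}$ is simple enough to state in a few lines of code. The sketch below applies it entrywise to matrices stored as nested lists; the function name, the data layout and the toy values are assumptions made for illustration.

```python
def best_response_alpha(lam, w, tau, num_pairs):
    """Set alpha_ij = 1 when tau * w_ij / |A| - lambda_ij <= 0, else 0.

    lam and w are n-by-n nested lists; num_pairs plays the role of |A|.
    """
    n = len(lam)
    return [
        [1.0 if tau * w[i][j] / num_pairs - lam[i][j] <= 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy dual variables and weights (made up for illustration).
lam = [[0.0, 0.2], [0.0, 0.0]]
w = [[0.0, 1.0], [0.5, 0.0]]
alpha = best_response_alpha(lam, w, tau=0.3, num_pairs=2)
print(alpha)  # [[1.0, 1.0], [0.0, 1.0]]
```

Entry $(1,0)$ is the only one with a strictly positive coefficient ( $0.3\cdot 0.5/2-0=0.075$ ), so it is the only one set to 0.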

Lemma 3.2.

For fixed $\lambda,\tau$ , the best response optimization for the primal player is separable, i.e.

[TABLE]

where

[TABLE]

and

[TABLE]

Lemma 3.3.

For fixed $\lambda$ and $\tau$ , the output $\alpha$ from $BEST_{\rho}(\lambda,\tau)$ minimizes $\mathcal{L}_{\lambda,\tau}^{\rho_{2}}$

Proof.

The optimization

[TABLE]

Note that for any pair $(i,j)\in[n]^{2}$ , the term $\alpha_{ij}\in[0,1]$ . Thus, when the coefficient $\tau\frac{w_{ij}}{|A|}-\lambda_{ij}\leq 0,$ we set $\alpha_{ij}$ to its upper bound, $1$ , in order to minimize $\mathcal{L}_{\lambda,\tau}^{\rho_{2}}$ . Otherwise, when $\tau\frac{w_{ij}}{|A|}-\lambda_{ij}>0,$ we set $\alpha_{ij}$ to its lower bound, 0. ∎

Lemma 3.4.

For fixed $\lambda$ and $\tau$ , the output $D$ from $BEST_{\rho}(\lambda,\tau)$ minimizes $\mathcal{L}_{\lambda,\tau}^{\rho_{1}}$

Proof.

[TABLE]

For each $i\in[n],$ we assign the cost

[TABLE]

Note that the cost depends on whether $y_{i}=0$ or 1. For example, take $y_{i}=1$ and $h(x_{i})=0$ . The cost

[TABLE]

∎

3.3 The Dual Player’s No-regret Updates

In order to reason about convergence we need to restrict the dual player’s action space to lie within a bounded $\ell_{1}$ ball, defined by the parameters $C_{\tau}$ and $C_{\lambda}$ that appear in our theorem — and serve to trade off running time with approximation quality:

[TABLE]

The dual player will use exponentiated gradient descent [23] to update $\lambda$ and online gradient descent [41] to update $\tau$ , where the reward function will be defined as:

[TABLE]

and

[TABLE]

We provide the pseudo-code in Algorithm 2 but defer some of the proofs to the supplement.

Lemma 3.5.

For fixed $D$ and $\alpha$ , the best response optimization for the dual player is separable, i.e.

[TABLE]

where

[TABLE]

and

[TABLE]

Lemma 3.6.

Running online gradient descent for $\tau^{t}$ , i.e. $\tau^{t}=\textit{proj}_{[0,C_{\tau}]}\left(\tau^{t-1}+\mu^{t-1}\cdot\nabla\mathcal{L}_{D^{t},\alpha^{t}}^{\psi_{2}}\left(\tau^{t-1}\right)\right)$ , with step size $\mu^{t}=\frac{C_{\tau}}{\sqrt{T}}$ yields the following regret

[TABLE]
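A single projected update of this kind might look as follows. This is a sketch under stated assumptions: the gradient is taken to be $\frac{1}{|A|}\sum_{ij}w_{ij}\alpha_{ij}-\eta$ , the step size is $C_{\tau}/\sqrt{T}$ as in the lemma, and the function name and nested-list layout are hypothetical.

```python
def ogd_tau_step(tau_prev, alpha, w, eta, num_pairs, c_tau, num_rounds):
    """One gradient-ascent step on tau, projected back onto [0, C_tau]."""
    step = c_tau / num_rounds ** 0.5
    weighted = sum(
        w_ij * a_ij
        for w_row, a_row in zip(w, alpha)
        for w_ij, a_ij in zip(w_row, a_row)
    )
    grad = weighted / num_pairs - eta  # violation of the eta budget
    return min(max(tau_prev + step * grad, 0.0), c_tau)


# Toy values (made up): the budget is exceeded by 0.5, so tau increases.
w = [[0.0, 1.0], [1.0, 0.0]]
alpha = [[1.0, 1.0], [1.0, 1.0]]
tau = ogd_tau_step(0.0, alpha, w, eta=0.5, num_pairs=2, c_tau=4.0, num_rounds=16)
print(tau)  # 0.5
```

The dual variable grows while the $\eta$ budget is exceeded and shrinks toward 0 once it is respected; the projection keeps it inside the bounded action space.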

Proof.

First, note that $\nabla\mathcal{L}_{D^{t},\alpha^{t}}^{\psi_{2}}\left(\tau^{t-1}\right)=\frac{1}{|A|}\sum_{ij}w_{ij}\alpha_{ij}^{t-1}-\eta$ and

[TABLE]

From [41], we find that the regret of this online gradient descent (translated into the terms of our paper) is bounded as follows:

[TABLE]

where the bound on our target $\tau$ term is $C_{\tau}$ , the gradient of our cost function at round $t$ is $\nabla\mathcal{L}^{\psi_{2}}_{D^{t},\alpha^{t}}\left(\tau^{t-1}\right)$ , and the bound $\left|\left|\nabla\mathcal{L}_{D,\alpha}^{\psi_{2}}\right|\right|=\sup_{\tau\in\mathcal{T},\ t\in{[T]}}\left|\left|\nabla\mathcal{L}_{D^{t},\alpha^{t}}^{\psi_{2}}\left(\tau^{t-1}\right)\right|\right|.$ To prove the above lemma, we first need to show that this bound $\left|\left|\nabla\mathcal{L}_{D,\alpha}^{\psi_{2}}\right|\right|\leq 1.$

Since $w_{ij},\alpha_{ij},\eta\in[0,1]$ for all pairs $(i,j)$ , the quantity $\frac{1}{|A|}\sum_{ij}w_{ij}\alpha_{ij}-\eta\leq 1.$ For all $t$ , the gradient

[TABLE]

Thus,

[TABLE]

Note that if we define $\mu_{\tau}^{t}=\frac{C_{\tau}}{\sqrt{T}},$ then the summation of the step sizes is equal to

[TABLE]

Substituting these two results into inequality (4), we get that the regret

[TABLE]

∎

Lemma 3.7.

Running exponentiated gradient descent for $\lambda^{t}$ yields the following regret:

[TABLE]

Proof.

In each round, the dual player gets to charge either some $(i,j)$ constraint or no constraint at all. In other words, the dual player is presented with $n^{2}+1$ options. Therefore, to account for the option of not charging any constraint, we define the vector $\lambda^{\prime}=\left(\lambda,0\right)$ , where the last coordinate, which will always be $0$ , corresponds to the option of not charging any constraint.

Next, we define the reward vector $\zeta^{t}$ for $\lambda^{\prime t}$ as

[TABLE]

Hence, the reward function is

[TABLE]

The gradient of the reward function is

[TABLE]

Note that the $\ell_{\infty}$ norm of the gradient is bounded by 1, i.e.

[TABLE]

because for any $t$ , each respective component of the gradient, $\mathop{\mathbb{E}}\limits_{h\sim D^{t}}\left[h(x_{i})-h(x_{j})\right]-\alpha^{t}_{ij}-\gamma$ , is bounded by 1.

Here, by the regret bound of [23], we obtain the following regret bound:

[TABLE]

If we take $\mu=\frac{1}{C_{\lambda}}\sqrt{\frac{\log n}{T}},$ the regret is bounded as follows:

[TABLE]

∎
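For intuition, a multiplicative-weights style update over the $n^{2}+1$ options can be sketched as follows. This is one common form of exponentiated gradient, not necessarily the exact update in Algorithm 2: here the extra coordinate has reward zero and absorbs whatever mass is not charged to a pair, so that the total always equals $C_{\lambda}$ . That treatment, the function name and the toy rewards are assumptions of this sketch.

```python
import math


def eg_lambda_update(lam_ext, reward, step, c_lambda):
    """One exponentiated-gradient step on the extended vector lambda'.

    lam_ext has n^2 + 1 positive entries; the last one is the
    'charge no constraint' option, whose reward is 0.
    """
    weights = [l * math.exp(step * r) for l, r in zip(lam_ext, reward)]
    total = sum(weights)
    # Rescale so the extended vector keeps total mass C_lambda.
    return [c_lambda * wt / total for wt in weights]


c_lambda = 3.0
lam_ext = [1.0, 1.0, 1.0]   # two pairs plus the 'no charge' option
reward = [0.5, -0.5, 0.0]   # constraint slack per pair, 0 for 'no charge'
new_lam = eg_lambda_update(lam_ext, reward, step=1.0, c_lambda=c_lambda)
print([round(v, 3) for v in new_lam])
```

Mass shifts toward the violated constraint (positive reward) and away from the satisfied one, while the total stays fixed.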

Remark 3.8.

If the primal learner’s approximate best response satisfies

[TABLE]

along with the dual player’s regret of $\xi_{\psi}T$ , then $\left(\bar{D},\bar{\alpha},\bar{\lambda},\bar{\tau}\right)$ is an $\left(\xi_{\rho}+\xi_{\psi}\right)$ -approximate solution.

Theorem 3.9.

Let $\left(\hat{D},\hat{\alpha},\hat{\lambda},\hat{\tau}\right)$ be a $v$ -approximate solution to the Lagrangian problem. More specifically,

[TABLE]

and

[TABLE]

Then, $err\left(\hat{D},S\right)\leq OPT+2v$ . And as for the constraints, we have

[TABLE]

Proof.

Let $(D^{*},\alpha^{*})=\operatorname*{\mathrm{argmin}}_{(D,\alpha)\in\Omega(S,\hat{w},\gamma,\eta)}\textit{err}(D,S)$ , the optimal solution to the Fair ERM. Also, define

[TABLE]

Note that for any $D$ and $\alpha$ , $\max_{\lambda\in\Lambda,\tau\in\mathcal{T}}penalty_{S,\hat{w}}(D,\alpha,\lambda,\tau)\geq 0$ because one can always set $\lambda=0$ and $\tau=0$ .

[TABLE]

The first inequality and the third inequality are from the definition of a $v$ -approximate saddle point, and the second to last equality comes from the fact that $(D^{*},\alpha^{*})$ is a feasible solution.

We now consider two cases: when $(\hat{D},\hat{\alpha})$ is a feasible solution and when it is not.

1. $\left(\hat{D},\hat{\alpha}\right)\in\Omega\left(S,\hat{w},\gamma,\eta\right)$

In this case, $\max_{\lambda\in\Lambda,\tau\in\mathcal{T}}penalty_{S,\hat{w}}\left(\hat{D},\hat{\alpha},\lambda,\tau\right)=0$ because, by the definition of a feasible solution, we have $\mathop{\mathbb{E}}_{h\sim D}\left[h(x_{i})-h(x_{j})\right]\leq\alpha_{ij}+\gamma,\forall(i,j)\in[n]^{2}$ and $\frac{1}{|A|}\sum_{(i,j)\in[n]^{2}}\hat{w}_{ij}\alpha_{ij}\leq\eta.$ Hence, $\max_{\lambda\in\Lambda,\tau\in\mathcal{T}}\mathcal{L}\left(\hat{D},\hat{\alpha},\lambda,\tau\right)=err\left(\hat{D},S\right)$ . Therefore, we have $err\left(\hat{D},S\right)\leq err\left(D^{*},S\right)+2v$ .

2. $\left(\hat{D},\hat{\alpha}\right)\notin\Omega\left(S,\hat{w},\gamma,\eta\right)$

[TABLE]

Therefore, $err\left(\hat{D},S\right)\leq err\left(D^{*},S\right)+2v$ because

[TABLE]

Now, we show that even when $(\hat{D},\hat{\alpha})$ is not a feasible solution, the constraints are violated only by so much. Note that

[TABLE]

Therefore,

[TABLE]

Let $\lambda^{*},\tau^{*}=BEST_{\psi}\left(\hat{D},\hat{\alpha}\right)$ , which maximizes the function as shown in Lemmas A.3 and A.4. Now, consider

[TABLE]

Say $(i^{*},j^{*})=\operatorname*{\mathrm{argmax}}_{(i,j)\in[n]^{2}}\mathop{\mathbb{E}}\limits_{h\sim D}\left[h(x_{i})-h(x_{j})\right]-\alpha_{ij}-\gamma$ . Recall that if $\mathop{\mathbb{E}}\limits_{h\sim D}\left[h(x_{i^{*}})-h(x_{j^{*}})\right]-\alpha_{i^{*}j^{*}}-\gamma>0$ , then $\lambda^{*}_{i^{*}j^{*}}=C_{\lambda}$ with all other coordinates equal to 0; otherwise, $\lambda^{*}$ is the zero vector. Also, $\tau^{*}=C_{\tau}$ if $\frac{1}{|A|}\sum_{(i,j)}\hat{w}_{ij}\alpha_{ij}-\eta>0$ and 0 otherwise. Thus,

[TABLE]

Therefore, we have

[TABLE]

and

[TABLE]

∎

Now, the proof of Theorem 3.1 simply plugs the best response guarantee of the learner (Lemmas 3.3 and 3.4) and the no-regret guarantee of the auditor (Lemmas 3.6 and 3.7) into Theorem 3.9. We defer the actual proof to the supplement.

4 Generalization

In this section, we show that fairness loss generalizes out-of-sample. (Error generalization follows from the standard VC-dimension bound, which, because it is a uniform convergence statement, is unaffected by the addition of fairness constraints. See the supplement for the standard statement.)

Proving that the fairness loss generalizes doesn’t follow immediately from a standard VC-dimension argument for several reasons: it is not linearly separable, but defined as an average over non-disjoint pairs of individuals in the sample. The difference between empirical fairness loss and true fairness loss of a randomized hypothesis $D\in\Delta\mathcal{H}$ is also a non-convex function of the supporting hypotheses $h$ , and so it is not sufficient to prove a uniform convergence bound merely for the base hypotheses in our hypothesis class $\mathcal{H}$ . We circumvent these difficulties by making use of an $\varepsilon$ -net argument, together with an application of a concentration inequality, and an application of Sauer’s lemma. Briefly, we show that with respect to fairness loss, the continuous set of distributions over classifiers have an $\varepsilon$ -net of sparse distributions. Using the two-sample trick and Sauer’s lemma, we can bound the number of such sparse distributions. The end result is the following generalization theorem:

Theorem 4.1.

Let $S$ consist of $n$ i.i.d. points drawn from $\mathcal{P}$ and let $M$ represent a set of $m$ pairs randomly drawn from $S\times S$ . Then we have:

[TABLE]

where $k^{\prime}=\frac{2\ln(2m)}{\varepsilon^{2}}+1$ , $k=\frac{\ln(2n^{2})}{8\varepsilon^{2}}+1$ , and $d$ is the VC-dimension of $\mathcal{H}$ .

To interpret this theorem, note that the right hand side (the probability of a failure of generalization) begins decreasing exponentially fast in the data and fairness constraint sample parameters $n$ and $m$ as soon as $n\geq\Omega(d\log(n)\log(n/d))$ and $m\geq\Omega(d\log(m)\log(n/d))$ .
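The sparsity parameters $k$ and $k^{\prime}$ that appear in Theorem 4.1 are easy to evaluate, which helps in seeing how they scale with $\varepsilon$ , $n$ and $m$ . The sketch below computes them exactly as stated in the theorem, assuming natural logarithms; the function name is hypothetical.

```python
import math


def sparsity_parameters(n, m, eps):
    """Return (k, k_prime) as defined after Theorem 4.1."""
    k = math.log(2 * n ** 2) / (8 * eps ** 2) + 1
    k_prime = 2 * math.log(2 * m) / eps ** 2 + 1
    return k, k_prime


# Both parameters grow like 1/eps^2 and only logarithmically in n and m.
for eps in (0.2, 0.1, 0.05):
    k, k_prime = sparsity_parameters(n=5000, m=1000, eps=eps)
    print(eps, round(k, 1), round(k_prime, 1))
```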

5 A Behavioral Study

The framework and algorithm we have provided can be viewed as a tool to elicit and enforce a notion of fairness defined by a collection of stakeholders. In this section, we describe preliminary results from a human-subject study we performed in which pairwise fairness preferences were elicited and enforced by our algorithm.

We note that the subjects included in our empirical study were not stakeholders affected by the algorithm we used (the COMPAS algorithm). Thus, our results should not be interpreted as grounds for any policy modifications to the COMPAS algorithm. We instead report our empirical findings primarily to showcase the performance of our algorithm and to act as a template for what should be reported if our framework were applied with relevant stakeholders (for example, if fairness preferences about COMPAS data were elicited from inmates). We omit such an empirical study due to the difficulty of accessing such stakeholders and leave this for future work.

5.1 Data

Our study used the COMPAS recidivism data gathered by ProPublica in their celebrated analysis of Northpointe’s risk assessment algorithm [27]. (The data can be accessed on ProPublica’s GitHub page. We cleaned the data as in the ProPublica study, removing any records with missing data. This left 5829 records, where the base rate of two-year recidivism was $46\%$ .) This data consists of defendants from Broward County in Florida between 2013 and 2014. For each defendant the data consists of sex (male, female), age (18-96), race (African-American, Caucasian, Hispanic, Asian, Native American), juvenile felony count, juvenile misdemeanor count, number of other juvenile offenses, number of prior adult criminal offenses, the severity of the crime for which they were incarcerated (felony or misdemeanor), as well as the outcome of whether or not they did in fact recidivate. Recidivism is defined as a new arrest within 2 years, not counting traffic violations and municipal ordinance violations.

5.2 Subjective Fairness Elicitation

We implemented our fairness framework via a web app that elicited subjective fairness notions from 43 undergraduates at a major research university. After reading a document describing the data and recidivism prediction task, each subject was presented with 50 randomly chosen pairs of records from the COMPAS data set and asked whether in their opinion the two individuals should be treated (predicted) equally or not. Importantly, the subjects were shown only the features for the individuals, and not their actual recidivism outcomes, since we sought to elicit subjects’ fairness notions regarding the predictions of those outcomes. While absolutely no guidance was given to subjects regarding fairness, the elicitation framework allows for rich possibilities. For example, subjects could choose to ignore demographic factors or criminal histories entirely if they liked, or a subject who believes that minorities are more vulnerable to overpolicing could discount their criminal histories relative to Caucasians in their pairwise elicitations.

For each subject, the pairs they identified to be treated equally were taken as constraints on error minimization with respect to the actual recidivism outcomes over the entire COMPAS dataset, and our algorithm was applied to solve this constrained optimization problem, using a linear threshold heuristic as the underlying learning oracle [19]. We ran our algorithm with $\eta=0$ and variable $\gamma$ in Equations (1) through (3), which represents the strongest enforcement of subjective fairness — the difference in predicted values must be at most $\gamma$ on every pair selected by a subject. Because the issues we are most interested in here (convergence, tradeoffs with accuracy, and heterogeneity of fairness preferences) are orthogonal to generalization — and because we prove VC-dimension based generalization theorems — for simplicity, the results we report are in-sample.

5.3 Results

Since our algorithm relies on a learning heuristic for which worst-case guarantees are not possible, the first empirical question is whether the algorithm converges rapidly on the behavioral data. We found that it did so consistently; a typical example is Figure 2(a), where we show the trajectories of model error vs. fairness violation for a particular subject’s data for variable values of the input $\gamma$ (horizontal lines). After 1000 iterations, the algorithm has converged to the optimal errors subject to the allowed $\gamma$ .

Perhaps the most basic behavioral questions we might ask involve the extent and nature of subject variability. For example, do some subjects identify constraint pairs that are much harder to satisfy than other subjects? And if so, what factors seem to account for such variation?

Figure 2(b) shows that there is indeed considerable variation in subject difficulty. For each of the 43 subjects, we have plotted the error vs. fairness violation Pareto curves obtained by varying $\gamma$ from 0 (pairs selected by subjects must have identical probabilistic predictions of recidivism) to 1.0 (no fairness enforced whatsoever). Since our model space is closed under probabilistic mixtures, the worst-case Pareto curve is linear, obtained by all mixtures of the error-optimal model and random predictions. Easier constraint sets are more convex. We see in the figure that both extremes are exhibited behaviorally: some subjects yield linear or near-linear curves, while others permit huge reductions in unfairness for only slight increases in error, and virtually all the possibilities in between are realized as well. (The slight deviations from true convexity are due to approximate rather than exact convergence.)

Since each subject was presented with 50 random pairs and was free to constrain as many or as few as they wished, it is natural to wonder if the variation in difficulty is explained simply by the number of constraints chosen. In Figure 2(c) we show a scatterplot of the number of constraints selected by a subject ( $x$ axis) versus the error obtained ( $y$ axis) for $\gamma=0.3$ (an intermediate value that exhibits considerable variation in subject error rates) for all 43 subjects. While we see there is indeed strong correlation (approximately 0.69), it is far from the case that the number of constraints explains all the variability. For example, amongst subjects who selected approximately 16 constraints, the resulting error varies over a range of nearly 8%, which is over 40% of the range from the optimal error (0.32) to the worst fairness-constrained error (0.5). More surprisingly, when we consider only the ‘opposing’ constraints, pairs of points with different true labels, the correlation (0.489) seems to be weaker. Forcing a classifier to predict similarly on a pair of points with different true labels should increase the error, and yet the number of such constraints is less correlated with error than the raw number of constraints. This suggests that the variability in subject difficulty is due to the nature of the constraints themselves rather than their number or disagreement with the true labels.

It is also interesting to consider the collective force of the 1432 constraints selected by all 43 subjects together, which we can view as a “fairness panel” of sorts. Given that there are already individual subjects whose constraints yield the worst-case Pareto curve, it is unsurprising that the collective constraints do as well. But we can exploit the flexibility of our optimization framework in Equations (1) through (3), and let $\gamma=0.0$ and vary only $\eta$ , thus giving the learner discretion in which subjects’ constraints to discount or discard at a given budget $\eta$ . In doing so we find that the unconstrained optimal error can be obtained while having the average (exact) pairwise constraint be violated by only roughly 25%, meaning roughly that only 25% of the collective constraints account for all the difficulty.

Finally, we can investigate the extent to which behavioral subjective fairness notions align with more standard statistical fairness definitions, such as equality of false positive rates. For instance, for each subject and a pair of racial groups, we take the absolute difference in false positive rates of the classifier at $\gamma\in\{0.0,0.1,\dots,1.0\}$ and calculate the correlation coefficient between realized values of $\gamma$ (which measure violation of subjective fairness) and the false positive rate differences. Figure 2(e) shows the average correlation coefficient across subjects for each pair of racial groups. We note that subjective fairness correlates with a smaller gap between the false positive rates across Caucasians and African Americans, but correlates substantially less for other pairs of racial groups.
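The false positive rate gap used in this comparison can be computed as in the sketch below. It takes deterministic 0/1 predictions for simplicity; the function names and the tiny arrays are made up for illustration and are not the study's data.

```python
def false_positive_rate(y_true, y_pred):
    """Fraction of true negatives (y = 0) that are predicted positive."""
    preds_on_negatives = [p for y, p in zip(y_true, y_pred) if y == 0]
    return sum(preds_on_negatives) / len(preds_on_negatives)


def fpr_gap(y_true, y_pred, group, a, b):
    """Absolute difference in false positive rates between groups a and b."""
    def rate(g):
        idx = [k for k, grp in enumerate(group) if grp == g]
        return false_positive_rate([y_true[k] for k in idx], [y_pred[k] for k in idx])
    return abs(rate(a) - rate(b))


# Hypothetical toy data: two groups of three individuals each.
y_true = [0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]
group = ["a", "a", "a", "b", "b", "b"]
print(fpr_gap(y_true, y_pred, group, "a", "b"))  # 0.5
```

Repeating this for the classifier obtained at each value of $\gamma$ gives the sequence of gaps whose correlation with $\gamma$ is reported in the text.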

We leave a more complete investigation of our behavioral study for future work, including the detailed nature of subject variability and further comparison of behavioral subjective fairness to standard algorithmic fairness notions.

Acknowledgements

AR is supported in part by NSF grants AF-1763307, CNS-1253345, and an Amazon Research Award. ZSW is supported in part by an NSF grant FAI-1939606, a Google Faculty Research Award, a J.P. Morgan Faculty Award, and a Facebook Research Award. Part of this work was completed while ZSW was visiting the Simons Institute for the Theory of Computing at UC Berkeley.

Appendix A Omitted details in Section 3

A.1 Primal player’s best response

Lemma A.1 (Restatement of Lemma 3.2).

For fixed $\lambda,\tau$ , the best response optimization for the primal player is separable, i.e.

[TABLE]

where

[TABLE]

and

[TABLE]

Proof.

First, note that $\alpha$ is not dependent on $D$ and vice versa. Thus, we may separate the optimization $\operatorname*{\mathrm{argmin}}_{D,\alpha}\mathcal{L}$ as such:

[TABLE]

∎

A.2 Dual player’s best response

Lemma A.2 (Restatement of Lemma 3.5).

For fixed $D$ and $\alpha$ , the best response optimization for the dual player is separable, i.e.

[TABLE]

where

[TABLE]

and

[TABLE]

Proof.

[TABLE]

∎

Lemma A.3.

For fixed $D$ and $\alpha$ , the output $\lambda$ from $BEST_{\psi}(D,\alpha)$ maximizes $\mathcal{L}_{D,\alpha}^{\psi_{1}}$ .

Proof.

Because $\mathcal{L}_{D,\alpha}^{\psi_{1}}$ is linear in terms of $\lambda$ and the feasible region is the non-negative orthant bounded by 1-norm, the optimal solution must include putting all the weight to the pair $(i,j)$ where $\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})-\alpha_{ij}]$ is maximized. ∎

Lemma A.4.

For fixed $D$ and $\alpha$ , the output $\tau$ from $BEST_{\psi}(D,\alpha)$ maximizes $\mathcal{L}_{D,\alpha}^{\psi_{2}}$ .

Proof.

Because $\mathcal{L}_{D,\alpha}^{\psi_{2}}$ is linear in terms of $\tau$ , the optimal solution is trivially to set $\tau$ at either $C_{\tau}$ or 0 depending on the sign. ∎
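The two best-response rules of Lemmas A.3 and A.4 can be combined into one short routine. The sketch below assumes the expected gaps $\mathbb{E}_{h\sim D}[h(x_{i})-h(x_{j})]$ have already been computed and stored in an $n\times n$ nested list; the function name, argument layout and toy values are hypothetical.

```python
def best_response_dual(gap, alpha, w, gamma, eta, num_pairs, c_lambda, c_tau):
    """Dual best response for a fixed primal play.

    lambda puts all of C_lambda on the most violated pair (if any is violated);
    tau is C_tau when the eta budget is exceeded and 0 otherwise.
    """
    n = len(gap)
    lam = [[0.0] * n for _ in range(n)]
    slack, i_star, j_star = max(
        (gap[i][j] - alpha[i][j] - gamma, i, j) for i in range(n) for j in range(n)
    )
    if slack > 0:
        lam[i_star][j_star] = c_lambda
    budget = sum(w[i][j] * alpha[i][j] for i in range(n) for j in range(n)) / num_pairs - eta
    tau = c_tau if budget > 0 else 0.0
    return lam, tau


# Toy values (made up): pair (0, 1) exceeds gamma by 0.3, the eta budget is met.
gap = [[0.0, 0.4], [-0.4, 0.0]]
alpha = [[0.0, 0.0], [0.0, 0.0]]
w = [[0.0, 1.0], [1.0, 0.0]]
lam, tau = best_response_dual(gap, alpha, w, gamma=0.1, eta=0.1,
                              num_pairs=2, c_lambda=5.0, c_tau=4.0)
print(lam, tau)  # [[0.0, 5.0], [0.0, 0.0]] 0.0
```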

A.3 No-regret dynamics

Theorem A.5 ([12]).

Let $(D^{1},\alpha^{1}),\dots,(D^{T},\alpha^{T})$ be the primal player’s sequence of actions, and $(\lambda^{1},\tau^{1}),\dots,(\lambda^{T},\tau^{T})$ be the dual player’s sequence of actions. Let $\bar{D}=\frac{1}{T}\sum_{t=1}^{T}D^{t}$ , $\bar{\alpha}=\frac{1}{T}\sum_{t=1}^{T}\alpha^{t}$ , $\bar{\lambda}=\frac{1}{T}\sum_{t=1}^{T}\lambda^{t}$ , and $\bar{\tau}=\frac{1}{T}\sum_{t=1}^{T}\tau^{t}$ . Then, if the regret of the dual player satisfies

[TABLE]

and the primal player best responds in each round ( $D^{t},\alpha^{t}=\operatorname*{\mathrm{argmin}}_{D\in\Delta\mathcal{H},\alpha\in[0,1]^{n^{2}}}\mathcal{L}\left(D,\alpha,\lambda^{t},\tau^{t}\right)$ ), then $(\bar{D},\bar{\alpha},\bar{\lambda},\bar{\tau})$ is an $\xi_{\psi}$ -approximate solution.

A.3.1 Omitted proof of Theorem 3.1

Proof of Theorem 3.1.

Observe that

[TABLE]

By how we constructed $\mathcal{L}_{D,\alpha}^{\psi_{1}}$ and $\mathcal{L}_{D,\alpha}^{\psi_{2}}$ , combining Lemma 3.6 and 3.7 yields

[TABLE]

where $\xi_{\psi}=\frac{2C_{\lambda}\sqrt{T\log n}+C_{\tau}\sqrt{T}}{T}$ .

Then, Theorem A.5 tells us that $\bar{D},\bar{\alpha},\bar{\lambda},\bar{\tau}$ form a $\xi_{\psi}$ -approximate equilibrium, where $\bar{D}=\frac{1}{T}\sum_{t=1}^{T}D^{t}$ , $\bar{\alpha}=\frac{1}{T}\sum_{t=1}^{T}\alpha^{t}$ , $\bar{\lambda}=\frac{1}{T}\sum_{t=1}^{T}\lambda^{t}$ , and $\bar{\tau}=\frac{1}{T}\sum_{t=1}^{T}\tau^{t}$ . Finally, choosing $T=\left(\frac{2C_{\lambda}\sqrt{\log(n)}+C_{\tau}}{\nu}\right)^{2}$ results in $\xi_{\psi}=\nu$ , and Theorem 3.9 gives

[TABLE]

And as for the constraints,

[TABLE]

and

[TABLE]

∎

Appendix B Generalization

B.0.1 Error

Theorem B.1 ([20]).

Fix some hypothesis class $\mathcal{H}$ and distribution $\mathcal{P}$ . Let $S\sim\mathcal{P}^{n}$ be a dataset consisting of $n$ examples $\{x_{i},y_{i}\}_{i=1}^{n}$ sampled i.i.d. from $\mathcal{P}$ . Then, for any $0<\delta<1$ , with probability $1-\delta$ , for every $h\in\mathcal{H}$ , we have

[TABLE]

B.0.2 Fairness Loss

At a high level, our argument proceeds as follows: using McDiarmid’s inequality, for any fixed hypothesis, its empirical fairness loss concentrates around its expectation. This argument extends to an infinite family of hypotheses with bounded VC-dimension via the standard two-sample trick, together with Sauer’s lemma: the only catch is that we need to use a variant of McDiarmid’s inequality that applies to sampling without replacement. However, proving that the fairness loss for each fixed hypothesis $h$ concentrates around its expectation is not sufficient to obtain the same result for arbitrary distributions over hypotheses, because the difference between a randomized classifier’s fairness loss and its expectation is a non-convex function of the mixture weights. To circumvent this issue, we show that with respect to fairness loss, there is an $\varepsilon$ -net consisting of sparse distributions over hypotheses. Once we apply Sauer’s lemma and the two-sample trick, there are only finitely many such distributions, and we can union bound over them.

We begin by stating the standard version of McDiarmid’s inequality:

Theorem B.2 (McDiarmid’s Inequality).

Suppose $X_{1},\dots,X_{n}$ are independent and $f$ satisfies

[TABLE]

Then, for any $\varepsilon>0$ ,

[TABLE]

Lemma B.3.

Fix a randomized hypothesis $D\in\Delta\mathcal{H}$ . Over the randomness of $S\sim\mathcal{P}^{n}$ , we have

[TABLE]

Proof.

Define a slightly modified fairness loss function that depends on each instance instead of a pair.

[TABLE]

Note that $\Pi^{\prime}_{D,w,\gamma}(x_{1},\dots,x_{n})=\Pi_{D,w,\gamma}(S\times S)$ . The sensitivity of $\Pi^{\prime}_{D,w,\gamma}(x_{1},x_{2},\dots,x_{n})$ is $\frac{1}{n}$ , so applying McDiarmid’s inequality yields the above concentration. ∎

Theorem B.4.

If $n\geq\frac{2\ln(2)}{\varepsilon^{2}}$ ,

[TABLE]

where $d$ is the VC-dimension of $\mathcal{H}$ , and $k=\frac{\ln(2n^{2})}{8\varepsilon^{2}}+1$ .

Proof.

First, by linearity of expectation, we note that $\mathop{\mathbb{E}}_{S}\left[\Pi_{D,w,\gamma}(S\times S)\right]=\mathop{\mathbb{E}}_{x,x^{\prime}}\left[\Pi_{D,w,\gamma}(x,x^{\prime})\right]$. Given $S$, let $D^{*}_{S}$ be some randomized classifier such that $\left|\Pi_{D^{*}_{S},w,\gamma}(S\times S)-\mathop{\mathbb{E}}_{x,x^{\prime}}\left[\Pi_{D^{*}_{S},w,\gamma}(x,x^{\prime})\right]\right|>\varepsilon$; if no such classifier exists, let it be some fixed hypothesis in $\mathcal{H}$. We now use a standard symmetrization argument, which allows us to bound the difference between the fairness loss on our sample $S$ and that on another independent ‘ghost’ sample $S^{\prime}=(x^{\prime}_{1},\dots,x^{\prime}_{n})$, instead of bounding the difference between the empirical fairness loss and its expectation.

[TABLE]

We used Lemma B.3 for the second to last inequality, and the last inequality follows from the theorem’s condition and the definition of $D^{*}_{S}$ .

Now, imagine sampling a set $\bar{S}$ of $2n$ points from $\mathcal{P}$, and uniformly choosing $n$ of them without replacement to be $S$ and the remaining $n$ points to be $S^{\prime}$. This process is equivalent to sampling $n$ points from $\mathcal{P}$ to form $S$ and another independent set of $n$ points from $\mathcal{P}$ to form $S^{\prime}$.

[TABLE]
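The two-stage sampling used in this step can be illustrated with a minimal sketch. It assumes a standard normal as a stand-in for $\mathcal{P}$, and the helper names `draw_pool` and `split_pool` are hypothetical.

```python
import random

def draw_pool(n, rng):
    """Draw 2n i.i.d. points; a standard normal stands in for P (assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(2 * n)]

def split_pool(pool, rng):
    """Uniformly choose half of the pool without replacement as S; the rest is S'."""
    n = len(pool) // 2
    chosen = set(rng.sample(range(len(pool)), n))
    S = [x for i, x in enumerate(pool) if i in chosen]
    S_prime = [x for i, x in enumerate(pool) if i not in chosen]
    return S, S_prime

rng = random.Random(0)
pool = draw_pool(5, rng)          # plays the role of S-bar with 2n = 10 points
S, S_prime = split_pool(pool, rng)
```

Conditioned on the pool, the only remaining randomness is which half becomes `S`, which is why the later steps need a concentration bound for sampling without replacement.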

Now, instead of bounding the supremum over $\Delta\mathcal{H}$, we pay an approximation error of $\varepsilon^{\prime}$ in order to bound the supremum over $\mathcal{H}$.

Lemma B.5.

For some fixed data sample $S$ of size $n$ , any $D\in\Delta\mathcal{H}$ can be approximated by some uniform mixture over $k:=\frac{2\ln(2n^{2})}{\varepsilon^{\prime 2}}+1$ hypotheses $\hat{D}=\frac{1}{k}\{h_{1},\dots,h_{k}\}$ such that for every $(x,x^{\prime})\in S\times S$ ,

[TABLE]

Proof.

Fix some $(x,x^{\prime})\in S\times S$. Randomly sample $k$ hypotheses from $D$: $\{h_{i}\}_{i=1}^{k}\sim D^{k}$. Because, for each randomly drawn hypothesis $h_{i}\sim D$, the difference in its predictions for $x$ and $x^{\prime}$ lies in $[-1,1]$ and equals $\mathop{\mathbb{E}}_{h\sim D}[h(x)-h(x^{\prime})]$ in expectation, Hoeffding’s inequality yields that

[TABLE]

However, there are $n^{2}$ fixed pairs in $S\times S$, and if we distribute the failure probability among the $n^{2}$ pairs and union bound over all of them, we get

[TABLE]

In order to achieve non-zero probability of having

[TABLE]

we need to make sure that $2n^{2}\exp\left(-\frac{k\varepsilon^{\prime 2}}{2}\right)<1$, that is, $k>\frac{2\ln\left(2n^{2}\right)}{\varepsilon^{\prime 2}}$.

∎
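The probabilistic argument in this proof can be sketched as a small simulation. Everything concrete here is an assumption made for illustration: the hypothesis class is one-dimensional thresholds, $D$ is uniform over eleven of them, $k$ is rounded to the smallest integer satisfying the condition above, and the names `sparse_size`, `mean_gap` and `sparsify` are hypothetical.

```python
import math
import random

def sparse_size(n, eps):
    """Smallest integer k with 2 * n^2 * exp(-k * eps^2 / 2) < 1."""
    return math.floor(2 * math.log(2 * n * n) / eps ** 2) + 1

def mean_gap(hyps, x, xp):
    """Average of h(x) - h(x') over a list of hypotheses (a uniform mixture)."""
    return sum(h(x) - h(xp) for h in hyps) / len(hyps)

def sparsify(support, weights, S, eps, rng, max_tries=50):
    """Resample k hypotheses from D until every pair in S x S is within eps."""
    k = sparse_size(len(S), eps)
    # Exact gaps E_{h~D}[h(x) - h(x')] under the full mixture D.
    exact = {(x, xp): sum(w * (h(x) - h(xp)) for h, w in zip(support, weights))
             for x in S for xp in S}
    for _ in range(max_tries):
        sample = rng.choices(support, weights=weights, k=k)
        worst = max(abs(mean_gap(sample, x, xp) - exact[(x, xp)])
                    for x in S for xp in S)
        if worst <= eps:
            return sample, worst
    raise RuntimeError("no eps-close sparse mixture found within max_tries")

rng = random.Random(0)
S = [rng.random() for _ in range(20)]
thresholds = [i / 10 for i in range(11)]
# Threshold classifiers h_t(x) = 1[x >= t]; t is bound per lambda via the default argument.
support = [lambda x, t=t: 1.0 if x >= t else 0.0 for t in thresholds]
weights = [1 / len(support)] * len(support)
sample, worst = sparsify(support, weights, S, eps=0.25, rng=rng)
```

The lemma only needs the success probability to be non-zero, so a resampling loop is one simple way to exhibit a suitable $\hat{D}$; in practice the union bound is loose and the first draw usually succeeds.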

Corollary B.6.

For some fixed data sample $S$ , any $D\in\Delta\mathcal{H}$ can be approximated by a uniform mixture of $k:=\frac{2\ln(2n^{2})}{\varepsilon^{\prime 2}}+1$ hypotheses $\hat{D}=\frac{1}{k}\{h_{1},\dots,h_{k}\}$ such that

[TABLE]

Proof.

It simply follows from Lemma B.5 and the fact that $\max\left(0,\mathop{\mathbb{E}}_{h\sim D}\left[h(x_{i})-h(x_{j})\right]-\gamma\right)$ is 1-Lipschitz in terms of $\mathop{\mathbb{E}}_{h\sim D}[h(x_{i})-h(x_{j})]$ . ∎
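The Lipschitz step can be spelled out as follows. The symbols $a$ and $b$ are shorthand introduced here for the expected prediction gaps under $D$ and $\hat{D}$, and the final inequality assumes that Lemma B.5 bounds their difference by $\varepsilon^{\prime}$.

```latex
% a = E_{h ~ D}[h(x_i) - h(x_j)],   b = E_{h ~ \hat{D}}[h(x_i) - h(x_j)]
\left| \max(0,\, a - \gamma) - \max(0,\, b - \gamma) \right|
  \le \left| (a - \gamma) - (b - \gamma) \right|
  = \left| a - b \right|
  \le \varepsilon^{\prime}
```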

Using Corollary B.6 together with Sauer’s lemma, which bounds the total number of labelings that $\mathcal{H}$ can induce on $2n$ points by $\left(\frac{e\cdot 2n}{d}\right)^{d}$, we can show

[TABLE]
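As a concrete check of the counting step, the sketch below enumerates the labelings induced by one-dimensional threshold classifiers, a class of VC-dimension 1 chosen here purely for illustration, and compares the count with the $\left(\frac{e\cdot 2n}{d}\right)^{d}$ bound. The function names are hypothetical.

```python
import math

def threshold_labelings(points):
    """Distinct labelings of `points` by classifiers h_t(x) = 1[x >= t]."""
    pts = sorted(set(points))
    # One threshold below all points, one between each neighboring pair, one above all.
    cuts = ([pts[0] - 1.0]
            + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
            + [pts[-1] + 1.0])
    return {tuple(1 if x >= t else 0 for x in points) for t in cuts}

def sauer_bound(num_points, d):
    """The (e * num_points / d)^d bound on the number of labelings."""
    return (math.e * num_points / d) ** d

points = [0.3, 1.7, 2.2, 4.0, 5.5, 6.1]   # plays the role of the 2n points
labelings = threshold_labelings(points)
bound = sauer_bound(len(points), d=1)
```

Because only finitely many labelings exist on the $2n$ points, the number of distinct $k$-sparse uniform mixtures on those points is at most the labeling bound raised to the power $k$, which is what makes the union bound possible.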

Now, for any $\hat{D}$, we bound the probability that the difference in fairness loss between $S$ and $S^{\prime}$ is large. We do so by a union bound over the two events in which the fairness loss on $S$ or on $S^{\prime}$ deviates from its mean by too much.

If $\left|\Pi_{\hat{D},w,\gamma}(S\times S)-E_{S|\bar{S}}\left[\Pi_{\hat{D},w,\gamma}(S\times S)\right]\right|\leq\frac{\varepsilon}{4}+\frac{\varepsilon^{\prime}}{2}$ and $\left|\Pi_{\hat{D},w,\gamma}(S^{\prime}\times S^{\prime})-E_{S|\bar{S}}\left[\Pi_{\hat{D},w,\gamma}(S\times S)\right]\right|\leq\frac{\varepsilon}{4}+\frac{\varepsilon^{\prime}}{2}$ , then $\left|\Pi_{\hat{D},w,\gamma}(S\times S)-\Pi_{\hat{D},w,\gamma}(S^{\prime}\times S^{\prime})\right|\leq\frac{\varepsilon}{2}+\varepsilon^{\prime}$ . In other words,

[TABLE]

Therefore, by looking at the complement probabilities, we have

[TABLE]

Here, we cannot appeal to McDiarmid’s inequality because $S$ is sampled without replacement from $\bar{S}$, so its elements are not independent. However, we can use the same technique as [30]: the stochastic covering property can be used to show concentration for sampling without replacement [32].

Definition B.7 ([32]).

$Z_{1},\dots,Z_{n}$ satisfy the stochastic covering property if, for any $I\subset[n]$ and $a\geq a^{\prime}\in\{0,1\}^{I}$ coordinate-wise such that $||a^{\prime}-a||_{1}=1$, there is a coupling $\nu$ of the distributions $\mu,\mu^{\prime}$ of $(Z_{j}:j\in[n]\setminus I)$ conditioned on $Z_{I}=a$ or $Z_{I}=a^{\prime}$, respectively, such that $\nu(x,y)=0$ unless $x\leq y$ coordinate-wise and $||x-y||_{1}\leq 1$.

Theorem B.8 ([32]).

Let $(Z_{1},\dots,Z_{n})\in\{0,1\}^{n}$ be random variables such that $\Pr(\sum_{i=1}^{n}Z_{i}=k)=1$ and the stochastic covering property is satisfied. Let $f:\{0,1\}^{n}\to\mathbb{R}$ be a $c$-Lipschitz function. Then, for any $\varepsilon>0$,

[TABLE]

Lemma B.9 ([30]).

Given a set $S$ of $n$ points, sample $k\leq n$ elements without replacement. Let $Z_{i}\in\{0,1\}$ indicate whether the $i$th element has been chosen. Then, $(Z_{1},\dots,Z_{n})$ satisfy the stochastic covering property.

Let $\bar{S}=\{x_{1},\dots,x_{2n}\}$. We slightly change the definition of the fairness loss so that it depends on the indicator variables $Z_{1},\dots,Z_{2n}$:

[TABLE]

We see that $\Pi^{\prime\prime}_{\hat{D},w,\gamma,\bar{S}}$ is $\frac{1}{n}$-Lipschitz, so by Theorem B.8 and Lemma B.9, we get

[TABLE]

Combining everything, we get

[TABLE]

For convenience, we set $\varepsilon^{\prime}=\frac{\varepsilon}{2}$ .

∎

However, in our case, instead of finding the average over all pairs in $S$ , we calculate the fairness loss only over $m$ pairs. Fixing $S$ , if $m$ is sufficiently large, our empirical fairness loss should concentrate around the fairness loss over all the pairs for $S$ .

Lemma B.10.

For fixed $S$ , randomly chosen pairs $M\subset S\times S$ , and randomized hypothesis $D$ ,

[TABLE]

Proof.

Let the random variable $L_{a}=\Pi_{D,w,\gamma}((x_{2a-1},x_{2a}))$ denote the fairness loss of the $a$th pair. Note that

[TABLE]

Therefore, by Hoeffding’s inequality, we have

[TABLE]

∎
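The effect of evaluating the fairness loss on only $m$ pairs can be illustrated numerically. This is a sketch under stated assumptions: the per-pair loss is taken to be $\max(0,\cdot-\gamma)$ with unit weight, `score` is a hypothetical stand-in for $\mathop{\mathbb{E}}_{h\sim D}[h(x)]$, and the $m$ pairs are drawn uniformly with replacement.

```python
import random

def pair_loss(gap, gamma):
    """Assumed per-pair fairness loss with unit weight: max(0, gap - gamma)."""
    return max(0.0, gap - gamma)

def score(x):
    """Hypothetical stand-in for E_{h~D}[h(x)], clipped to [0, 1]."""
    return min(1.0, max(0.0, x))

rng = random.Random(1)
S = [rng.random() for _ in range(60)]
gamma = 0.1

# Fairness loss averaged over all n^2 pairs of S.
all_pairs = [(x, xp) for x in S for xp in S]
full = sum(pair_loss(score(x) - score(xp), gamma) for x, xp in all_pairs) / len(all_pairs)

# Fairness loss averaged over m pairs drawn uniformly with replacement (assumption).
m = 2000
M = [rng.choice(all_pairs) for _ in range(m)]
sub = sum(pair_loss(score(x) - score(xp), gamma) for x, xp in M) / m

deviation = abs(sub - full)
```

Each sampled pair contributes a bounded, independent term, so Hoeffding’s inequality applies directly and `deviation` shrinks at a rate of roughly $1/\sqrt{m}$.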

Lemma B.11.

For fixed $S$ and randomly chosen pairs $M\subset S\times S$ ,

[TABLE]

where $k^{\prime}=\frac{2\ln(2m)}{\varepsilon^{2}}+1$ .

Proof.

[TABLE]

where $k=\frac{2\ln(2m)}{4\varepsilon^{\prime 2}}+1$ . The last inequality is from Corollary B.6 and Lemma B.10. For convenience, we just set $\varepsilon^{\prime}=\varepsilon/2$ . ∎

B.1 Omitted Proof of Theorem 4.1

Combining Theorem B.4 and Lemma B.11 yields Theorem 4.1, the generalization guarantee for the fairness loss.

Proof of Theorem 4.1.

With probability at least $1-\left(8\cdot(\frac{e\cdot 2n}{d})^{dk}\exp\left(\frac{-n\varepsilon^{2}}{32}\right)+\left(\frac{e\cdot 2n}{d}\right)^{dk^{\prime}}\exp\left(-8m\varepsilon^{2}\right)\right)$, where $k^{\prime}=\frac{2\ln(2m)}{\varepsilon^{2}}+1$ and $k=\frac{\ln(2n^{2})}{8\varepsilon^{2}}+1$, we have

[TABLE]

and

[TABLE]

Then, by the triangle inequality,

[TABLE]

In other words, with probability at least $1-\left(8\cdot\left(\frac{e\cdot 2n}{d}\right)^{dk}\exp\left(\frac{-n\varepsilon^{2}}{32}\right)+\left(\frac{e\cdot 2n}{d}\right)^{dk^{\prime}}\exp\left(-8m\varepsilon^{2}\right)\right)$, we have

[TABLE]

∎
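To get a feel for when the failure probability in this proof is non-trivial, the sketch below evaluates the logarithm of each of its two terms exactly as they are written above. The function name and the example parameter values are hypothetical; working in log-space avoids overflow from the $\left(\frac{e\cdot 2n}{d}\right)^{dk}$ factor.

```python
import math

def log_failure_terms(n, m, d, eps):
    """Natural logs of the two terms in the failure probability, as stated above."""
    k = math.log(2 * n * n) / (8 * eps ** 2) + 1
    k_prime = 2 * math.log(2 * m) / eps ** 2 + 1
    log_growth = math.log(math.e * 2 * n / d)   # log of (e * 2n / d)
    # log( 8 * (e*2n/d)^(d*k) * exp(-n * eps^2 / 32) )
    t1 = math.log(8) + d * k * log_growth - n * eps ** 2 / 32
    # log( (e*2n/d)^(d*k') * exp(-8 * m * eps^2) )
    t2 = d * k_prime * log_growth - 8 * m * eps ** 2
    return t1, t2

small = log_failure_terms(n=100, m=100, d=1, eps=0.5)
large = log_failure_terms(n=10**6, m=10**6, d=1, eps=0.5)
```

A positive log-term means that term alone exceeds 1, so the guarantee is vacuous; for these illustrative values both terms are positive at $n=m=100$ and strongly negative at $n=m=10^{6}$.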

Bibliography

[1] Agarwal et al. [2018a] Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna Wallach. A reductions approach to fair classification. In International Conference on Machine Learning, pages 60–69, 2018a.
[2] Agarwal et al. [2018b] Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, and Hanna M. Wallach. A reductions approach to fair classification. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 60–69, 2018b.
[3] Agarwal et al. [2019] Alekh Agarwal, Miroslav Dudík, and Zhiwei Steven Wu. Fair regression: Quantitative definitions and reduction-based algorithms. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 120–129, 2019.
[4] Blum [1994] Lawrence Blum. Moral Perception and Particularity. Cambridge University Press, 1994. ISBN 9780511624605.
[5] Conitzer et al. [2018] Vincent Conitzer, Walter Sinnott-Armstrong, Jana Schaich Borg, Yuan Deng, and Max Kramer. Moral decision making frameworks for artificial intelligence. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2018.
[6] Corbett-Davies and Goel [2018] Sam Corbett-Davies and Sharad Goel. The measure and mismeasure of fairness: A critical review of fair machine learning. CoRR, abs/1808.00023, 2018.
[7] Dwork et al. [2012a] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pages 214–226. ACM, 2012a.
[8] Dwork et al. [2012b] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard S. Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012, pages 214–226, 2012b.
