Efficient candidate screening under multiple tests and implications for fairness
Lee Cohen, Zachary C. Lipton, Yishay Mansour

TL;DR
This paper extends screening models to multiple noisy tests, analyzing optimal employer policies and revealing fairness challenges when noise varies across groups in candidate evaluation.
Contribution
It introduces a multi-test screening framework with adaptive policies and highlights inherent fairness limitations across different candidate groups.
Findings
Optimal policies depend on test noise levels and adaptivity.
Impossibility results show fairness issues when noise differs across groups.
Multi-test models reveal complex trade-offs in candidate screening.
[Table: confusion matrix with rows accept/reject and columns Skilled/Unskilled, under the headings “General” and “When …”; cell entries not recoverable.]
Lee Cohen: Tel Aviv University. This work was supported in part by The Yandex Initiative for Machine Learning. Email: [email protected].
Zachary C. Lipton: Carnegie Mellon University and Amazon AI. This work was supported by the AI Ethics and Governance Fund. Email: [email protected].
Yishay Mansour: Tel Aviv University and Google Research. This work was supported in part by a grant from ISF. Email: [email protected].
Abstract
When recruiting job candidates, employers rarely observe their underlying skill level directly. Instead, they must administer a series of interviews and/or collate other noisy signals in order to estimate the worker’s skill. Traditional economics papers address screening models where employers access worker skill via a single noisy signal. In this paper, we extend this theoretical analysis to a multi-test setting, considering both Bernoulli and Gaussian models. We analyze the optimal employer policy both when the employer sets a fixed number of tests per candidate and when the employer can set a dynamic policy, assigning further tests adaptively based on results from the previous tests. To start, we characterize the optimal policy when employees constitute a single group, demonstrating some interesting trade-offs. Subsequently, we address the multi-group setting, demonstrating that when the noise levels vary across groups, a fundamental impossibility emerges whereby we cannot administer the same number of tests, subject candidates to the same decision rule, and yet realize the same outcomes in both groups.
1 Introduction
Consider an employer seeking to hire new employees. Clearly, the employer would like to hire the best employees for the task, but how will she know which candidates are the best fit? Typically, the employer will gather information on each candidate, including their education, work history, and reference letters, and for many jobs, she will actively conduct interviews. Altogether, this information can be viewed as the signal available to the employer.
Suppose that candidates can be either skilled or unskilled. If the firm hires an “unskilled” candidate, it will incur a significant cost on account of lost productivity. For this reason, the employer would like to minimize the number of false positive mistakes, i.e., instances where unskilled candidates are hired. On the other hand, the employer does not want to overspend on the hiring process, and so limits the number of interviews per hired candidate (either on average or absolutely). However, fewer interviews weaken the signal, causing the employer to make more mistakes. At the heart of our model is this inherent trade-off between the quality of the signal and the cost of obtaining it. This marks a departure from the classical economics literature, in which the signal is commonly regarded as given.
Complicating matters, hiring efficiency is not the only desideratum at play. In society, employees belong to various demographic groups, and we may strive to design policies that are in some sense fair vis-a-vis group membership. While fairness can be an elusive notion, regulators must translate it into concrete rules and laws. In the United States, a body of anti-discrimination law dating to the Civil Rights Act of 1964 subjects decisions that result in disparate outcomes (as delineated by race, age, gender, religion, etc.) to extra scrutiny: employers must not only show that a preference for some groups over others did not drive the decision (disparate treatment doctrine) but also justify that any observed disparities arise from a business necessity (disparate impact doctrine), whether or not those disparities were intentional.
In this paper, we seek to understand how a complex hiring process would interact with the requirements of fairness. We extend the theory on candidate screening and statistical discrimination, addressing the setting in which employers can subject employees to multiple tests, which we assume to be conditionally independent given the worker’s skill level and group membership. To build intuition, most of our analysis focuses on a Bernoulli model of both worker skill and screening. Additionally, we begin to extend the traditionally-studied Gaussian skill and screening models to the multi-test setting (Section 5).
Unlike the classical papers, in which an employer’s hiring policy is given by a simple thresholding rule, our setting requires greater care to derive the optimal employer policy. In our setting, we imagine that the employer wishes to minimize the number of tests performed subject to a constraint upper-bounding the false positive rate. We characterize the optimal policy in this case as a randomized threshold policy.
We also consider the setting in which employers can allocate tests dynamically, deciding after each result whether to (i) hire the candidate; (ii) reject the candidate and move on to the next one; or (iii) administer a subsequent test. In the Bernoulli case, the optimal policy consists of administering tests until each candidate’s posterior probability of being skilled either dips below the prior or rises above a threshold determined by the tolerable false positive rate. We demonstrate that the analysis of this process can be reduced to a random walk over the log posterior odds and derive the solution via the corresponding Gambler’s ruin problem.
Finally, we consider the ramifications for fairness within our model when employees, despite possessing similarly-distributed skills, are evaluated with differing noise levels.
1.1 Related work
The classical economics literature on discrimination in employment can broadly be divided into two strands. The taste-based discrimination model due to [3] models market outcomes in a setting where employers express an explicit preference for hiring members of one group, acting as if an employee’s demographic membership provides utility. This preference for certain groups induces a sorting of employees from the disadvantaged group towards those employers who discriminate the least, with wages ultimately determined by the marginal discriminator. Subsequently, [16] suggested a statistical mechanism by which similarly skilled employees from different groups might experience differential outcomes: the comparative difficulty of screening members of one group vs. another. Many subsequent works extend this analysis, typically focusing on Gaussian models of worker quality and conditionally Gaussian test scores [2, 1]. These papers consider the setting where workers are assessed via a single test characterized by a group-dependent noise level. Our work differs from these by considering richer mechanisms for acquiring signal.
In the more recent literature on fairness in machine learning, researchers often focus on binary classification, with employees characterized by a protected characteristic (group membership), and other (non-protected) covariates [15, 11, 12]. There, the predictor is presumably used to guide a consequential decision, such as allocating some economic good (loans, jobs, etc.) [6] or assessing some penalty (e.g. risk scores to guide bail decisions) [5]. Papers then focus on various interventions for ensuring accurate prediction subject to various constraints such as demographic parity (outcomes independent of group membership), blindness (model cannot observe group membership), and equalized false negative and/or false positive rates [9]. Several simple impossibility results preclude simultaneously satisfying several combinations of these parities [4, 5, 13]. More recently, a number of papers have drawn inspiration from economic modeling, extending the literature on fairness in classification to consider longer-term dynamics, equilibria, and the emergence of feedback loops [10, 9, 7]. Finally, [17] provide a survey of definitions from the algorithmic fairness literature.
2 The Bernoulli Model
We formalize our problem as follows. An employer accesses an infinite pool of candidates (indexed by $i$), each of whom has some (hidden) skill level $\theta_i \in \{0, 1\}$, where $0$ and $1$ denote unskilled and skilled, respectively. Underlying worker skill levels are sampled independently from a Bernoulli distribution with parameter $p$. An employer can access information about the $i$-th candidate through a sequence of tests $t_{i,1}, t_{i,2}, \ldots \in \{0, 1\}$, which are conditionally independent given $\theta_i$. Each test result disagrees with the ground-truth skill with probability $\eta$, where $\eta < 1/2$, i.e., $t_{i,j} = \theta_i \oplus \mathrm{Ber}(\eta)$, where $\oplus$ is the XOR operation between two binary random variables, so $t_{i,j}$ is itself a random variable. For convenience, we refer to $\eta$ as the noise level. We say that a test result is flipped if $t_{i,j} \neq \theta_i$, and we denote the number of flipped results among a candidate's $n$ tests by $f_i = \sum_{j=1}^{n} \mathbb{I}[t_{i,j} \neq \theta_i]$, where $\mathbb{I}$ is the indicator function.
A selection criterion is a mapping from test results to actions, $\pi: \{0,1\}^n \to \{0,1\}$, where $0$ means reject and $1$ means accept (hire). A policy sets the selection criterion based on the model parameters $p$ and $\eta$ and on other possible constraints, such as the probability of hiring, the error probability, etc. Writing $s_i = \sum_{j=1}^{n} t_{i,j}$ for the number of passed tests, a randomized threshold policy is a policy with parameters $(n, k, q)$ such that $\pi(t_i) = 1$ for $s_i > k$, $\pi(t_i) = 0$ for $s_i < k$, and for $s_i = k$ the probability that $\pi(t_i) = 1$ is $q$. We call a policy a threshold policy if $q \in \{0, 1\}$, i.e., if it is deterministic. In a dynamic policy, rather than setting a fixed number of tests per candidate, the employer may decide after each test whether to accept, reject, or perform an additional test, i.e., the policy maps every finite sequence of test results to one of these three actions. Note that for a dynamic policy, the number of tests is a random variable determined by the tests' outcomes. When designing a policy, one must carefully balance the following desiderata:
1. Minimize the False Discovery Rate (FDR): the fraction of unskilled workers among the accepted candidates, i.e., $\Pr[\theta_i = 0 \mid \pi(t_i) = 1]$.
2. Minimize the False Omission Rate (FOR): the fraction of skilled workers among the rejected candidates, i.e., $\Pr[\theta_i = 1 \mid \pi(t_i) = 0]$.
3. Minimize False Negatives (FN): the number of skilled workers that are classified as unskilled.
4. Minimize False Positives (FP): the number of unskilled workers that are classified as skilled.
5. Bound the number of tests per hire: the number of tests performed per hired candidate, $n / \Pr[\pi(t_i) = 1]$, should not exceed a given budget parameter.
For any fixed number of tests $n$, increasing the threshold $k$ of a threshold policy decreases the FDR while increasing the FOR.
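The observation above can be checked numerically. The sketch below is a minimal illustration, assuming the deterministic rule "accept iff at least $k$ of the $n$ tests are passed", a prior probability `p` of being skilled, and a per-test flip probability `eta`; the helper names `pass_pmf` and `fdr_and_for` are hypothetical, not from the paper.

```python
from math import comb

def pass_pmf(s, n, theta, eta):
    """Pr[exactly s of n tests passed | skill theta] in the Bernoulli noise model."""
    q = 1 - eta if theta == 1 else eta  # per-test pass probability
    return comb(n, s) * q**s * (1 - q) ** (n - s)

def fdr_and_for(n, k, p, eta):
    """FDR and FOR of the threshold rule 'accept iff s >= k' (requires 1 <= k <= n, 0 < eta)."""
    acc_skilled = sum(pass_pmf(s, n, 1, eta) for s in range(k, n + 1))
    acc_unskilled = sum(pass_pmf(s, n, 0, eta) for s in range(k, n + 1))
    p_accept = p * acc_skilled + (1 - p) * acc_unskilled
    fdr = (1 - p) * acc_unskilled / p_accept            # Pr[unskilled | accepted]
    fo_rate = p * (1 - acc_skilled) / (1 - p_accept)    # Pr[skilled | rejected]
    return fdr, fo_rate

# Raising k lowers the FDR and raises the FOR (n = 5, p = 0.5, eta = 0.2).
rates = [fdr_and_for(5, k, 0.5, 0.2) for k in range(1, 6)]
for k, (fdr, fo_rate) in enumerate(rates, start=1):
    print(k, round(fdr, 5), round(fo_rate, 5))
```

With these parameters the FDR falls from about 0.40 at $k = 1$ to about 0.001 at $k = 5$, while the FOR rises by the mirror-image amounts because $p = 1/2$.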
Loss: To handle the trade-off between false positives (i.e., when an unskilled candidate is accepted) and false negatives (i.e., when a skilled candidate is rejected), we introduce an -loss, parameterized by and defined as follows:
[TABLE]
where is the indicator function and . The expected loss of a policy is,
[TABLE]
where the expectation is over the type of the candidates , the test results , and the decisions of .
3 Analysis of the Bernoulli Model with One Group
To begin, we analyze this hiring model for a single group of candidates. The employer’s goal is to minimize the expected loss while maintaining a given acceptance probability. For brevity, we relegate all proofs to the appendix.
3.1 The Simple Threshold Policy (Equal Number of Tests)
Consider the setting where the employer must subject all candidates to an equal number of tests $n$ and a threshold $k$ (these parameters are chosen by the employer but are thereafter constant across candidates). For a given threshold, we can relate the flip probability (error rate) of the tests to the probability that a candidate is accepted as follows:
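Recall that $s_i = \sum_{j=1}^{n} t_{i,j}$, that $f_i = \sum_{j=1}^{n} \mathbb{I}[t_{i,j} \neq \theta_i]$, that the threshold policy $\pi_{n,k}$ accepts a candidate iff $s_i \geq k$, and that $n$ and $k$ are the only parameters of $\pi_{n,k}$. Informally, $s_i$ is the number of passed tests and $f_i$ is the number of flips (tests in error). For an unskilled candidate every passed test is a flip, so $s_i = f_i$, and the probability of hiring an unskilled candidate is given by:
$$\Pr[\pi_{n,k}(t_i) = 1 \mid \theta_i = 0] = \Pr[f_i \geq k \mid \theta_i = 0].$$
Since $f_i$ is a binomial random variable with parameters $n$ and $\eta$, we can calculate this probability precisely as $\sum_{j=k}^{n} \binom{n}{j} \eta^{j} (1-\eta)^{n-j}$. For a skilled candidate every failed test is a flip, so $s_i = n - f_i$, and the probability of rejecting a skilled candidate is the probability that they encounter more than $n-k$ flips, thus:
$$\Pr[\pi_{n,k}(t_i) = 0 \mid \theta_i = 1] = \Pr[f_i > n-k \mid \theta_i = 1] = \sum_{j=n-k+1}^{n} \binom{n}{j} \eta^{j} (1-\eta)^{n-j}.$$
Similarly, given a candidate’s skill level, we can calculate the probability that they obtain exactly $s$ positive tests out of $n$, i.e.,
$$\Pr[s_i = s \mid \theta_i = 1] = \binom{n}{s} (1-\eta)^{s} \eta^{n-s},$$
$$\Pr[s_i = s \mid \theta_i = 0] = \binom{n}{s} \eta^{s} (1-\eta)^{n-s}.$$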
Given these observations, we can now analyze the employer’s choices.
Optimal solution for any ratio
The next theorem shows that for threshold policies, the expected loss is minimized at such that .
Theorem 1**.**
The loss function is quasi-convex and a threshold of minimizes loss for any values of . Namely,
[TABLE]
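Because the loss definition above was partly lost in extraction, the following sketch assumes a weighted form $\alpha \cdot \Pr[\text{FP}] + (1-\alpha) \cdot \Pr[\text{FN}]$ with $\alpha \in [0, 1]$ (our assumption, not necessarily the paper's exact parameterization). It evaluates the loss of "accept iff $s \ge k$" for every $k \in \{0, \ldots, n+1\}$ and checks that the resulting sequence first decreases and then increases, which is what quasi-convexity in $k$ means on the integers. All function names are hypothetical.

```python
from math import comb

def binom_pmf(s, n, q):
    return comb(n, s) * q**s * (1 - q) ** (n - s)

def expected_loss(n, k, p, eta, alpha):
    """Assumed alpha-loss of the rule 'accept iff s >= k' (p = prior, eta = flip prob.)."""
    false_pos = (1 - p) * sum(binom_pmf(s, n, eta) for s in range(k, n + 1))
    false_neg = p * sum(binom_pmf(s, n, 1 - eta) for s in range(k))
    return alpha * false_pos + (1 - alpha) * false_neg

def loss_curve(n, p, eta, alpha):
    # k = 0 accepts everyone; k = n + 1 rejects everyone.
    return [expected_loss(n, k, p, eta, alpha) for k in range(n + 2)]

def is_unimodal(values):
    """True if values are non-increasing up to the minimizer and non-decreasing after it."""
    k_star = min(range(len(values)), key=values.__getitem__)
    return (all(values[i] >= values[i + 1] for i in range(k_star))
            and all(values[i] <= values[i + 1] for i in range(k_star, len(values) - 1)))

curve = loss_curve(7, 0.5, 0.3, 0.5)
best_k = min(range(len(curve)), key=curve.__getitem__)
print(best_k, is_unimodal(curve))  # expected: 4 True
```

For $p = \alpha = 1/2$ the minimizer is the majority threshold $k = 4$ for $n = 7$; raising the weight on false positives to $\alpha = 0.8$ moves it up to $k = 5$ under this assumed loss.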
Next, we bound the number of tests required to guarantee that the probability of classification error by the majority decision rule (i.e., accepting a candidate iff they pass more than half of the tests) does not exceed a specified quantity $\delta$.
Theorem 2**.**
For every , performing tests per candidate and using majority as a decision rule (i.e., ) guarantees .
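The exact test count in Theorem 2 was lost in extraction. As a hedged sketch, the snippet applies Hoeffding's inequality in the way the proof in Appendix A does: a skilled candidate is misclassified by the majority rule only if the fraction of passed tests falls at least $1/2 - \eta$ below its mean $1 - \eta$, which happens with probability at most $e^{-2n(1/2-\eta)^2}$, and symmetrically for an unskilled candidate. Solving $e^{-2n(1/2-\eta)^2} \le \delta$ gives $n \ge \ln(1/\delta) / (2(1/2-\eta)^2)$; this is our own derivation, not necessarily the exact statement of the theorem. The exact majority error is computed for comparison, and the function names are hypothetical.

```python
from math import ceil, comb, log

def hoeffding_num_tests(eta, delta):
    """Smallest odd n with exp(-2 n (1/2 - eta)^2) <= delta (requires eta < 1/2)."""
    n = ceil(log(1 / delta) / (2 * (0.5 - eta) ** 2))
    return n if n % 2 == 1 else n + 1  # odd n avoids ties in the majority vote

def majority_error(n, eta):
    """Exact Pr[majority rule errs] for odd n: at least (n + 1) / 2 of the n tests flipped.
    The same expression holds for skilled and unskilled candidates."""
    return sum(comb(n, f) * eta**f * (1 - eta) ** (n - f) for f in range((n + 1) // 2, n + 1))

for eta in (0.1, 0.3):
    n = hoeffding_num_tests(eta, 0.01)
    print(eta, n, majority_error(n, eta))  # exact error stays below delta = 0.01
```

The Hoeffding count is conservative: the exact error at the returned $n$ is typically far below $\delta$, so a smaller $n$ found by directly searching over `majority_error` would also suffice.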
Equal cost for false positives and false negatives
Consider the simple loss given by the classification error rate (false positives and false negatives count equally), which our loss function expresses by weighting both error types equally. When skilled and unskilled candidates occur with equal frequency, i.e., $p = 1/2$, we can show that the majority decision rule minimizes the classification error for any number of tests.
Corollary 3**.**
Assume and . For any number of tests , the majority decision rule minimizes loss . Namely, In addition, for every , performing tests per candidate and using majority as a decision rule guarantees classification error with probability of at most .
FDR minimization with limited number of tests per hire for balanced groups
Again assuming balanced groups (i.e., $p = 1/2$), suppose that an employer would like to minimize the false discovery rate subject to a lower bound on the hiring probability. We can model this optimization problem by introducing a budget parameter that bounds the predetermined (fixed) number of tests per hired candidate as follows:
[TABLE]
where $n$ is the number of tests the policy performs. The following theorem shows that the optimal policy is a randomized threshold policy.
Theorem 4**.**
There exists a randomized threshold policy which is an optimal solution for (2).
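Theorem 4 asserts existence only. The sketch below shows one way to build such a policy for a fixed $n$, assuming the constraint in (2) reduces to a target acceptance probability (for instance $n$ divided by the per-hire budget, which is our reading of the constraint): it scans thresholds from the top, accepting $s > k$ surely and $s = k$ with probability $q$, so that the acceptance probability equals the target exactly. The names `randomized_threshold` and `false_discovery_rate` are hypothetical, and `eta` is assumed to lie strictly between 0 and 1/2.

```python
from math import comb

def pass_count_pmf(n, p, eta):
    """Unconditional Pr[s passed tests], s = 0..n, mixing skilled and unskilled candidates."""
    return [p * comb(n, s) * (1 - eta) ** s * eta ** (n - s)
            + (1 - p) * comb(n, s) * eta ** s * (1 - eta) ** (n - s)
            for s in range(n + 1)]

def randomized_threshold(n, p, eta, target):
    """Return (k, q): accept s > k surely and s == k w.p. q, so Pr[accept] == target."""
    pmf = pass_count_pmf(n, p, eta)
    above = 0.0  # Pr[s > k] for the current k, scanning k from n downwards
    for k in range(n, -1, -1):
        if above + pmf[k] >= target:
            return k, (target - above) / pmf[k]
        above += pmf[k]
    raise ValueError("target acceptance probability must be in (0, 1]")

def false_discovery_rate(n, p, eta, k, q):
    """Pr[unskilled | accepted] for the randomized threshold policy (k, q)."""
    accept = [1.0 if s > k else q if s == k else 0.0 for s in range(n + 1)]
    acc_u = sum(a * comb(n, s) * eta ** s * (1 - eta) ** (n - s) for s, a in enumerate(accept))
    acc_s = sum(a * comb(n, s) * (1 - eta) ** s * eta ** (n - s) for s, a in enumerate(accept))
    return (1 - p) * acc_u / ((1 - p) * acc_u + p * acc_s)

k, q = randomized_threshold(5, 0.5, 0.2, 0.3)
print(k, round(q, 4), false_discovery_rate(5, 0.5, 0.2, k, q))
```

Here the rule accepts every candidate who passes all five tests and, with probability about 0.65, those who pass exactly four; spending the acceptance budget on the highest pass counts first is what keeps the FDR low.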
3.2 The Dynamic Policy (Adaptively-Allocated Tests)
Recall that under a dynamic policy, the employer can decide after each test whether to accept, reject, or perform another test. In general, dynamic policies are more efficient than those that must set a fixed number of tests. To build intuition, consider a candidate that has passed out of tests. As seen above, under an optimally-constructed fixed-test policy, any candidate that fails a single test might be rejected (for example, if and , the lowest false discovery rate is achieved by ). However, the posterior probability that this candidate is in fact skilled may still be greater than that of a fresh candidate sampled from the pool. Thus we can improve on the fixed-test policy by dynamically allocating more tests to candidates until their posterior odds either dip below the prior odds or rise above the threshold for hiring. The following theorem formalizes the notion that it is better to administer more tests to a candidate who passed the majority of previous tests than to start afresh with a new candidate:
Theorem 5**.**
For any , a candidate that passed out of tests is more likely to be skilled than a freshly sampled candidate for whom no test results are yet available, i.e., .
Remark 6**.**
If , the inequality would have been reversed.
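The intuition behind Theorem 5 and Remark 6 can be checked with Bayes' rule. Since the binomial coefficients cancel, the posterior depends on the results only through the number of passes $s$ and fails $n - s$, and the posterior odds equal the prior odds times $((1-\eta)/\eta)^{2s-n}$. The helper below is our own illustration, with `p` the prior probability of being skilled and `eta` the flip probability.

```python
def posterior_skilled(s, n, p, eta):
    """Pr[skilled | s of n tests passed]; binomial coefficients cancel in Bayes' rule."""
    like_skilled = (1 - eta) ** s * eta ** (n - s)
    like_unskilled = eta ** s * (1 - eta) ** (n - s)
    return p * like_skilled / (p * like_skilled + (1 - p) * like_unskilled)

# With eta < 1/2, passing a strict majority lifts the posterior above the prior p = 0.3.
print(posterior_skilled(4, 5, 0.3, 0.2))  # about 0.96
# With eta > 1/2 (Remark 6), the same results push the posterior below the prior.
print(posterior_skilled(4, 5, 0.3, 0.7))  # about 0.03
```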
The Greedy Policy
We now present a greedy algorithm that continues to test a candidate so long as the posterior probability that is greater than and smaller than , rejects a candidate whenever the posterior falls below (absent fairness concerns, employers will set for all groups), and accepts whenever the posterior rises above . Given parameters , we show that the greedy policy solves the optimization problem of minimizing the mean number of tests under these constraints, i.e.,
[TABLE]
Our analysis of this policy builds on the observation that, conditioned on a worker’s skill, the posterior log-odds after each test perform a one-dimensional random walk, starting at the prior log-odds and moving, after each test result, either left (upon a failed test) or right (upon a passed test). When (as in our model) the flip probabilities are equal for skilled and unskilled candidates, the random walk has a fixed step size. Moreover, the walk has absorbing barriers corresponding to (with the default lower barrier) falling below the prior log-odds (on the left) and exceeding the hiring threshold (on the right). Owing to the fixed step size and absorbing barriers, our policy resembles the classic Gambler’s ruin problem, in which a gambler wins or loses a unit of currency at each step, and the game ends when the gambler’s capital crosses a threshold on the left (the gambler goes bankrupt) or on the right (the opponent goes bankrupt). We formalize the random walk as follows, where $X_j$ is the position of the walk at time $j$:
1. $X_0$ is the prior log-odds of the candidate, i.e., $X_0 = \ln\frac{p}{1-p}$.
2. After the $j$-th test result $t_{i,j}$ is observed, $X_j = X_{j-1} + \ln\frac{1-\eta}{\eta}$ if $t_{i,j} = 1$, and $X_j = X_{j-1} - \ln\frac{1-\eta}{\eta}$ if $t_{i,j} = 0$.
Let be the policy that accepts a candidate if , rejects if , and otherwise conducts an additional test, i.e.,
[TABLE]
An employer will generally set the lower absorbing barrier so as to reject all candidates whose posterior log-odds fall below the prior log-odds, since a fresh candidate from the pool is expected to be better. However, when noise levels differ across groups, we may prefer, in the interest of fairness, to set the lower barrier below the prior log-odds for members of the noisier group, allowing us to equalize the frequency of false negatives across groups (see Section 4).
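A Monte Carlo sketch of the greedy policy as a walk over posterior log-odds, under our own assumptions about the barrier conventions: testing starts at the prior log-odds, a candidate is rejected as soon as the walk drops strictly below its starting point (the employer's default lower barrier), and accepted once the posterior reaches a hypothetical threshold `accept_post`. Function and parameter names are ours, not the paper's.

```python
import math
import random

def run_greedy(theta, p, eta, accept_post, rng, max_tests=100_000):
    """Simulate one candidate with true skill theta; return (accepted, tests_used)."""
    step = math.log((1 - eta) / eta)                    # log-odds change per test
    start = math.log(p / (1 - p))                       # prior log-odds
    upper = math.log(accept_post / (1 - accept_post))   # acceptance barrier
    x = start
    for used in range(1, max_tests + 1):
        flipped = rng.random() < eta
        passed = (theta == 1) != flipped                # test result = theta XOR flip
        x += step if passed else -step
        if x < start - 1e-9:                            # posterior fell below the prior
            return False, used
        if x >= upper - 1e-9:                           # posterior reached the threshold
            return True, used
    raise RuntimeError("no decision within max_tests")

rng = random.Random(0)
runs = [run_greedy(1, 0.5, 0.2, 0.95, rng) for _ in range(20_000)]
print(sum(a for a, _ in runs) / len(runs))   # fraction of skilled candidates accepted
print(sum(n for _, n in runs) / len(runs))   # mean number of tests for skilled candidates
```

With these parameters three net passes are needed for acceptance, and the two printed values should be close to 0.75 and 3.35, which matches the gambler's-ruin formulas evaluated for the same setting (our calculation).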
Lemma 7**.**
Let be the parameters that satisfy and (i.e., and ). Then iff (iff the candidate is accepted) and iff (iff the candidate is rejected).
Corollary 8**.**
The policy can be described as follows.
[TABLE]
We use the following parameters in the next theorems:
[TABLE]
Theorem 9** (Expected number of tests per type).**
The expected number of tests until a decision (namely accept or reject) for skilled candidates is and for unskilled candidates.
We present the probabilities that candidates are accepted or rejected, conditioned on their true skill level, as a confusion matrix in Table 1.
Theorem 10**.**
The expected number of tests until deciding whether to accept or reject a candidate is , where .
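Theorems 9 and 10 and Table 1 rest on the classical gambler's-ruin formulas (Feller [8], Chapter XIV, equations 2.4 and 3.4). The sketch below evaluates them in the shifted coordinates used in the proof of Theorem 9: lower barrier at $0$, start at $z = 1$, and an upper barrier $a$ that we compute from a hypothetical posterior threshold `accept_post`; skilled candidates step right with probability $1 - \eta$ and unskilled ones with probability $\eta$. The overall expected number of tests mixes the two types by the law of total expectation. Since the exact expressions in the theorems were lost in extraction, treat this as an illustration rather than a transcription.

```python
import math

def ruin_probability(z, a, right):
    """Pr[walk from z hits 0 before a], unit steps right w.p. `right` (Feller XIV, eq. 2.4)."""
    left = 1 - right
    if math.isclose(right, left):
        return 1 - z / a
    r = left / right
    return (r**a - r**z) / (r**a - 1)

def expected_duration(z, a, right):
    """Expected steps until the walk from z hits 0 or a (Feller XIV, eq. 3.4)."""
    left = 1 - right
    if math.isclose(right, left):
        return z * (a - z)
    r = left / right
    return z / (left - right) - a / (left - right) * (1 - r**z) / (1 - r**a)

def upper_barrier(p, eta, accept_post):
    """Shifted upper barrier: one (rejecting) step below the prior, plus the number of
    net passes needed to reach the posterior threshold accept_post > p."""
    step = math.log((1 - eta) / eta)
    gap = math.log(accept_post / (1 - accept_post)) - math.log(p / (1 - p))
    return math.ceil(gap / step - 1e-9) + 1

def greedy_summary(p, eta, accept_post):
    a = upper_barrier(p, eta, accept_post)
    d_skilled = expected_duration(1, a, 1 - eta)
    d_unskilled = expected_duration(1, a, eta)
    return {
        "a": a,
        "accept_skilled": 1 - ruin_probability(1, a, 1 - eta),
        "accept_unskilled": 1 - ruin_probability(1, a, eta),
        "tests_skilled": d_skilled,
        "tests_unskilled": d_unskilled,
        "tests_overall": p * d_skilled + (1 - p) * d_unskilled,
    }

print(greedy_summary(0.5, 0.2, 0.95))
```

For $p = 1/2$, $\eta = 0.2$ and a threshold of 0.95 this gives $a = 4$, acceptance probabilities $64/85$ (skilled) and $1/85$ (unskilled), and $57/17 \approx 3.35$ versus $27/17 \approx 1.59$ expected tests.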
4 Fairness Considerations in the Two-Group Setting
Two Groups—Threshold Policies
We now discuss the effects of a threshold policy when candidates belong to two groups, $A$ and $B$, whose skill levels are identically distributed but whose tests are characterized by different noise levels. Without loss of generality, we assume that $\eta_A < \eta_B$, where $\eta_g$ is the probability that a test result of a candidate from group $g$ differs from his skill level. To begin, we note that equalizing either the false positive or the false negative rates across groups is fundamentally irreconcilable with subjecting all candidates to the same policy.
Theorem 11** (Impossibility result).**
When noise levels differ between two groups with identical skill level distributions, a single threshold policy (with the same number of tests and the same threshold for both groups) cannot equalize either the false negative rates or the false positive rates across the groups. In particular, the false positive rate is higher in the noisier group, as an unskilled candidate from $B$ is more likely to be accepted by the threshold policy than an unskilled candidate from $A$:
[TABLE]
and the false negative rate is also higher, as a skilled candidate from $B$ is more likely to be rejected than a skilled candidate from $A$:
[TABLE]
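A numerical illustration of Theorem 11, with hypothetical noise levels $\eta_A = 0.1 < \eta_B = 0.3$ chosen by us: for every threshold $1 \le k \le n$ of the shared rule "accept iff $s \ge k$", both error rates are higher for the noisier group. The function name `error_rates` is ours.

```python
from math import comb

def error_rates(n, k, eta):
    """(false positive rate, false negative rate) of 'accept iff s >= k' at noise level eta."""
    fpr = sum(comb(n, s) * eta**s * (1 - eta) ** (n - s) for s in range(k, n + 1))
    fnr = sum(comb(n, s) * (1 - eta) ** s * eta ** (n - s) for s in range(k))
    return fpr, fnr

n, eta_a, eta_b = 5, 0.1, 0.3
for k in range(1, n + 1):
    (fp_a, fn_a), (fp_b, fn_b) = error_rates(n, k, eta_a), error_rates(n, k, eta_b)
    print(k, fp_a < fp_b, fn_a < fn_b)  # both comparisons hold for every k
```

The pattern is general: the false positive rate is an upper binomial tail in $\eta$, and the false negative rate is a lower binomial tail in $1 - \eta$, so both grow with the noise level.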
**Connection to Economics Literature.** Aigner and Cain [1] discuss a similar case under a Gaussian screening model where the variance (noise level) of the single test differs across the two groups. Similarly, they note that qualified candidates fare worse in the noisier group but that unqualified candidates fare better in the noisier group. Our work differs from theirs in that we consider the effect of multiple tests and the ability to optimize over the number of tests.
Two Groups–Dynamic policy
We now consider the (dynamic) hiring policy in the setting where employees belong to two groups, $A$ and $B$, with identically distributed skills but different noise levels $\eta_A < \eta_B$. Two ingredients explain the differences between the groups: (i) the step size of $B$ (the noisier group), $\ln\frac{1-\eta_B}{\eta_B}$, is smaller than the step size of $A$, so these candidates must typically pass more tests before they are accepted; and (ii) skilled candidates in group $B$ exhibit less drift to the right (they have a higher probability of failing a test). Consequently, when an employer (rationally) sets the lower barrier at the prior log-odds for all groups, a skilled candidate from $B$ is more likely to fail the first test, at which point the dynamic policy summarily rejects them. These two facts explain both the higher false negative rate for $B$ and the longer expected duration until acceptance. By setting the lower barrier below the prior log-odds for members of the noisier group, we can equalize false negative rates. Precisely, setting achieves the desired parity. The cost of this intervention is that it requires more tests for candidates from the noisier group; here, our random walk analysis can be leveraged to determine exactly how many more. Once again, when the noise differs across groups, we cannot achieve equality across the groups in all desired ways: the same acceptance criterion, the same expected number of tests, and the same false negative rates.
5 Gaussian Worker Screening Model
In this section, we work out analytic solutions for the conditional expectation of worker quality given a series of conditionally independent tests s.t. , . We assume that worker quality is normally distributed with mean and variance , so instead of a binary skill level we have a continuous candidate quality. Conditioned on , each test is generated according to the structural equation , where is a normally distributed noise term with mean $0$ and variance . Equivalently, the conditional distribution of each test is Gaussian with mean and variance . We refer the reader to Appendix B for further details.
We show that we can equalize the conditional variance between the two groups by giving more interviews to the noisier group, and that doing so also yields the same conditional expectations.
Theorem 12**.**
For two groups, with the same worker quality , that differ only in the variance of their noise , the variance can be equalized by using interviews (or tests) for , where is the number of interviews for each candidate from .
Theorem 13**.**
When equalizing conditional variances between by using , we get the same conditional expectations, .
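A small check of Theorems 12 and 13 under the standard conjugate-normal model, which we assume matches the setup above: quality $\sim \mathcal{N}(\mu, \sigma_q^2)$, and each test equals quality plus independent $\mathcal{N}(0, \sigma_g^2)$ noise for group $g$. The posterior precision is $1/\sigma_q^2 + n/\sigma_g^2$, so it (and hence the conditional variance) matches across groups whenever $n_B / \sigma_B^2 = n_A / \sigma_A^2$; the posterior mean then depends on the scores only through their average, with identical weights in both groups. Variable names are ours.

```python
def posterior(mu, var_q, var_noise, scores):
    """Posterior mean and variance of quality given i.i.d. Gaussian test scores."""
    precision = 1 / var_q + len(scores) / var_noise
    mean = (mu / var_q + sum(scores) / var_noise) / precision
    return mean, 1 / precision

mu, var_q = 0.0, 1.0
var_a, var_b = 1.0, 4.0              # group B is noisier
n_a = 3
n_b = round(n_a * var_b / var_a)     # 12 tests keep n / noise variance equal
avg_score = 0.7                      # same average score in both groups
mean_a, post_var_a = posterior(mu, var_q, var_a, [avg_score] * n_a)
mean_b, post_var_b = posterior(mu, var_q, var_b, [avg_score] * n_b)
print(mean_a, post_var_a)
print(mean_b, post_var_b)            # same mean and variance as group A
```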
6 Unsupervised Parameter Estimation
Now, under the realizability assumption, we explain how one can estimate the parameters and from the test results of a homogeneous population. Surprisingly, parameter recovery in this model does not require any ground-truth labels indicating whether an employee is skilled or unskilled. We use Hoeffding’s inequality to bound the absolute difference between the estimated and the true parameters, choosing as the desired upper bound and solving for the number of samples or .
Lemma 14** (Hoeffding’s inequality).**
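Let $X_1, \ldots, X_m$ be independent sub-Gaussian random variables. Then, for any $\epsilon > 0$,
[TABLE]
If $X_1, \ldots, X_m$ are independent Bernoulli random variables with parameter $\mu$, then
$$\Pr\left[\left|\frac{1}{m}\sum_{i=1}^{m} X_i - \mu\right| \geq \epsilon\right] \leq 2\exp\left(-2m\epsilon^2\right).$$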
We start by estimating and then use it to derive an estimate for . The estimated parameters are denoted by and . Notice that in order to have any information regarding the true value of , we need to have candidates with at least two tests. Hence, from now on we assume exactly that, i.e., for dynamic policies and for fixed number of tests policies.
Now, in both policies we have shown that the optimal rule is to reject candidates who fail their first test. Therefore, inconsistencies between the first two tests are seen only in cases where .
Let be the number of inconsistencies in the first two tests, i.e., , and let be the number of candidates with at least two tests. Since is generated by sampling times, the distribution and we can estimate as stated in the next theorem:
Theorem 15**.**
If we have results from candidates, by using , then with probability we have that .
Having an estimation of the parameter , we can calculate the estimated as follows: Let be the percentage of positive first tests. Since this number is generated by the distribution , we can estimate using the estimated value of .
Theorem 16**.**
If we have results from candidates, by using , we get that with probability we have that .
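The estimator formulas in Theorems 15 and 16 were lost in extraction; the sketch below is one moment-based estimator consistent with the description above (our construction, not necessarily the paper's). It relies on the fact that, for every candidate, $\Pr[t_1 = 1, t_2 = 0] = \eta(1-\eta)$ regardless of skill, so the fraction of all sampled candidates who pass the first test and fail the second estimates $\eta(1-\eta)$ even when only candidates who pass the first test receive a second one, while the fraction of passed first tests estimates $p(1-\eta) + (1-p)\eta$. It also computes the Hoeffding sample size $m \ge \ln(2/\delta)/(2\epsilon^2)$ for estimating a Bernoulli mean within $\epsilon$ with probability at least $1 - \delta$. All names are hypothetical.

```python
import math
import random

def simulate_first_two(m, p, eta, rng):
    """(t1, t2) per candidate; t2 is None when the first test is failed (immediate reject)."""
    results = []
    for _ in range(m):
        theta = int(rng.random() < p)
        t1 = theta ^ int(rng.random() < eta)
        t2 = theta ^ int(rng.random() < eta) if t1 == 1 else None
        results.append((t1, t2))
    return results

def estimate_eta_p(results):
    m = len(results)
    r_hat = sum(t1 for t1, _ in results) / m                     # ~ p(1-eta) + (1-p)eta
    h_hat = sum(t1 == 1 and t2 == 0 for t1, t2 in results) / m   # ~ eta(1-eta)
    eta_hat = (1 - math.sqrt(max(0.0, 1 - 4 * h_hat))) / 2       # root below 1/2
    p_hat = (r_hat - eta_hat) / (1 - 2 * eta_hat)
    return eta_hat, p_hat

def hoeffding_sample_size(eps, delta):
    """Samples so that a Bernoulli mean is within eps with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps**2))

rng = random.Random(1)
eta_hat, p_hat = estimate_eta_p(simulate_first_two(200_000, 0.3, 0.2, rng))
print(eta_hat, p_hat)                     # close to 0.2 and 0.3
print(hoeffding_sample_size(0.01, 0.05))
```

Solving the quadratic $\eta(1-\eta) = \hat h$ has two roots; taking the one below $1/2$ uses the model assumption $\eta < 1/2$.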
Under the Gaussian screening model, the parameter estimation is also straightforward (assuming realizability) without access to the true skill level of the employees. We start by looking at a single candidate, . Each of his test results, is generated from a conditional distribution which is a Gaussian with mean and variance . Since this variance is common among all the candidates, we can simply average the estimated variance of every candidate to get an approximation for . Suppose is a sequence of i.i.d tests of candidate , and let be the empirical mean of candidate ’s tests.
The following theorem is a result from Hoeffding’s Inequality, in which we use to bound the error of our estimated parameters.
Theorem 17**.**
By using the following as estimators for Gaussian parameters , and (notice that and ), the difference between each parameter and its estimator is bounded by .
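The estimator expressions in Theorem 17 were lost in extraction. Below is a minimal method-of-moments sketch in the same spirit, assuming every candidate takes the same number $n \ge 2$ of tests: average the within-candidate sample variances to estimate the noise variance, average the candidate means to estimate the mean quality, and subtract the noise contribution (noise variance divided by $n$) from the variance of the candidate means to estimate the quality variance. These are standard estimators, not necessarily the exact ones in the theorem, and the function name is ours.

```python
import random
import statistics

def estimate_gaussian(scores):
    """scores[i] holds candidate i's n >= 2 test results; returns (mu, var_quality, var_noise)."""
    n = len(scores[0])
    means = [statistics.fmean(s) for s in scores]
    var_noise = statistics.fmean(statistics.variance(s) for s in scores)
    mu = statistics.fmean(means)
    var_quality = statistics.variance(means) - var_noise / n
    return mu, var_quality, var_noise

rng = random.Random(2)
scores = []
for _ in range(20_000):
    quality = rng.gauss(5.0, 2.0)                                    # mean 5, variance 4
    scores.append([quality + rng.gauss(0.0, 3.0) for _ in range(5)]) # noise variance 9
mu_hat, var_q_hat, var_n_hat = estimate_gaussian(scores)
print(mu_hat, var_q_hat, var_n_hat)                                  # roughly 5, 4, 9
```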
7 Discussion and Future Work
Consider two groups with identically distributed skills and different noise levels in screening. Our results demonstrate that if a regulatory body (e.g., policymakers or a regulator) insists on the same number of tests and the same decision rule for both groups, any threshold policy yields a higher false positive rate for the noisier group. As a result, hired candidates from the noisier group would suffer higher rates of firing. In turn, this might lead employers to erroneously conclude that this group’s skill level is lower than it actually is. This paper presents a policy that handles this problem by minimizing the false positive rates of both groups, in the form of a greedy policy. Moreover, the greedy policy is efficient: it minimizes the expected number of tests per hire among all policies that achieve a specified false positive rate and continue testing every candidate who appears better than a new one. However, the dynamic policy still suffers (as does the simple threshold policy) from higher false negative rates for the noisier group, violating a notion of fairness dubbed equality of opportunity in the recent literature on fairness in machine learning [9]. We addressed this problem by modifying the greedy policy to reject a candidate iff by setting . In this way, our greedy policy can be made forgiving and equalize false negative rates across groups. In future work, we plan to explore extensions to the Gaussian model.
Appendix A Proofs
A.1 Proofs from Section 3
Proof of Theorem 1.
To prove the theorem, we show that the loss function , as a function of is quasi-convex and achieves its minimum value at .
Namely, we show that the loss is monotone increasing for , i.e., increasing increases the loss: .
Similarly, we show that for , we have .
Indeed,
[TABLE]
[TABLE]
Since and , we have
[TABLE]
The above expression is positive iff
[TABLE]
Since is the probability of exactly flips, and is the probability of exactly flips, we can calculate those probabilities as follows:
[TABLE]
[TABLE]
Substituting expression in (3), we get
[TABLE]
Rearranging, we get
[TABLE]
Taking the logarithm of both sides gives
[TABLE]
Solving for , we find that the inequality holds if
[TABLE]
For , we have
[TABLE]
and for , we have
[TABLE]
This implies that the maximum is .
∎
Proof of Theorem 2.
We start with a skilled candidate. The expected number of tests that a skilled candidate passes is $n(1-\eta)$.
By using Hoeffding’s inequality for Bernoulli distributions, for every ,
[TABLE]
Choosing yields (as is odd), which holds iff a majority threshold policy would predict that this is an unskilled candidate (false negative). Solving for , we get .
We now repeat the process for an unskilled candidate. The expected number of tests that an unskilled candidate passes is $n\eta$.
By using Hoeffding’s inequality again, we have
[TABLE]
Choosing yields , which holds iff a majority threshold falsely predicts that this is a skilled candidate (false positive). Solving for again, we get .
Overall, ∎
Proof of Theorem 4.
Let be any optimal policy for (2) (not necessarily a threshold policy) with a fixed number of tests, . We will show, in two steps, how to transform it into an optimal randomized threshold policy. The first step is to symmetrize . Let . Define a policy , which performs tests and accepts with probability , where . Clearly, both and have the same acceptance probability. In addition, conditioned on , any sequence of outcomes is equally likely. Furthermore, and the probability that given any sequence of outcomes with , is identical. (Technically, is a sufficient statistic.) This implies that the false discovery rate is also unchanged.
This yields that with the randomization vector is also optimal.
The second step is to suppose—for sake of contradiction—that is not a randomized threshold policy. We will show that we can improve the FDR of while keeping the probability of acceptance unchanged. This will contradict the hypothesis that is optimal.
If is not a randomized threshold policy, then there is no and , such that
[TABLE]
Now, let be the minimal value such that and let be the minimal value for which . Clearly, the FDR is lower at than at . Intuitively, we can shift some probability mass, from to in a way that maintains the acceptance probability of and decreases the false positive rates.
Let be such that . Let be a modified randomization vector for such that and for every . Since , the acceptance probability remains the same. As for the false discovery rate, since , is higher with than with , is lower with than with and for any , with is the same as with , the false discovery rate with is lower, which contradicts the optimality of with as the randomization vector. ∎
Proof of Theorem 5.
Using Bayes’ theorem, the conditional probability can be decomposed as
[TABLE]
[TABLE]
Since and , we get
[TABLE]
Since it holds that ,
[TABLE]
So,
[TABLE]
And finally,
[TABLE]
∎
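The Bayes decomposition can be checked numerically. The sketch below assumes (this is not stated in this section) that each test independently reports the true skill with probability `1 - eta` and that the prior probability of being skilled is `theta`. Under that assumption the posterior depends only on the difference between passes and failures, and the posterior log-odds move by `±log((1 - eta) / eta)` per test, which matches the random-walk picture in the proof of Theorem 9. Function names are hypothetical.

```python
import math


def posterior_skilled(outcomes, eta, theta):
    """P(skilled | outcomes) by Bayes' theorem; outcomes[i] is 1 for a passed test.
    Assumes each test reports the true skill with probability 1 - eta, prior theta."""
    like_skilled = math.prod((1 - eta) if o else eta for o in outcomes)
    like_unskilled = math.prod(eta if o else (1 - eta) for o in outcomes)
    numerator = theta * like_skilled
    return numerator / (numerator + (1 - theta) * like_unskilled)


def log_odds_path(outcomes, eta, theta):
    """Posterior log-odds after each test: starts at the prior log-odds and moves by
    +log((1 - eta) / eta) on a pass and -log((1 - eta) / eta) on a failure."""
    step = math.log((1 - eta) / eta)
    x = math.log(theta / (1 - theta))
    path = [x]
    for o in outcomes:
        x += step if o else -step
        path.append(x)
    return path


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1]  # hypothetical pass/fail sequence
    print(posterior_skilled(outcomes, eta=0.2, theta=0.3))
    print(log_odds_path(outcomes, eta=0.2, theta=0.3))
```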
Proof of Lemma 7.
Let , and let be any of the possible values of . Note that
[TABLE]
Since the are i.i.d., we have
[TABLE]
Since
[TABLE]
we have
[TABLE]
Since
[TABLE]
and
[TABLE]
substituting and , we get
[TABLE]
Substituting (5) into (4) and adding gives
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Substituting (5) into (4) and adding gives
[TABLE]
Hence
[TABLE]
∎
Proof of Theorem 9.
First, recall that for a skilled candidate and every test ,
[TABLE]
[TABLE]
Hence
[TABLE]
The lower absorbing barrier is reached when the candidate’s posterior skill level falls below the prior skill level, i.e.,
[TABLE]
and the starting point is just one step away from the lower absorbing barrier:
[TABLE]
According to Corollary 8, the upper absorbing barrier is at
[TABLE]
To derive the expected duration of the random walk for skilled and unskilled candidates, we shift the absorbing barriers so that the lower barrier lies at 0, and divide all positions by the step size, so that every step now has unit size. The new upper absorbing barrier is at
[TABLE]
And we also shift the starting point:
[TABLE]
As stated in [8] (equation (3.4), Chapter XIV, p. 348), the expected duration of a random walk with absorbing barriers at 0 and , started from , is:
[TABLE]
Hence,
[TABLE]
For unskilled candidates, the absorbing barriers and the starting point are the same; the only difference is that
[TABLE]
and
[TABLE]
Therefore,
[TABLE]
and we deduce
[TABLE]
∎
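The expected-duration formula quoted from [8] can be evaluated directly. The sketch below implements Feller's gambler's-ruin duration for a ±1 walk with absorbing barriers at 0 and `a`, started at `z`, where each step is +1 with probability `p`. Mapping the candidate's walk onto it (`p = 1 - eta` for a skilled candidate, `p = eta` for an unskilled one, start `z = 1` after the shift and rescaling) is an illustrative assumption, as are the numeric values; the actual upper barrier would come from Corollary 8.

```python
import math


def expected_duration(p: float, a: int, z: int) -> float:
    """Expected number of steps until a +/-1 walk started at z is absorbed at 0 or a,
    when each step is +1 with probability p (Feller, Vol. 1, Ch. XIV, Eq. 3.4)."""
    if not (0 < p < 1 and a >= 1 and 0 <= z <= a):
        raise ValueError("need 0 < p < 1, a >= 1 and 0 <= z <= a")
    q = 1.0 - p
    if math.isclose(p, q):
        return float(z * (a - z))  # symmetric walk: D_z = z (a - z)
    r = q / p
    return z / (q - p) - (a / (q - p)) * (1 - r**z) / (1 - r**a)


if __name__ == "__main__":
    eta, a, z = 0.2, 5, 1  # hypothetical noise level and upper barrier; z = 1 is one step above 0
    print("skilled:", expected_duration(1 - eta, a, z))   # assumed up-step probability 1 - eta
    print("unskilled:", expected_duration(eta, a, z))     # assumed up-step probability eta
```

The formula solves the difference equation `D_z = 1 + p D_{z+1} + q D_{z-1}` with `D_0 = D_a = 0`, which is a convenient way to test it.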
Derivations for the confusion matrix (Table 1).
We split the claim in the confusion matrix (Table 1) into two parts. First, using equation (2.4) of Chapter XIV in [8] (p. 345), we get
[TABLE]
and
[TABLE]
The second part follows from the fact that, with two absorbing barriers, the gambler’s ruin walk terminates with probability 1.
[TABLE]
[TABLE]
where . For and , we get and ; therefore,
[TABLE]
Hence .
[TABLE]
[TABLE]
Hence . ∎
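The two parts of this derivation can be illustrated with Feller's ruin probability (equation (2.4) of Chapter XIV in [8]): the probability that a ±1 walk started at `z` hits 0 before `a`. Because absorption happens with probability 1, the accept and reject probabilities for each true type sum to one. The mapping used here (skilled candidates step up with probability `1 - eta`, unskilled with probability `eta`, acceptance at the upper barrier, rejection at 0) and the numbers are illustrative assumptions; `confusion_matrix` is a hypothetical helper.

```python
import math


def ruin_probability(p: float, a: int, z: int) -> float:
    """Probability that a +/-1 walk started at z hits 0 before a, when each step is +1
    with probability p (Feller, Vol. 1, Ch. XIV, Eq. 2.4)."""
    if not (0 < p < 1 and a >= 1 and 0 <= z <= a):
        raise ValueError("need 0 < p < 1, a >= 1 and 0 <= z <= a")
    q = 1.0 - p
    if math.isclose(p, q):
        return 1.0 - z / a  # symmetric walk
    r = q / p
    return (r**a - r**z) / (r**a - 1)


def confusion_matrix(eta: float, a: int, z: int = 1) -> dict:
    """Accept/reject probabilities per true type under the assumed mapping:
    absorption at a means accept, absorption at 0 means reject."""
    reject_skilled = ruin_probability(1 - eta, a, z)
    reject_unskilled = ruin_probability(eta, a, z)
    return {
        "skilled": {"accept": 1 - reject_skilled, "reject": reject_skilled},
        "unskilled": {"accept": 1 - reject_unskilled, "reject": reject_unskilled},
    }


if __name__ == "__main__":
    print(confusion_matrix(eta=0.2, a=5))  # hypothetical noise level and upper barrier
```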
Proof of Theorem 10.
[TABLE]
[TABLE]
[TABLE]
∎
A.2 Proofs from Section 4
The following lemma is used in the proof of Theorem 11.
Lemma 18.
Let be a Binomial random variable with parameters and . Given a number of successes , the probability mass function of is . Let be the likelihood function of the event . Then the maximum likelihood estimate of is , i.e.,
[TABLE]
Proof of Lemma 18.
Note that does not depend on ; thus
[TABLE]
The log-likelihood is particularly convenient for maximum likelihood estimation. Since the logarithm is strictly increasing, maximizing the likelihood is equivalent to maximizing the log-likelihood, i.e.,
[TABLE]
Differentiating with respect to and setting the derivative to zero, we get
[TABLE]
Rearranging,
[TABLE]
The function is strictly concave, as its second derivative is negative:
[TABLE]
Since the derivative of this strictly concave function is zero at , is a global maximum. Therefore, attains its absolute maximum at . ∎
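Lemma 18 can be sanity-checked numerically: maximizing the binomial log-likelihood over a fine grid of success probabilities should land on `s / n`. This is a brute-force illustration, not part of the proof; the grid resolution and the function names are arbitrary choices.

```python
import math


def binomial_log_likelihood(p: float, s: int, n: int) -> float:
    """log L(p) = log C(n, s) + s log p + (n - s) log(1 - p), for 0 < p < 1."""
    return math.log(math.comb(n, s)) + s * math.log(p) + (n - s) * math.log(1 - p)


def grid_mle(s: int, n: int, steps: int = 10_000) -> float:
    """Grid point in (0, 1) with the highest log-likelihood; approaches s / n as steps grows."""
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=lambda p: binomial_log_likelihood(p, s, n))


if __name__ == "__main__":
    for s, n in [(3, 10), (7, 9), (0, 5)]:
        print(s, n, grid_mle(s, n), s / n)
```

Because the log-likelihood is strictly concave, the grid maximizer is always one of the two grid points adjacent to `s / n`.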
Proof of Theorem 11.
Let be a random variable that represents the number of flips in a -test sequence with noise level , i.e., is the number of times that for . We use to express as the probability of at least flips,
[TABLE]
and the probability of as the probability of at most flips:
[TABLE]
From Lemma 18, and since the probability mass function (pmf) is monotone increasing, we derive that the pmf of satisfies the monotone likelihood ratio property with respect to the pmf of . By Theorem 1.1 in [18], this implies that the distribution of also first-order stochastically dominates that of . From stochastic dominance, we can derive the desired inequalities
[TABLE]
and
[TABLE]
∎
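The monotone-likelihood-ratio and stochastic-dominance steps can be checked numerically for binomial flip counts. The sketch assumes, as the setup of this proof suggests, that each of the `k` tests flips independently with probability `eta`, so the number of flips is `Binomial(k, eta)`; the noise levels used are hypothetical.

```python
import math


def flip_pmf(k: int, eta: float) -> list:
    """P(X = s) for s = 0..k, where X ~ Binomial(k, eta) counts flipped test outcomes."""
    return [math.comb(k, s) * eta**s * (1 - eta) ** (k - s) for s in range(k + 1)]


def at_least(k: int, eta: float, t: int) -> float:
    """P(X >= t): probability of at least t flips."""
    return sum(flip_pmf(k, eta)[t:])


def has_mlr(k: int, eta_low: float, eta_high: float) -> bool:
    """True if pmf_high(s) / pmf_low(s) is nondecreasing in s (monotone likelihood ratio)."""
    ratios = [h / l for h, l in zip(flip_pmf(k, eta_high), flip_pmf(k, eta_low))]
    return all(r1 <= r2 for r1, r2 in zip(ratios, ratios[1:]))


if __name__ == "__main__":
    k, eta_low, eta_high = 10, 0.1, 0.3  # hypothetical noise levels
    print(has_mlr(k, eta_low, eta_high))
    # Nonnegative differences = first-order stochastic dominance of the noisier count.
    print([round(at_least(k, eta_high, t) - at_least(k, eta_low, t), 4) for t in range(k + 1)])
```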
Appendix B Gaussian Worker Screening Model Extension
In this extension, we characterize the conditional expectation and the conditional variance of given the tests, i.e., .
First, note that because is Gaussian and the conditionals are all Gaussian, the joint distribution is a multivariate Gaussian. We work out the precise analytic forms for the mean and variance of the conditional in terms of the quality and noise variances ( and ) and the number of tests .
To begin, we note the properties of the joint distribution over . Owing to the generative process for our , all have mean , and thus the mean of the joint distribution is the -dimensional vector . The full covariance matrix has the form
[TABLE]
where all off-diagonal entries for have value and all diagonal entries corresponding to the variances of the tests take value . The top-left entry corresponds to the variance of the quality Q and thus has value .
We can now derive the equations for the conditional mean and conditional variance of . To begin, note the following basic facts about deriving conditionals from multivariate Gaussians: to estimate the conditional of one set of variables, given another set we can segment our data matrix into those rows corresponding to the variables we don’t condition upon (here, just ) and those we do (here, ), expressing our covariance matrix in terms of the following submatrices:
[TABLE]
Here, , , , and .
The conditional expectation is then expressed as
[TABLE]
and the conditional variance can be expressed as
[TABLE]
which should be familiar as the Schur complement of the matrix . Intuitively, this corresponds to inverting the full matrix , deleting those rows and columns corresponding to observed variables, and then inverting the resulting () matrix back.
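The conditioning formulas above can also be evaluated by brute force, without any closed form. The sketch below assumes the generative model `X_i = Q + eps_i` with `Q ~ N(mu, sigma_q2)` and independent `eps_i ~ N(0, sigma_e2)`, which yields a covariance matrix with the structure described for Equation 6 (equal test covariances `sigma_q2`, test variances `sigma_q2 + sigma_e2`); the symbol names are ours. It solves the linear system with plain Gaussian elimination to stay dependency-free.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a (lists of floats) by Gaussian elimination."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def condition_on_tests(x, mu, sigma_q2, sigma_e2):
    """Mean and variance of Q given test scores x, using Sigma_QX Sigma_XX^{-1}
    for the mean and the Schur complement for the variance (assumed model)."""
    k = len(x)
    sigma_xx = [[sigma_q2 + (sigma_e2 if i == j else 0.0) for j in range(k)] for i in range(k)]
    sigma_xq = [sigma_q2] * k
    w = solve(sigma_xx, sigma_xq)  # w = Sigma_XX^{-1} Sigma_XQ (Sigma_XX is symmetric)
    mean = mu + sum(wi * (xi - mu) for wi, xi in zip(w, x))
    var = sigma_q2 - sum(wi * ci for wi, ci in zip(w, sigma_xq))
    return mean, var


if __name__ == "__main__":
    print(condition_on_tests([1.0, 2.0, 3.0], mu=0.0, sigma_q2=2.0, sigma_e2=1.0))
```

With no tests the function returns the prior `(mu, sigma_q2)`, as it should.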
What remains is to show that, for the particular covariance matrix that interests us (Equation 6), these expressions have simple analytically computable forms. Specifically, we state the following two theorems.
Theorem 19.
For jointly Gaussian variables characterized by the covariance matrix given in (Equation 6), the conditional expectation takes form
[TABLE]
Theorem 20.
For jointly Gaussian variables characterized by the covariance matrix given in (Equation 6), the conditional variance of given the tests takes form
[TABLE]
B.1 Proofs from Appendix B
Proof of Theorem 19.
The crucial step in applying Equation 7 to this data is to work out a simple expression for the inverse of the submatrix . Recall that this matrix is symmetric, with all diagonal entries equal to and all off-diagonal entries equal to .
We use a lemma due to [14], which states that when is invertible and is a rank-1 matrix, the inverse of their sum takes the following form:
[TABLE]
We can decompose into such an and by defining to be the matrix that takes value everywhere and to be the diagonal matrix that takes value on its main diagonal. Thus , and we can proceed by applying the lemma.
First we note that is a diagonal matrix with all entries on the main diagonal set to . Then we note that is an matrix with all entries set to . Thus, .
The matrix has all entries equal to , and thus our desired inverse can be expressed as follows:
[TABLE]
Thus each off-diagonal entry takes values
[TABLE]
and each diagonal entry has an additional term that comes from :
[TABLE]
Now that we know the precise expression for all entries of , we can calculate the vector-matrix product . Because every entry of takes value , and because every column of contains the same values (in a different order), the product is a -dimensional vector whose entries are all equal:
[TABLE]
This expression for , together with the definition of the conditional expectation of a multivariate Gaussian (Equation 7) concludes the proof.
∎
Proof of Theorem 20.
We can now produce the expression for . Because every entry of the matrix takes value
[TABLE]
and because every entry in the matrix takes value ,
[TABLE]
The expression for the conditional variance follows:
[TABLE]
∎
B.2 Proofs
Proof of Theorem 12.
First, recall that
[TABLE]
Solving for in the equation ,
[TABLE]
we get
[TABLE]
and hence
[TABLE]
Isolating , we find that . ∎
Proof of Theorem 13.
First, recall that
[TABLE]
Now,
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
∎
References
- [1] Aigner, D. J., & Cain, G. G. (1977). Statistical theories of discrimination in labor markets. ILR Review, 30(2), 175–187.
- [2] Arrow, K., et al. (1973). The theory of discrimination. Discrimination in labor markets, 3(10), 3–33.
- [3] Becker, G. S. (1957). The economics of discrimination. Chicago: University of Chicago.
- [4] Berk, R., Heidari, H., Jabbari, S., Kearns, M., & Roth, A. (2018). Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research.
- [5] Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data, 5(2), 153–163.
- [6] Corbett-Davies, S., & Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.
- [7] Ensign, D., Friedler, S. A., Neville, S., Scheidegger, C., & Venkatasubramanian, S. (2017). Runaway feedback loops in predictive policing. arXiv preprint arXiv:1706.09847.
- [8] Feller, W. (1968). An Introduction to Probability Theory and Its Applications, vol. 1. Wiley.
