TL;DR
This paper introduces a novel local fairness formulation for paper-reviewer matching, along with two algorithms, FairIr and FairFlow, that improve fairness and efficiency in assigning reviewers to papers.
Contribution
The paper proposes a new local fairness formulation for paper matching and introduces two algorithms, FairIr and FairFlow, that approximately optimize this formulation.
Findings
Both algorithms improve fairness over standard methods.
FairIr maximizes the global affinity objective and provably violates each local fairness and load constraint by at most a small constant.
FairFlow is faster and achieves competitive fairness.
| Data | Bounds | Alg | Time (s) | Obj | Min PS | Max PS | Mean PS | Std PS | Min RA | Max RA | Std RA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MIDL | Up | TPMS | 0.10 | 201.88 | 0.90 | 3.00 | 1.71 | 0.45 | 0 | 4 | 1.80 |
| MIDL | Up | PR4A | 293.83 | 197.32 | 0.92 | 2.57 | 1.67 | 0.38 | 0 | 4 | 1.79 |
| MIDL | Up | FairIr | 1.60 | 201.83 | 0.93 | 3.00 | 1.71 | 0.45 | 0 | 4 | 1.80 |
| MIDL | Up | FairFlow | 1.15 | 197.67 | 0.94 | 2.75 | 1.68 | 0.41 | 0 | 4 | 1.80 |
| MIDL | Lo + Up | TPMS | 0.17 | 150.04 | 0.00 | 3.00 | 1.27 | 0.69 | 2 | 2 | 0.00 |
| MIDL | Lo + Up | FairIr | 3.01 | 145.56 | 0.35 | 3.00 | 1.23 | 0.50 | 2 | 2 | 0.00 |
| MIDL | Lo + Up | FairFlow | 2.17 | 143.12 | 0.19 | 2.07 | 1.21 | 0.50 | 2 | 2 | 0.00 |
| CVPR | Up | TPMS | 47.24 | 5443.64 | 0.00 | 3.00 | 2.08 | 1.07 | 0 | 6 | 0.82 |
| CVPR | Up | PR4A (i1) | 3251.37 | 5134.08 | 0.77 | 3.00 | 1.96 | 0.52 | 0 | 6 | 1.24 |
| CVPR | Up | FairIr | 594.51 | 5373.39 | 0.27 | 3.00 | 2.05 | 0.84 | 0 | 6 | 0.83 |
| CVPR | Up | FairFlow | 225.29 | 4444.95 | 0.77 | 3.00 | 1.69 | 0.64 | 2 | 6 | 0.61 |
| CVPR | Lo + Up | TPMS | 49.62 | 5443.64 | 0.00 | 3.00 | 2.08 | 1.07 | 2 | 6 | 0.78 |
| CVPR | Lo + Up | FairIr | 694.03 | 5373.23 | 0.29 | 3.00 | 2.05 | 0.84 | 2 | 6 | 0.87 |
| CVPR | Lo + Up | FairFlow | 587.69 | 4339.60 | 0.94 | 3.00 | 1.65 | 0.48 | 3 | 6 | 0.63 |
| 2018 | Up | TPMS | 256.73 | 112552.11 | 1.37 | 29.24 | 22.23 | 5.52 | 0 | 9 | 2.97 |
| 2018 | Up | PR4A (i1) | 8683.79 | 108714.98 | 12.68 | 29.13 | 21.48 | 3.86 | 0 | 9 | 2.97 |
| 2018 | Up | FairIr | 3785.64 | 112263.94 | 7.19 | 29.24 | 22.18 | 4.75 | 0 | 9 | 2.96 |
| 2018 | Up | FairFlow | 910.08 | 91029.66 | 11.12 | 29.19 | 17.98 | 4.49 | 0 | 9 | 2.91 |
| 2018 | Lo + Up | TPMS | 636.01 | 108634.18 | 0.00 | 29.24 | 21.46 | 6.28 | 2 | 9 | 1.66 |
| 2018 | Lo + Up | FairIr | 4666.27 | 108083.00 | 7.17 | 29.24 | 21.35 | 5.06 | 2 | 9 | 1.67 |
| 2018 | Lo + Up | FairFlow | 1790.71 | 86166.07 | 10.52 | 22.79 | 17.02 | 2.77 | 2 | 9 | 1.61 |
Paper Matching with Local Fairness Constraints
Ari Kobren
College of Information and Computer Sciences
University of Massachusetts Amherst
Barna Saha
College of Information and Computer Sciences
University of Massachusetts Amherst
Andrew McCallum
College of Information and Computer Sciences
University of Massachusetts Amherst
Abstract
Automatically matching reviewers to papers is a crucial step of the peer review process for venues receiving thousands of submissions. Unfortunately, common paper matching algorithms often construct matchings suffering from two critical problems: (1) the group of reviewers assigned to a paper do not collectively possess sufficient expertise, and (2) reviewer workloads are highly skewed. In this paper, we propose a novel local fairness formulation of paper matching that directly addresses both of these issues. Since optimizing our formulation is not always tractable, we introduce two new algorithms, FairIr and FairFlow, for computing fair matchings that approximately optimize the new formulation. FairIr solves a relaxation of the local fairness formulation and then employs a rounding technique to construct a valid matching that provably maximizes the objective and only compromises on fairness with respect to reviewer loads and papers by a small constant. In contrast, FairFlow is not provably guaranteed to produce fair matchings; however, it can be 2x as efficient as FairIr and an order of magnitude faster than matching algorithms that directly optimize for fairness. Empirically, we demonstrate that both FairIr and FairFlow improve fairness over standard matching algorithms on real conference data. Moreover, in comparison to state-of-the-art matching algorithms that optimize for fairness only, FairIr achieves higher objective scores, FairFlow achieves competitive fairness, and both are capable of more evenly allocating reviewers.
1 Introduction
In 2014, the program chairs (PCs) of the Neural Information Processing Systems (NeurIPS) conference conducted an experiment that allowed them to measure the inherent randomness in the conference’s peer review procedure. In their experiment, 10% of the submitted papers were assigned to two disjoint sets of reviewers instead of one. For the papers in this experimental set, the PCs found that the two groups assigned to review the same paper disagreed about whether to accept or reject the paper 25.9% of the time. Accordingly, if all 2014 NeurIPS submissions were reviewed again by a new set of reviewers, about 57% of the originally accepted papers would be rejected [24].
The NIPS experiment is only one of many studies highlighting the poor reliability of the peer reviewing process. For example, another study finds that the rate of agreement between reviewers for a clinical neuroscience journal is not significantly different from chance [26]. This is particularly troublesome given that decisions regarding patient care, expensive scientific exploration, researcher hiring, funding, tenure, etc. are all based, in part, on published scientific work and thus on the peer reviewing process.
Unsurprisingly, previous work shows that experts are able to produce higher quality reviews of submitted publications than non-experts. Experts are often able to develop more “discerning” opinions about the proposals under review [13, 2] and some researchers in cognitive science and artificial intelligence claim that experts can make more accurate decisions than non-experts about uncertain information [12]. Clearly, peer review outcomes would likely be of higher quality if each paper were reviewed exclusively by experts in the paper’s topical areas. Unfortunately, since experts are relatively scarce, this is often impossible. Especially for many computer science venues, which are faced with increasingly large volumes of submissions, assigning only experts to each submission is impossible given typical reviewer load restrictions. Further exacerbating the problem, conference decision processes are dictated by a strict timeline. This necessitates significant automation in matching reviewers to submitted papers, sharply limiting the extent to which humans can intervene.
Automated systems often cast the paper matching problem as a global maximization of reviewer-paper affinity. In particular, each reviewer-paper pair has an associated affinity score, which is typically computed from a variety of factors, such as: expertise, area chair recommendations, reviewer bids, subject area matches, etc. The optimal matching is one that maximizes the sum of affinities of assigned reviewer-paper pairs, subject to load and coverage constraints, which bound the number of papers to which a reviewer can be assigned and dictate the number of reviews each paper must receive, respectively [4, 29]. While optimizing the global objective has merit, a major disadvantage of the approach is that it can lead to matchings that contain papers assigned to a set of reviewers who lack expertise in that paper’s topical areas [8, 27]. This is because in constructing a matching that maximizes the global objective, allocating more experts to one paper at the expense of another may improve the objective score. In order to be fair, it is important to ensure that each paper is instead assigned to a group of reviewers who possess a minimum acceptable level of expertise.
Recent work has attempted to overcome these problems by either (a) introducing strict requirements on the minimum affinity of valid paper-reviewer matches, or (b) optimizing the sum of affinities of the one paper that is worst-off [8, 27]. However, restricting the minimum allowable affinity often renders the problem infeasible as there may not exist any matching that provides sufficient coverage to all papers subject to the threshold. Previously proposed algorithms that maximize the sum of affinities for the worst-off paper do result in matchings that are more fair, but they also suffer from two disadvantages: (1) they do not simultaneously optimize for the overall best assignment (measured by sum total affinity), and (2) they are agnostic to lower limits on reviewer loads (which are common in practice) and thus may produce matchings in which reviewers are assigned to dramatically different numbers of papers.
To address these issues, we introduce the local fairness formulation of the paper matching problem. Our novel formulation is cast as an integer linear program that (1) optimizes the global objective, (2) includes both upper and lower bound constraints that serve to balance the reviewing load among reviewers, and (3) includes local fairness constraints, which ensure that each paper is assigned to a set of reviewers that collectively possess sufficient expertise.
The local fairness formulation is NP-Hard. To address this hardness, we present FairIr, the FAIR matching via Iterative Relaxation algorithm that jointly optimizes the global objective, obeys local fairness constraints, and satisfies lower (and upper) bounds on reviewer loads to ensure more balanced allocation. FairIr works by solving a relaxation of the local fairness formulation and rounding the corresponding fractional solution using a specially designed procedure. Theoretically, we prove that matchings constructed by FairIr may only violate the local fairness and load constraints by a small margin while maximizing the global objective. In experiments with data from real conferences, we show that, despite the theoretical possibility of constraint violations, FairIr never violates reviewer load constraints. The experiments also reveal that matchings computed by FairIr exhibit higher objective scores, more balanced allocations of reviewers and competitive treatment of the most disadvantaged paper when compared to state-of-the-art approaches that optimize for fairness.
In real-conference settings, a program chair may desire to construct and explore many alternative matchings with various inputs, which demands an efficient fair matching algorithm. Toward this end, we present FairFlow, a min-cost-flow-based heuristic for constructing fair matchings that is faster than FairIr by more than 2x. While matchings constructed by FairFlow are not guaranteed to adhere to a specific degree of fairness (like FairIr or previous work), in experiments, FairFlow often constructs matchings exhibiting fairness and objective scores close to that of FairIr in a fraction of the time. Unlike FairIr and matching algorithms that rely on linear programming, FairFlow operates by first maximizing the global objective and then refining the corresponding solution through a series of min-cost-flow problems in which reviewers are reassigned from the most advantaged papers to the most disadvantaged papers.
This paper is organized as follows. Section 2 presents the standard paper matching formulation that optimizes the global objective. Section 3 covers our main contribution by providing the local fairness formulation of paper matching and describes FairIr and its formal guarantees. Section 4 presents the more efficient FairFlow heuristic. In Section 5, we experimentally show the effectiveness of our approach over other approaches on several datasets coming from real conferences.
2 Reviewer Assignment Problem
Popular academic conferences typically receive thousands of paper submissions. Immediately after the submission period closes, papers are automatically matched to a similarly sized pool of reviewers. A matching of reviewers to papers is constructed using real-valued reviewer-paper affinities. The affinity between a reviewer and a paper may be computed from a variety of factors, such as: expertise, bids, area chair recommendations, subject area matches, etc. Previous work has explored approaches for modeling reviewer-paper affinity via latent semantic indexing, collaborative filtering or information retrieval techniques [6, 4, 5]. We do not develop affinity models in this work. Instead, we focus on algorithms for matching papers to reviewers given the affinity scores. In the literature, this matching problem is known by many names; we choose the reviewer assignment problem (RAP) [15, 27].
The RAP is often accompanied by two types of constraints: load constraints and coverage constraints [8]. A load constraint bounds the number of papers assigned to a reviewer; a coverage constraint defines the number of reviews a paper must receive. Typically, all papers must be reviewed the same number of times. Reviewers do not always have equal loads, although a highly uneven load is inherently unfair and may lead to reviewers declining to review or not submitting reviews on time.
Formally, let $R = \{r_i\}_{i=1}^{N}$ be the set of reviewers, $P = \{p_j\}_{j=1}^{M}$ be the set of papers and $A \in \mathbb{R}^{N \times M}$ be a matrix of reviewer-paper affinities. The RAP can be written as the following integer program:
$$
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{N} \sum_{j=1}^{M} x_{ij} A_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{M} x_{ij} \le U_i, && \forall i = 1, \dots, N, \\
& \sum_{i=1}^{N} x_{ij} = C_j, && \forall j = 1, \dots, M, \\
& x_{ij} \in \{0, 1\}, && \forall i, j.
\end{aligned}
$$
Here, $U = \{U_i\}_{i=1}^{N}$ is the set of upper bounds on reviewer loads, and $C = \{C_j\}_{j=1}^{M}$ represents the coverage constraints. The matching of reviewers to papers is encoded in the variables $x_{ij}$, where $x_{ij} = 1$ indicates that reviewer $r_i$ has been assigned to paper $p_j$. In this formulation, the objective is to maximize the sum of affinities of reviewer-paper assignments (subject to the constraints); it can be solved optimally in polynomial time with standard tools [29].
In practice, lower bounds on reviewer loads are often invoked in order to spread the reviewing load more equally across reviewers. The formulation above can be augmented to include the lower bounds by adding the following constraints:
$$
\sum_{j=1}^{M} x_{ij} \ge L_i, \quad \forall i = 1, \dots, N,
$$
where $L = \{L_i\}_{i=1}^{N}$ is the set of lower bounds on reviewer loads. The resulting problem is still efficiently solvable. Note that the formulation above, with and without lower bounds, is currently employed by various conferences and conference management software, for example: TPMS, OpenReview, CMT and HotCRP [3, 27]. We will henceforth refer to the above two formulations as the TPMS RAP, where the inclusion of lower bounds will be clear from context.
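To make the formulation concrete, the following is a minimal sketch that solves a tiny RAP instance by exhaustive enumeration. It is not how TPMS or any conference system solves the problem (those rely on polynomial-time solvers); the function name `solve_rap_brute_force` and the toy affinities are illustrative assumptions, and the approach is only practical for a handful of reviewers and papers.

```python
from itertools import combinations, product


def solve_rap_brute_force(A, upper, coverage, lower=None):
    """Exhaustively solve a tiny RAP instance (illustrative only).

    A[i][j]     : affinity of reviewer i for paper j
    upper[i]    : maximum number of papers for reviewer i
    coverage[j] : exact number of reviewers paper j must receive
    lower[i]    : optional minimum number of papers for reviewer i
    Returns (best_objective, assignment), where assignment[j] is a tuple of
    reviewer indices, or (None, None) if no valid matching exists.
    """
    n, m = len(A), len(A[0])
    lower = lower or [0] * n
    # Every way of choosing exactly coverage[j] distinct reviewers for paper j,
    # so the coverage constraints hold by construction.
    choices = [list(combinations(range(n), coverage[j])) for j in range(m)]
    best_obj, best = None, None
    for assignment in product(*choices):
        load = [0] * n
        for reviewers in assignment:
            for i in reviewers:
                load[i] += 1
        # Discard matchings that violate a load upper or lower bound.
        if any(load[i] > upper[i] or load[i] < lower[i] for i in range(n)):
            continue
        obj = sum(A[i][j] for j, revs in enumerate(assignment) for i in revs)
        if best_obj is None or obj > best_obj:
            best_obj, best = obj, assignment
    return best_obj, best


# Hypothetical instance: 3 reviewers, 2 papers, 2 reviews per paper.
A = [[9, 1], [8, 7], [2, 6]]
print(solve_rap_brute_force(A, upper=[1, 2, 2], coverage=[2, 2]))
print(solve_rap_brute_force(A, upper=[1, 2, 2], coverage=[2, 2], lower=[0, 0, 2]))
```

Adding the lower bound on the third reviewer forces a different matching with a lower objective, which illustrates how load balancing trades off against total affinity.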
3 Fair Paper Matching
It is well-known that optimizing the TPMS RAP can result in unfair matchings [8, 27]. To see why, consider the example RAP in Figure 1, in which there are 4 papers and 4 reviewers, and define the paper score for paper $p_j$ to be the sum of affinities of reviewers assigned to $p_j$. In the example, each paper must be assigned 2 reviewers and each reviewer may only be assigned up to 2 papers. Even though the matchings in Figures 1(a) and 1(b) obtain equivalent objective scores under the TPMS RAP, the matching in Figure 1(a) causes two of the papers to have much lower paper scores than the other two. In practice, this may indicate that the two low-scoring papers have been assigned to a collection of reviewers, none of whom are well-suited to provide an expert evaluation. The assignment in Figure 1(b) is clearly more equitable with respect to the papers (and reviewers), but the TPMS RAP does not prefer this matching since it seeks to globally optimize affinity.
3.1 Local Fairness Constraints
We propose to prohibit such undesirable matchings by augmenting the TPMS RAP with local fairness constraints. That is, we constrain the paper score at each paper to be no less than a threshold $T$ [30]. Formally,
$$
\sum_{i=1}^{N} x_{ij} A_{ij} \ge T, \quad \forall j = 1, \dots, M.
$$
We refer to the resulting RAP formulation as the local fairness formulation. While adding local fairness constraints is simple, this formulation is NP-Hard since it generalizes the max-min fair allocation problem [30]. To avoid the hardness of the local fairness formulation, one might instead be tempted to constrain the minimum affinity of valid assignments of reviewers to papers. However, doing so often results in infeasible assignment problems [31].
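The effect of a local fairness constraint can be checked directly on a candidate matching. The sketch below computes paper scores and tests them against a threshold `T`; the affinity matrix and the two matchings are hypothetical numbers chosen to mimic the situation described for Figure 1 (equal global objective, very different worst-case paper score), not the figure's actual values.

```python
def paper_scores(A, assignment):
    """Sum of affinities of the reviewers assigned to each paper.

    A[i][j] is the affinity of reviewer i for paper j and assignment[j]
    is an iterable of reviewer indices assigned to paper j.
    """
    return [sum(A[i][j] for i in reviewers)
            for j, reviewers in enumerate(assignment)]


def satisfies_local_fairness(A, assignment, T):
    """True if every paper score is at least the threshold T."""
    return all(score >= T for score in paper_scores(A, assignment))


# Hypothetical data: reviewers 0 and 1 are experts on every paper,
# reviewers 2 and 3 are not.
A = [[5, 5, 5, 5],
     [5, 5, 5, 5],
     [1, 1, 1, 1],
     [1, 1, 1, 1]]
unfair = [(0, 1), (0, 1), (2, 3), (2, 3)]   # experts concentrated on two papers
fair = [(0, 2), (0, 3), (1, 2), (1, 3)]     # one expert per paper

print(sum(paper_scores(A, unfair)), sum(paper_scores(A, fair)))  # same objective
print(min(paper_scores(A, unfair)), min(paper_scores(A, fair)))  # different minimum
print(satisfies_local_fairness(A, unfair, T=5), satisfies_local_fairness(A, fair, T=5))
```

Both matchings respect a coverage of 2 and a load upper bound of 2, so the global objective alone cannot distinguish them; only the threshold test does.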
3.2 FairIr
We present FairIr, an approximation algorithm for solving the local fairness formulation. The algorithm is capable of accepting both lower and upper bound constraints on reviewer loads (as well as coverage constraints). By nature of being approximate, FairIr is guaranteed to return a matching in which any local fairness constraint may be violated by at most $A_{\max}$, the highest reviewer-paper affinity, and any reviewer load constraint (upper or lower bound) is violated by at most 1. Moreover, it achieves a 1-approximation (no violation) in the global objective. We call attention to the fact that our guarantees hold even though FairIr is able to accommodate constraints on reviewer lower bounds while optimizing a global objective, unlike most state-of-the-art paper matching algorithms with theoretical guarantees [8, 27]. Note that in practice lower bounds are often an input to the RAP in order to spread the reviewing load more equally across reviewers.
Our algorithm proceeds in rounds. In each round, FairIr relaxes the integrality constraints of the local fairness formulation (i.e., each $x_{ij}$ can take any value in the range $[0, 1]$) and solves the resulting linear program. Any $x_{ij}$ with an integral assignment (i.e., either 0 or 1) is constrained to retain that value in subsequent rounds. Among the $x_{ij}$s with non-integral values, FairIr looks for a paper such that at most 3 reviewers have been fractionally assigned to it (the paper may have any number of integrally assigned reviewers). If such a paper is found, FairIr drops the corresponding local fairness constraint. If no such paper is found, FairIr finds a reviewer with at most 2 papers fractionally assigned to it and drops the corresponding load constraints. The next round proceeds with the modified program. As soon as a matching is found that contains only integral assignments, that matching is returned. Algorithm 1 contains pseudocode for FairIr.
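The constraint-selection step of a round can be sketched as follows. This is only the selection logic applied to a given fractional solution; it does not solve the linear program, and it is not the paper's Algorithm 1. The function name, the `active_*` bookkeeping arguments, and the requirement that a candidate have at least one fractional variable are assumptions made for illustration.

```python
def pick_constraint_to_drop(x, active_papers=None, active_reviewers=None, tol=1e-9):
    """Choose which constraint to drop given a fractional LP solution x.

    x[i][j] is the (possibly fractional) value for reviewer i and paper j.
    active_papers / active_reviewers hold indices whose fairness / load
    constraints are still in the program (None means all are active).
    Returns ("paper", j), ("reviewer", i), or None if nothing qualifies.
    """
    n, m = len(x), len(x[0])
    papers = range(m) if active_papers is None else sorted(active_papers)
    reviewers = range(n) if active_reviewers is None else sorted(active_reviewers)

    def fractional(v):
        return tol < v < 1.0 - tol

    # First preference: a paper with at most 3 fractionally assigned reviewers
    # (assumed here to need at least one fractional variable).
    for j in papers:
        count = sum(1 for i in range(n) if fractional(x[i][j]))
        if 1 <= count <= 3:
            return ("paper", j)
    # Otherwise: a reviewer with at most 2 fractionally assigned papers.
    for i in reviewers:
        count = sum(1 for j in range(m) if fractional(x[i][j]))
        if 1 <= count <= 2:
            return ("reviewer", i)
    return None


print(pick_constraint_to_drop([[1.0, 0.5], [0.0, 0.5]]))
print(pick_constraint_to_drop([[0.5, 0.5]] * 4))
```

In the second call every paper has four fractional reviewers, so no fairness constraint qualifies and the step falls back to a reviewer load constraint.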
Theorem 1.
Given a feasible instance of the local fairness formulation, FairIr always terminates and returns an integer solution in which each local fairness constraint may be violated by at most $A_{\max}$ (the highest reviewer-paper affinity), each load constraint may be violated by at most 1 and the global objective is maximized.
The proof of Theorem 1 is found in the appendix.
Theorem 1 requires that the instance of the local fairness formulation be feasible. A RAP instance may be infeasible if $T$ is too large, or if the total reviewer capacity cannot meet the total coverage demand, $\sum_{i} U_i < \sum_{j} C_j$. Checking the second condition is trivial. To check if $T$ is too large, simply check if the corresponding relaxed local fairness formulation is infeasible. By Algorithm 1, if the relaxed program is feasible, then FairIr must return an integer solution for that instance. Formally,
Fact 1.
If an instance of the local fairness formulation is feasible after the integrality constraints on the $x_{ij}$s have been removed, then Algorithm 1 returns an integral (possibly approximate) solution.
Thus, by Fact 1, testing whether or not FairIr will return an integer solution for an instance of the local fairness formulation requires solving the relaxed program. In practice, a binary search over the feasible range of $T$ is performed and the highest $T$ yielding a feasible program is selected. Such a binary search requires solving the relaxed formulation several times and can add to the computational complexity. Overall, the running time of the algorithm is dominated by the number of times the linear program solver is invoked. Note that during each iteration of FairIr, many constraints may be dropped, which helps to improve scalability without sacrificing the theoretical guarantees. Also, note that by dropping constraints during each iteration the objective score can only increase.
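The threshold search can be sketched with a generic feasibility oracle. In the sketch below, `is_feasible` is a hypothetical callable standing in for "solve the relaxed program at threshold T and report whether it is feasible"; the sketch assumes feasibility is monotone (if a threshold is feasible, every smaller one is too), and the default of 10 iterations mirrors the setting reported in the experiments.

```python
def search_threshold(is_feasible, lo, hi, iterations=10):
    """Binary search for the largest feasible fairness threshold.

    is_feasible(T) -> bool is an assumed oracle (e.g. an LP feasibility check).
    lo and hi bracket the search range. Returns the largest tested threshold
    that was feasible, or None if no tested threshold was feasible.
    """
    best = None
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if is_feasible(mid):
            best, lo = mid, mid   # feasible: try a stricter threshold
        else:
            hi = mid              # infeasible: relax the threshold
    return best


# Toy oracle: pretend every threshold up to 1.3 is feasible.
print(search_threshold(lambda T: T <= 1.3, lo=0.0, hi=3.0))
```

Each iteration halves the search interval, so the number of oracle calls (and hence LP solves) is fixed by `iterations` regardless of the instance.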
4 Faster Flow-based Matching
For real conferences, paper matching is an interactive process. A PC may construct one matching, and upon inspection, decide to tune the affinity matrix, and compute a new matching. Alternatively, a PC may browse a matching and decide that certain reviewers should not be assigned to certain papers, or, that certain reviewers must review certain papers. After imposing the additional constraints, ideally, a new matching could be constructed efficiently.
FairIr is founded on solving a sequence of linear programs, and thus may not be efficient enough to support this kind of interactive paper matching when the number of papers and reviewers is large. Other similar algorithms, which consider local constraints, also may not be efficient enough because they too rely on linear programming solvers [8, 27]. Therefore, we introduce a min-cost flow-based heuristic for solving the local fairness formulation that is significantly faster than other state-of-the-art approaches. While our flow-based approach does not enjoy the same performance guarantees of FairIr, empirically, we observe that it constructs high quality matches on real data (Section 5).
4.1 Paper Matching as Min-cost Flow
We begin by describing how to solve the TPMS RAP using algorithms for min-cost flow (MCF). Our first focus is on RAP instances without constraints on reviewer load lower bounds. Then we describe briefly how load lower bounds can be incorporated.
Construct the following graph, $G$, in which each edge has both an integer cost and capacity:
1. create a source node $s$ with supply equal to the sum over papers of the corresponding coverage constraints: $\sum_{j=1}^{M} C_j$;
2. create a node for each reviewer $r_i$ and a directed edge between $s$ and each reviewer node with capacity $U_i$ and cost 0;
3. create a node for each paper $p_j$ and create a directed edge from each reviewer, $r_i$, to each paper $p_j$ with cost $-W \cdot A_{ij}$, where $W$ is a large positive number to ensure that the cost of each edge is integer. Each such edge has capacity 1;
4. construct a sink node $t$ with demand equal to the supply at $s$; create a directed edge from each paper $p_j$ to $t$ with capacity $C_j$ and cost 0.
Then, solve MCF for $G$, i.e., find the set of edges in $G$ used in sending a maximal amount of flow from $s$ to $t$ such that, for each edge $e$, no more flow is sent across $e$ than $e$’s capacity, and such that the sum total cost of all utilized edges is minimal. Note that augmenting-path algorithms in the style of Ford-Fulkerson (for example, successive shortest paths) can be used to solve MCF and many efficient implementations are publicly available. It can be shown that the optimal flow plan on this graph corresponds to the optimal solution for the TPMS RAP. In particular, each edge between a reviewer and paper utilized in the optimal flow plan corresponds to an assignment of a reviewer to a paper. See Figure 2(a) for a visual depiction of $G$.
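The four construction steps can be sketched as a plain edge list that could then be handed to any min-cost-flow solver. The node labels, the function name `build_rap_flow_network`, the scaling constant `W=100`, and the use of negated scaled affinities as edge costs are assumptions for illustration; the sketch builds the graph only and does not solve the flow problem.

```python
def build_rap_flow_network(A, upper, coverage, W=100):
    """Build the RAP flow network as (supply, edges).

    Each edge is a tuple (tail, head, capacity, cost). Costs are negated,
    scaled affinities so that minimizing cost maximizes total affinity
    (an assumption of this sketch); W scales affinities to integers.
    """
    n, m = len(A), len(A[0])
    supply = sum(coverage)                 # step 1: supply at the source "s"
    edges = []
    for i in range(n):                     # step 2: source -> reviewer
        edges.append(("s", ("r", i), upper[i], 0))
    for i in range(n):                     # step 3: reviewer -> paper
        for j in range(m):
            cost = -int(round(W * A[i][j]))
            edges.append((("r", i), ("p", j), 1, cost))
    for j in range(m):                     # step 4: paper -> sink "t"
        edges.append((("p", j), "t", coverage[j], 0))
    return supply, edges


# Hypothetical instance: 3 reviewers, 2 papers, 2 reviews per paper.
A = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.6]]
supply, edges = build_rap_flow_network(A, upper=[1, 2, 2], coverage=[2, 2])
print(supply, len(edges))
```

The unit capacity on reviewer-to-paper edges is what prevents a reviewer from being assigned to the same paper twice.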
4.2 Locally Fair Flows
We introduce a MCF-based heuristic, FairFlow, for approximately solving the local fairness formulation via a sequence of MCF problems. Our algorithm is inspired by the combinatorial approach for (approximately) solving the scheduling problem on parallel machines [7]. FairFlow is comprised of three phases that are repeated until convergence. In the first phase, a valid assignment is computed and the papers are partitioned into groups; in the second phase, specific assignments are dropped; in the third phase, the assignment computed in the first phase is refined to promote fairness.
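In more detail, in phase 1 of FairFlow, $G$ is constructed using the 4 steps above (Section 4.1) and an assignment is constructed using MCF. Afterwards, the papers are partitioned into three groups as follows:
$$
\begin{aligned}
G^{+} &= \{\, p_j : S_j \ge T \,\}, \\
G^{0} &= \{\, p_j : T > S_j > T - A_{\max} \,\}, \\
G^{-} &= \{\, p_j : S_j \le T - A_{\max} \,\},
\end{aligned}
$$
where $S_j = \sum_{i} x_{ij} A_{ij}$ is the paper score of $p_j$ and $A_{\max}$ is the highest reviewer-paper affinity.
In words, the first group contains all papers whose paper score is greater than or equal to $T$; the second group contains all papers not in $G^{+}$ but whose paper score is greater than $T$ minus the maximum score; the third group contains all other papers.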
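The partition described in words above can be sketched as a small function. The group names, the function name `partition_papers`, and the reading of "maximum score" as the highest reviewer-paper affinity `a_max` are assumptions of this sketch; the comparisons follow the prose (greater than or equal for the first group, strictly greater for the second).

```python
def partition_papers(scores, T, a_max):
    """Split paper indices into three groups by paper score.

    scores[j] : current paper score of paper j
    T         : fairness threshold
    a_max     : highest reviewer-paper affinity (assumed meaning of
                "the maximum score" in the text)
    Returns (g_plus, g_zero, g_minus) as lists of paper indices.
    """
    g_plus, g_zero, g_minus = [], [], []
    for j, score in enumerate(scores):
        if score >= T:
            g_plus.append(j)       # already meets the threshold
        elif score > T - a_max:
            g_zero.append(j)       # below T, but within a_max of it
        else:
            g_minus.append(j)      # all other papers
    return g_plus, g_zero, g_minus


print(partition_papers([20, 12, 4, 15, 5], T=15, a_max=10))
```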
In the second phase, for each paper $p \in G^{-}$, the reviewer assigned to that paper in phase 1 with the lowest affinity is unassigned from $p$.
In the third phase, a refinement network is constructed. At a high-level, the refinement network routes flow from the papers in $G^{+}$ back through their reviewers and eventually to the papers in $G^{-}$ with the goal of reducing the number of papers with paper scores less than . The network is constructed as follows:
1. create a source node, $s$, with supply equal to the minimum among the number of papers in $G^{+}$ and $G^{-}$;
2. create a node for each $p \in G^{+}$; for each $p \in G^{+}$, create an edge from $s$ to $p$ with capacity 1 and cost 0;
3. create a node for each reviewer $r$;
4. for each paper $p \in G^{+}$, create an edge with capacity 1 and cost 0 from $p$ to each reviewer assigned to $p$;
5. for each paper $p \in G^{0}$, create a dummy node, $p'$, and construct an edge from $p'$ to $p$ with capacity 1 and cost 0;
6. for each reviewer, $r$, assigned to a paper in $G^{+}$, create an edge with capacity 1 and cost 0 to each dummy paper, $p'$, if $r$ was not assigned to the paper to which $p'$ is connected;
7. for each paper $p \in G^{0}$ with dummy node $p'$, let $S_p$ be the current paper score at $p$, let $R_{p'}$ be the set of reviewers with edges ending at $p'$ and let $R_p$ be the set of reviewers currently assigned to $p$. Let $A_{\min}$ be the minimum affinity among the reviewers in $R_{p'}$ with respect to $p$. For each $r \in R_p$ construct an edge with capacity 1 and cost 0 from $p$ to $r$ if ;
8. for each reviewer, $r$, construct an edge with capacity 1 to each paper in $G^{-}$ if $r$ is not currently assigned to that paper. If assigning $r$ to $p$ would cause $p$’s group to change to , the cost of the edge is , where ; otherwise, the cost is (again, $W$ is a large constant that ensures that edge costs are integral);
9. create a sink node $t$ with demand equal to the supply at $s$; for each paper $p \in G^{-}$ construct an edge from $p$ to $t$ with capacity 1 and cost 0.
A visual illustration of the refinement network appears in Figure 2(b).
After the network is constructed, MCF in the refinement network is solved. The MCF in the refinement network effectively reassigns up to 1 reviewer from each paper in $G^{+}$ to a paper in either $G^{0}$ or $G^{-}$. Additionally, up to 1 reviewer from each paper in $G^{0}$ may be reassigned to a paper in $G^{-}$. As before, any edge in the optimal flow plan from a reviewer to a paper (or that paper’s dummy node) corresponds to an assignment. Any edge from a paper to a reviewer corresponds to unassigning the reviewer from the corresponding paper.
Formally, we prove the following fact:
Fact 2.
After modifying an assignment according to the optimal flow plan in the refinement network, no new papers will be added to $G^{-}$.
The proof of Fact 2 appears in the appendix.
After solving MCF in the refinement network, some papers in and may be assigned reviewers, which violates the paper capacity constraints. To make the assignment valid, solve MCF in the original flow network (Figure 2(a)) with respect to the current assignment, the available reviewers, and the papers in violation.
FairFlow can only terminate after a valid solution has been constructed (i.e., after phase 1). The three phases are repeated until either: a) there are no papers in $G^{-}$ or b) the number of papers in $G^{-}$ remains the same after two successive iterations.
Load Lower Bounds.
Incorporating reviewer load lower bounds can be done by adding a single step to FairFlow. Specifically, in phase 1, first construct a network where the capacity on the edge from $s$ to $r_i$ is $L_i$ (rather than $U_i$). The total flow through the network is $\sum_{i} L_i$ and thus all load lower bounds are satisfied. Once this initial flow plan is constructed, record the corresponding assignments and update the capacity of each edge between $s$ and $r_i$ to be $U_i - L_i$. Similarly, update the capacity of each edge between $p_j$ and $t$ to be the difference between the paper’s coverage constraint and the number of reviewers assigned to $p_j$ in the initial flow plan. The flow plan through the updated network, combined with the initial flow plan, constitutes a valid assignment. Afterwards, continue with phases 2 and 3 as normal. The additional step must be performed in each invocation of phase 1.
5 Experiments
In this section we compare 4 paper matching algorithms:
1. TPMS - optimal matching with respect to the TPMS RAP.
2. FairIr - our method, Algorithm 1.
3. FairFlow - our min-cost-flow-based algorithm (Section 4.2).
4. PR4A [27] - state-of-the-art flow-based paper matching algorithm that maximizes the minimum paper score. For large problems we only run 1 iteration (PR4A (i1)).
TPMS, FairIr and PR4A are implemented in Gurobi, an industrial mathematical programming toolkit [10]. FairFlow is implemented using OR-Tools (https://developers.google.com/optimization/).
In our experiments we use data from 3 real conferences (our data is anonymous and kindly provided by OpenReview.net and the Computer Vision Foundation). Each dataset is comprised of: a matrix of paper-reviewer affinities (paper and reviewer identities are anonymous), a set of coverage constraints (one per paper), and a set of load upper bound constraints (one per reviewer). One of our datasets also includes load lower bounds. We do not evaluate PR4A on datasets when the load lower bounds are included since it was not designed for this scenario.
We report various statistics of each matching. For completeness, we also include the runtime of each algorithm. However, note that an algorithm’s runtime is significantly affected by a number of factors, including: hardware, the extent to which the algorithm has been optimized, dataset on which it is run, etc. All experiments are run on the same MacBook Pro with an Intel i7 processor.
Finding fairness thresholds.
Both FairIr and FairFlow take as input a fairness threshold, $T$. Since the best value of this threshold is unknown in advance, we search for the best value using 10 iterations of binary search. For FairIr, at each iteration, with the current threshold $T$, we use a linear programming solver to check whether there exists an optimal solution to the relaxation of the corresponding local fairness formulation. By Fact 1, if a solution exists, then FairIr will successfully return an integer solution. For FairFlow we do a similar binary search and return the threshold that led to the largest minimum paper score. In our implementation of FairFlow, when we test a new threshold during the binary search, we initialize from the previously computed matching. Note that PR4A does not require such a threshold as an input parameter.
Matching profile boxplots.
We visualize a matching via a set of paper score quintiles, which we call its profile. To construct the profile of a matching, compute the paper score of each paper and sort in non-decreasing order. The sorted list of scores is divided into 5 groups, each group containing an equal number of papers (most datasets do not include a number of papers that is divisible by 5; in this case, the last quintile has fewer papers). Each group of sorted paper scores is further divided into 4 even groups, $Q_1$, $Q_2$, $Q_3$ and $Q_4$ (with $Q_1$ and $Q_4$ containing the smallest and largest paper scores, respectively). In each profile visualization that follows, the box in each column is defined by the minimum score in $Q_2$ and the maximum score in $Q_3$ for the corresponding group (i.e., quintile). The lowest horizontal line in a column is defined by the smallest paper score that is greater than or equal to ; the highest horizontal line in the column is defined by the largest paper score that is smaller than or equal to . The rest of the points are considered outliers and denoted by red x’s. The median paper score among $Q_2$ and $Q_3$ is represented as an orange line. A matching’s profile provides a visual summary of the distribution of paper scores it induces, including the best and worst paper scores.
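The quintile split behind a profile can be sketched as follows. This is a simplified summary (count, minimum, median and maximum per quintile) rather than the full boxplot with whiskers and outliers; the function name `matching_profile` and the use of a ceiling-based group size, which leaves the last quintile with fewer papers, are assumptions of this sketch.

```python
import math
from statistics import median


def matching_profile(scores, groups=5):
    """Summarize paper scores by quintile (simplified profile).

    Scores are sorted in non-decreasing order and cut into `groups`
    consecutive chunks of ceil(len(scores) / groups) papers, so the last
    chunk may be smaller. Empty chunks (possible for very small inputs)
    are skipped.
    """
    ordered = sorted(scores)
    size = math.ceil(len(ordered) / groups)
    profile = []
    for k in range(groups):
        chunk = ordered[k * size:(k + 1) * size]
        if chunk:
            profile.append({
                "count": len(chunk),
                "min": chunk[0],
                "median": median(chunk),
                "max": chunk[-1],
            })
    return profile


# Hypothetical paper scores for 22 papers.
for row in matching_profile(range(22, 0, -1)):
    print(row)
```

The first entry's minimum and the last entry's maximum are the worst and best paper scores of the matching, which is the information the profile plots emphasize.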
5.1 Medical Imaging and Deep Learning
In our first experiment we use data from the Medical Imaging and Deep Learning (MIDL) Conference. The data includes affinities of 177 reviewers for 118 papers. The affinities range from -1.0 to 1.0. Each paper must be reviewed by 3 reviewers and each reviewer must be assigned no more than 4 and no fewer than 2 papers (i.e., the data includes upper and lower bounds on reviewer loads).
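A quick capacity check shows how tight the MIDL load bounds are. The sketch below uses only the counts stated above; the helper name is hypothetical, and the condition it tests is necessary but not sufficient for a valid matching to exist.

```python
from typing import Sequence


def loads_can_cover(num_papers: int, reviews_per_paper: int,
                    lower: Sequence[int], upper: Sequence[int]) -> bool:
    """Necessary (not sufficient) condition for a valid matching to exist."""
    demand = num_papers * reviews_per_paper
    return sum(lower) <= demand <= sum(upper)


# MIDL counts from the text: 118 papers, 3 reviews each, 177 reviewers.
num_reviewers, num_papers, reviews_per_paper = 177, 118, 3
demand = num_papers * reviews_per_paper   # reviews that must be written
min_total = 2 * num_reviewers             # reviews forced by the lower bounds
max_total = 4 * num_reviewers             # reviews allowed by the upper bounds
feasible = loads_can_cover(num_papers, reviews_per_paper,
                           [2] * num_reviewers, [4] * num_reviewers)
```

Because the demand (354 reviews) equals the total of the lower bounds, enforcing the lower bounds leaves no slack: every reviewer must receive exactly 2 papers.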
Figure 3 displays the profiles of matchings computed by the 4 algorithms with and without lower bounds. Without lower bounds, all algorithms produce similar profiles, except that the maximum paper scores achieved by PR4A and FairFlow are the lowest. Similarly, these two algorithms achieve lower objective scores, which is likely because neither explicitly maximizes the global sum of paper scores. Interestingly, TPMS constructs a matching that is relatively fair with respect to paper scores even though it is not designed to do so.
When lower bounds are considered, the algorithms produce markedly different profiles. First, notice that TPMS constructs a matching in which some papers have a paper score of 0, signaling an unfair assignment. Of the fair matching algorithms, FairIr’s profile includes a higher minimum paper score, a higher maximum paper score, and a higher objective score. However, FairIr is 40% slower than FairFlow. Also note that on this small dataset, we run PR4A with no upper bound on the number of iterations (hence the long runtime). Table 1 (first block) contains matching statistics of the various algorithms for MIDL.
5.2 CVPR
Our next experiment is performed with respect to data from a previous year’s Conference on Computer Vision and Pattern Recognition (CVPR). The data includes the affinities of 1373 reviewers for 2623 papers, which amounts to a substantially larger problem than that posed by the MIDL data. All affinities are between 0.0 and 1.0. As before, each paper must be reviewed by 3 different reviewers. Each reviewer may not be assigned to more than 6 papers. Our data does not contain lower bounds. For the purpose of demonstration, we construct a set of synthetic reviewer load lower bounds where all reviewers must review at least 2 papers.
The results are contained in Figure 4 and Table 1 (second block). As before, FairFlow is the fastest fair matching algorithm, achieving a 2x speedup over FairIr and an order of magnitude speedup over PR4A when lower bounds are excluded. When lower bounds are included, FairFlow is still 100 s (15%) faster than FairIr. PR4A and FairIr achieve similar fairness. Interestingly, FairFlow finds the matching with the highest degree of fairness when lower bounds on reviewing loads are applied. However, this comes at the expense of a relatively low objective score. FairIr constructs a fairer matching than TPMS, but not than the other two fair matching algorithms. This is unsurprising because FairIr optimizes the global objective, unlike the other algorithms, which more directly optimize fairness. FairIr’s balance between fairness and global optimality is illustrated by its profile (Figure 4(f)), which contains a handful of outliers with low scores, but many papers with comparatively high paper scores in quintiles 3, 4 and 5.
5.3 CVPR2018
In our final experiment, we use data from CVPR 2018 (CVPR2018). The data contains the affinities of 2840 reviewers for 5062 papers, a substantial increase in problem size over CVPR. Affinities range between 0.0 and 11.1, with many scores closer to 0.0 (the mean score is 0.36). Each paper must be reviewed 3 times. Reviewer load upper bounds vary by reviewer and range between 2 and 9. Again, the data does not include load lower bounds, so we construct synthetic lower bounds of 2 for all reviewers. Because of the size of the problem, the binary search for a suitable value of the fairness threshold, $T$, did not terminate within 5 hours. Therefore, we select $T$ by adding an offset to the minimum paper score found by FairFlow. The reported run time includes the run time of FairFlow.
Table 1 (third vertical block) reveals similar trends with respect to speed (FairFlow is most efficient) and fairness (PR4A and FairIr are the most fair). Figure 5 displays the corresponding matching profiles.
6 Related Work
Our work is most similar to previous studies that develop algorithms for constructing fair assignments for the RAP. Two studies propose to optimize for fairness with respect to the least satisfied reviewer, which can be formulated as a maximization over the minimum paper score with respect to an assignment [8, 27]. The first algorithm, to which we compare, is PR4A [27]. PR4A iteratively solves maximum-flow through a sequence of specially constructed networks, like our FairFlow, and is guaranteed to return a solution that is within a bounded multiplicative constant of the optimal solution with respect to their maximin objective. As demonstrated in experiments, FairFlow is faster than PR4A and achieves similar quality solutions on data from real conferences. We note that the work introducing PR4A also presents a statistical study of the acceptance of the best papers among a batch submitted; we do not focus on paper acceptance in this work.
The second work proposes a rounding algorithm and proves an additive, constant factor approximation of the optimal assignment, like we do [8]. We note that both their algorithm and proof techniques are different from ours. However, their algorithm requires solving a new linear program for each reviewer during each iteration, which is unlikely to scale to large problems. Moreover, PR4A compares favorably to this algorithm in a direct comparison [27].
With respect to fairness, the creators of TPMS perform experiments that enforce load equity among reviewers (i.e., each reviewer should be assigned a similar number of papers) by adding penalty terms to the objective [4]. These researchers, and others, explore formulations that maximize the minimum affinity among all assigned reviewers, which is different from our fairness constraint [23, 31]. Others have posed instances of the RAP that require at least one reviewer assigned to each paper to have an affinity greater than some threshold. In this setting, one classic paper gives an algorithm for constructing assignments that maximize this threshold by modeling the RAP as a transshipment problem [11]. Other objectives have been considered for the RAP, but these tend to be global optimizations with no local constraints, which can lead to certain papers being assigned groups of inappropriate reviewers [9, 31, 18].
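The difference between a constraint on the paper score and a criterion based on the minimum affinity can be seen on a toy case. In the sketch below, the affinities, the threshold of 1.5 and the function names are all illustrative assumptions; none of them come from the datasets used in the experiments.

```python
from typing import Dict, List


def paper_score(affinities: List[float]) -> float:
    """Paper score: total affinity of the reviewers assigned to a paper."""
    return sum(affinities)


def min_affinity(affinities: List[float]) -> float:
    """Affinity of the weakest reviewer assigned to a paper."""
    return min(affinities)


# Hypothetical affinities of the 3 reviewers assigned to each paper.
assigned: Dict[str, List[float]] = {
    "paper_a": [0.9, 0.1, 0.9],
    "paper_b": [0.4, 0.4, 0.4],
}
threshold = 1.5  # illustrative local fairness threshold

meets_local_fairness = {p: paper_score(a) >= threshold for p, a in assigned.items()}
weakest_reviewer = {p: min_affinity(a) for p, a in assigned.items()}
```

Here `paper_a` satisfies the paper-score constraint despite having one weak reviewer, while `paper_b` has the better weakest reviewer but falls short of the threshold, so the two criteria rank the papers in opposite order.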
Some previous work on the RAP models each paper as a binary set of topics and each reviewer as a binary set of expertises (the overall sets of topics and expertises are the same). In this setting the goal is to maximize coverage of each paper’s topics by the assigned reviewers’ expertises [21, 14, 20]. A generalized setting allows paper and reviewer representations to be real-valued vectors rather than binary [28, 16]. The resulting optimization problems are solved via ILPs, constraint-based optimization or greedy algorithms. While representing papers and reviewers as topic vectors allows for a more fine-grained characterization of affinity, in practice, reviewer-paper affinity is typically represented by a single real value, as in the real-conference data we use in experiments.
A significant portion of the work related to the RAP explores methods for modeling reviewer-paper affinities. Some of the earliest work employs latent semantic indexing with respect to the abstracts of submitted and previously published papers [6]. More recent work models each author as a mixture of personas and each persona as a mixture of topics; each paper written by an author is generated from a combination of personas [22]. Other approaches use reviewer bids to derive the affinity between papers and reviewers. Since reviewers normally do not bid on all papers, collaborative filtering has been used for bid imputation [5]. Finally, some approaches model affinity using proximity in coauthorship networks, citation counts, and the venues in which a paper is published [25, 17, 19].
7 Conclusion
This work introduces the local fairness formulation of the reviewer assignment problem (RAP), which includes a global objective as well as local fairness constraints. Since optimizing this formulation is NP-hard, we present two algorithms for approximately solving it. The first algorithm, FairIr, relaxes the formulation and employs a specific rounding technique to construct a valid matching. Theoretically, we show that FairIr violates fairness constraints by no more than the maximum reviewer-paper affinity, and may only violate load constraints by 1. The second algorithm, FairFlow, is a more efficient heuristic that operates by solving a sequence of min-cost flow problems. We compare our two algorithms to standard matching techniques that do not consider fairness, and to a state-of-the-art algorithm that directly optimizes for fairness. On 3 datasets from recent conferences, we show that FairIr is best at jointly optimizing the global matching while satisfying fairness constraints, and FairFlow is the most efficient of the fair matching algorithms. Despite a lack of theoretical guarantees, FairFlow constructs highly fair matchings.
All code for experiments is available here: https://github.com/iesl/fair-matching.
Anonymized data is either included in the repository or available upon request from the first author.
8 Acknowledgments
This material is based upon work supported in part by the Center for Data Science and the Center for Intelligent Information Retrieval, and in part by the Chan Zuckerberg Initiative under the project "Scientific Knowledge Base Construction." B. Saha was supported in part by an NSF CAREER award (no. 1652303), in part by an NSF CRII award (no. 1464310), in part by an Alfred P. Sloan Fellowship, and in part by a Google Faculty Award. Opinions, findings and conclusions/recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.
Appendix A FairIr Guarantees
We restate and then prove Theorem 1.
Theorem.
Given a feasible instance of the local fairness formulation, FairIr returns an integer solution in which each local fairness constraint may be violated by at most $A_{\max}$ (the maximum reviewer-paper affinity), each load constraint may be violated by at most 1, and the global objective is maximized.
The local fairness formulation comprises a set of reviewers, $R$, a set of papers, $P$, reviewer load lower and upper bounds, $L$ and $U$, respectively, coverage constraints, $C$, a paper-reviewer affinity matrix, $A$, and a local fairness threshold, $T$. To prove this theorem we rely on three lemmas. The first guarantees that FairIr does not violate a load constraint by more than 1; the second guarantees that FairIr will never violate a local fairness constraint by more than $A_{\max}$; the third guarantees that FairIr will always terminate if the input problem is feasible.
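For reference, the components listed above fit together as the following integer program. This is a sketch reconstructed from that list and from the proofs below, and the symbol names are assumptions: $x_{rp} \in \{0, 1\}$ indicates that reviewer $r$ is assigned to paper $p$, $A_{rp}$ is their affinity, $L_r$ and $U_r$ are the load bounds of $r$, $C_p$ is the number of reviews that paper $p$ requires, and $T$ is the local fairness threshold.

$$
\begin{aligned}
\max_{x} \quad & \sum_{r \in R} \sum_{p \in P} A_{rp}\, x_{rp} \\
\text{s.t.} \quad & L_r \le \sum_{p \in P} x_{rp} \le U_r && \forall r \in R && \text{(load)} \\
& \sum_{r \in R} x_{rp} = C_p && \forall p \in P && \text{(coverage)} \\
& \sum_{r \in R} A_{rp}\, x_{rp} \ge T && \forall p \in P && \text{(local fairness)} \\
& x_{rp} \in \{0, 1\} && \forall r \in R,\ p \in P
\end{aligned}
$$

The relaxation that FairIr solves replaces the last line with $0 \le x_{rp} \le 1$, which is what allows reviewers to be assigned fractionally in the proofs that follow.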
Lemma 1.
Given a feasible instance of the local fairness formulation, FairIr never violates a load constraint by more than 1.
Proof.
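FairIr only drops load constraints if a reviewer is assigned fractionally to at most 2 papers. Clearly, if a reviewer is fractionally assigned to exactly one paper, the load constraint can be violated by at most one. Therefore, let $r$ be a reviewer with load bounds $L_r$ and $U_r$, assigned fractionally to papers $p_1$ and $p_2$ only, and let $x_{rp}$ denote the (possibly fractional) assignment of $r$ to paper $p$. Then,
$$L_r \le \ell + x_{rp_1} + x_{rp_2} \le U_r,$$
where $\ell$ is the total load on $r$ excluding $p_1$ and $p_2$. Since $r$ is only fractionally assigned to 2 papers, $\ell$ must be integer; since $x_{rp_1}, x_{rp_2} \in (0, 1)$, $0 < x_{rp_1} + x_{rp_2} < 2$. Thus,
$$L_r - 1 \le \ell \le U_r - 1.$$
If the load constraints are dropped and $r$ is assigned to neither $p_1$ nor $p_2$, then $r$ will retain a load of $\ell$, which is at least $L_r - 1$. On the other hand, if $r$ is assigned to both $p_1$ and $p_2$, then $r$ will exhibit a load of $\ell + 2 \le U_r + 1$. ∎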
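The bound in Lemma 1 can be checked by brute force on small cases. The sketch below is only a sanity check under stated assumptions, not part of the proof: it enumerates an integer base load plus two fractional assignments on a grid (the grid resolution and the function name are illustrative), keeps the combinations that satisfy the load bounds fractionally, and records the worst violation over all ways of rounding the two fractional assignments.

```python
from fractions import Fraction
from itertools import product


def max_load_violation(lower: int, upper: int, grid: int = 8) -> int:
    """Worst load violation after rounding two fractional assignments."""
    worst = 0
    # Fractional values strictly between 0 and 1 (ASSUMPTION: a coarse grid).
    fracs = [Fraction(i, grid) for i in range(1, grid)]
    for base in range(0, upper + 1):          # integer load from other papers
        for x1, x2 in product(fracs, repeat=2):
            if not (lower <= base + x1 + x2 <= upper):
                continue                      # not feasible for the relaxation
            for rounded in (0, 1, 2):         # assign neither, one, or both
                load = base + rounded
                worst = max(worst, lower - load, load - upper)
    return worst


violation_midl_bounds = max_load_violation(lower=2, upper=4)
violation_upper_only = max_load_violation(lower=0, upper=6)
```

For both pairs of bounds the worst case is a violation of exactly 1, reached either by rounding both fractional assignments down when the base load is one below the lower bound, or by rounding both up when it is one below the upper bound.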
Lemma 2.
Given a feasible instance of the local fairness formulation, FairIr never violates a local fairness constraint by more than $A_{\max}$, the maximum reviewer-paper affinity.
Proof.
FairIr only drops a paper’s local fairness constraint if that paper has at most 3 reviewers fractionally assigned to it. Clearly, if a paper has only one reviewer fractionally assigned to it, the local fairness constraint can be violated by at most $A_{\max}$. Assume during an iteration of FairIr a paper has exactly 2 reviewers fractionally assigned to it. Call that paper $p$ and those reviewers $r_1$ and $r_2$, let $x_{rp}$ denote the (possibly fractional) assignment of reviewer $r$ to $p$, and let $C_p$ be the number of reviews that $p$ requires. During each iteration of FairIr, a feasible solution to the relaxed local fairness formulation is computed. Therefore,
$$c + x_{r_1p} + x_{r_2p} = C_p,$$
where $c$ is the load on $p$ aside from the load contributed by reviewers $r_1$ and $r_2$. Recall that $c$ is an integer and that $r_1$ and $r_2$ are the only reviewers fractionally assigned to $p$. Therefore $c = C_p - 1$. Moreover,
$$x_{r_1p} + x_{r_2p} = 1.$$
Now, consider the paper score at $p$, and let $S$ be the total affinity between $p$ and all its assigned reviewers, except for $r_1$ and $r_2$. Then, with $T$ denoting the local fairness threshold and $A_{rp}$ the affinity of reviewer $r$ for $p$,
$$T \le S + x_{r_1p} A_{r_1p} + x_{r_2p} A_{r_2p} \le S + A_{\max}.$$
Since either $r_1$ or $r_2$ must be assigned integrally to $p$ (lest the coverage constraint be violated), dropping the local fairness constraint at $p$ can only lead to a violation of the local fairness constraint at $p$ by at most $A_{\max}$.
Next, consider the case that $p$ has 3 reviewers fractionally assigned to it, $r_1$, $r_2$ and $r_3$. Since the coverage constraint at $p$ must be met with equality, one of the two cases below must be true:
$$x_{r_1p} + x_{r_2p} + x_{r_3p} = 1$$
or
$$x_{r_1p} + x_{r_2p} + x_{r_3p} = 2.$$
As before, let $S$ be the paper score at $p$, excluding affinity contributed from fractionally assigned reviewers. If the first case above is true, then $c = C_p - 1$. Furthermore,
$$T \le S + \sum_{i=1}^{3} x_{r_ip} A_{r_ip} \le S + A_{\max}.$$
This means that even if all three reviewers were unassigned from $p$ (which would make satisfying the coverage constraint at $p$ impossible), the local fairness constraint would only be violated by at most $A_{\max}$. Now, consider case 2 above, where $c = C_p - 2$. In order to satisfy the coverage constraint at $p$, at least two of the three reviewers must be assigned integrally to $p$. Without loss of generality, assume that
$$A_{r_1p} \ge A_{r_2p} \ge A_{r_3p}.$$
Even if $r_1$ is unassigned from $p$, the change in paper score at $p$ is at most $A_{\max}$ and the local fairness constraint can be violated by at most $A_{\max}$. The same is also true if either $r_2$ or $r_3$ is unassigned from $p$. ∎
Lemma 3.
Given a feasible instance of the local fairness formulation, FairIr always terminates.
The goal in proving Lemma 3 is to show that during each iteration of FairIr, either a constraint is dropped or an integral solution is found. Before proving Lemma 3, recall that the solution, $x$, of a linear program is always a basic feasible solution, i.e., it has $n$ linearly independent tight constraints, where $n$ is the number of variables. Formally,
Corollary 2.
If $x$ is a basic feasible solution of a linear program, then the number of non-zero variables in $x$ cannot be greater than the number of linearly independent active constraints of that program.
Proof.
According to Algorithm 1, FairIr drops constraints during any iteration in which it constructs a solution exhibiting at least one paper with at most 3 reviewers fractionally assigned to it or at least one reviewer assigned fractionally to at most 2 papers. If FairIr is able to drop a constraint or round a new variable to integral, it makes progress. Therefore, FairIr could only fail to make progress if each reviewer was assigned fractionally to at least 3 papers and each paper was assigned fractionally to at least 4 reviewers. In the following, we show that this is impossible, using a particular invocation of Corollary 2.
Assume for now that each reviewer is fractionally assigned to exactly 3 papers and each paper is assigned fractionally to exactly 4 reviewers. Since each fractional assignment is counted once at its reviewer and once at its paper, the total number of fractional assignments can be written as follows, where $|R|$ and $|P|$ are the numbers of reviewers and papers:
$$\frac{3|R| + 4|P|}{2} = 1.5|R| + 2|P|.$$
An instance of the local fairness paper matching problem contains an upper and lower bound constraint for each reviewer, 1 coverage constraint for each paper, and 1 local fairness constraint for each paper, yielding $2|R| + 2|P|$ total constraints. Note that for a reviewer $r$, only one of its load constraints (i.e., upper or lower) may be tight, assuming that the upper and lower bounds are distinct. Thus, an upper bound on the number of active constraints is $|R| + 2|P|$. However, this means that the number of fractional variables is larger than the number of constraints:
$$1.5|R| + 2|P| > |R| + 2|P|,$$
which violates Corollary 2. When each reviewer is fractionally assigned to at least 3 papers and each paper is assigned fractionally to at least 4 reviewers, the number of nonzero fractional variables could only be larger. Note that, when there is no local fairness constraint, FairIr returns an integral solution since the underlying constraint matrix becomes totally unimodular.
∎
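As a sanity check on the counting argument, consider a hypothetical instance (sizes chosen for illustration only) with 4 reviewers and 3 papers, in which every reviewer is fractionally assigned to exactly 3 papers and every paper to exactly 4 reviewers. Counting from either side gives the same 12 fractional variables, while at most 10 constraints can be active:

$$
\underbrace{3 \cdot 4}_{\text{reviewer side}} = \underbrace{4 \cdot 3}_{\text{paper side}} = 12
\qquad \text{versus} \qquad
\underbrace{4}_{\text{one tight load bound per reviewer}} + \underbrace{2 \cdot 3}_{\text{coverage and fairness per paper}} = 10.
$$

More generally, if $F$ denotes the number of fractional variables in this exact-degree setting, then there are $F/3$ reviewers and $F/4$ papers, so at most $F/3 + 2 \cdot F/4 = 5F/6 < F$ constraints can be active.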
Now, to end the proof of the theorem, we note that the global objective value never decreases in subsequent rounds, as we always relax the formulation by dropping constraints and fix only those variables $x_{rp}$ whose values have been returned as integers. Thus, FairIr maximizes the global objective.
Appendix B Proof of Fact 2
Proof.
By definition, papers that are members of have paper score greater than . Therefore, unassigning a reviewer from a paper in may reduce the corresponding paper score by at most yielding a paper score of at least , which makes the paper either a member of or . Now, consider the papers in . By step 7 above, a reviewer can only be unassigned from a paper if the flow entering from is large enough to make ’s resulting paper score at least as large as . Thus, the papers in either remain in or become members of , which completes the proof. ∎
References
- [2] C. F. Camerer and E. J. Johnson. 1997. The process-performance paradox in expert judgment: How can experts know so much and predict so badly? Research on judgment and decision making: Currents, connections, and controversies (1997).
- [3] L. Charlin and R. S. Zemel. 2013. The Toronto paper matching system: an automated paper-reviewer assignment system. In ICML.
- [4] L. Charlin, R. S. Zemel, and C. Boutilier. 2012. A framework for optimizing paper matching. arXiv:1202.3706 (2012).
- [5] D. Conry, Y. Koren, and N. Ramakrishnan. 2009. Recommender systems for the conference paper assignment problem. In conference on Recommender systems.
- [6] S. T. Dumais and J. Nielsen. 1992. Automating the assignment of submitted manuscripts to reviewers. In Research and development in information retrieval.
- [7] M. Gairing, B. Monien, and A. Woclaw. 2007. A faster combinatorial approximation algorithm for scheduling unrelated parallel machines. Theoretical Computer Science (2007).
- [8] N. Garg, T. Kavitha, A. Kumar, K. Mehlhorn, and J. Mestre. 2010. Assigning papers to referees. Algorithmica (2010).
