Multiclass MinMax Rank Aggregation
Pan Li, Olgica Milenkovic

TL;DR
This paper introduces new minmax rank aggregation problems using Kendall tau and Spearman footrule distances, providing approximation algorithms and demonstrating their applications on Mallows model and genomic data.
Contribution
It presents the first constant-approximation algorithms for NP-hard minmax rank aggregation problems under two distance measures.
Findings
Algorithms achieve constant approximation ratios.
Applications demonstrate effectiveness on real data.
Framework applicable to various ranking scenarios.
Abstract
We introduce a new family of minmax rank aggregation problems under two distance measures, the Kendall $\tau$ and the Spearman footrule. As the problems are NP-hard, we proceed to describe a number of constant-approximation algorithms for solving them. We conclude with illustrative applications of the aggregation methods on the Mallows model and genomic data.

| mmKT-Conv |
|---|
| 1: Choose the pivot according to |
| 2: Set . |
| 3: For all : |
| If , . Otherwise, . |
| 4: Return [mmKT-Conv, , mmKT-Conv ]. |

| min-Pick-Perm , . |
|---|
| 1: For each and each ranking |
| 2: Compute Score |
| 3: Let Score. Output . |

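
Table I: Objective function values obtained by the different algorithms on synthetic data; the column headers are the parameter values used to control the distance between classes.

| Algorithm | 0.5 | 0.7 | 0.9 | 1.0 |
|---|---|---|---|---|
| mmKT-Conv | 14.5 (1.1) | 16.3 (1.4) | 17.8 (1.3) | 17.9 (1.5) |
| Pick-Rnd-Perm | 17.8 (1.4) | 19.9 (2.1) | 21.5 (1.8) | 21.6 (2.1) |
| Pick-Opt-Perm | 15.9 (1.8) | 18.1 (1.8) | 20.0 (1.7) | 20.0 (1.6) |
| FASLP-Pivot | 15.3 (1.4) | 17.7 (2.1) | 19.4 (2.2) | 19.7 (2.3) |
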

Table II: Aggregated sequences for the mtDNA dataset.

| Algorithm | Objective value | Aggregated sequence |
|---|---|---|
| mmKT-Conv | 210 | 1 10 7 2 17 12 30 9 11 23 19 20 21 13 35 3 15 14 25 26 6 16 32 28 34 4 24 27 18 36 29 31 8 33 22 5 |
| Pick-Opt-Perm | 267 | 1 27 2 17 36 20 3 29 10 11 35 12 30 21 9 19 18 28 33 7 8 16 26 14 34 13 24 15 32 25 4 22 23 6 31 5 |
| FASLP-Pivot | 269 | 1 2 17 7 23 12 3 20 30 21 6 9 10 11 15 19 28 25 27 18 32 8 33 24 13 34 14 4 35 29 26 16 36 31 22 5 |

Multiclass MinMax Rank Aggregation
Pan Li and Olgica Milenkovic
ECE Department, University of Illinois at Urbana-Champaign
I Introduction
Rankings, a special form of ordinal data, have received significant attention in the machine learning community as they arise in a number of important application domains, such as recommender systems, social voting and product placement platforms. Of particular importance are rankings in the form of linear orders (permutations) and partial rankings (weak orders), which are frequently obtained through conversion from ratings. One of the main processing tasks for rankings is rank aggregation, which often involves evaluating the median of a set of permutations or partial rankings under a suitably chosen distance function [2, 4, 7, 9, 11, 12, 16]. The median rank aggregation problem under the Kendall $\tau$ distance was introduced by Kemeny [11], and was proved to be NP-hard by Bartholdi et al. [4]. A number of approximation algorithms for the problem have been described in [2], mostly pertaining to permutations; a corresponding PTAS (polynomial time approximation scheme) was proposed in [12]. In the context of partial ranking aggregation, known solutions include the results of [1, 10]. Median aggregation under other distance functions has received less attention, one notable exception being the Spearman footrule aggregation problem [7], which can be solved in polynomial time via weighted bipartite matching [9] and whose solution is known to provide a constant approximation for Kendall $\tau$ aggregation.
We propose to investigate a broad new family of rank aggregation problems in which the median is replaced by a minmax type of function and where the rankings are grouped in classes. More precisely, assume that there are $C$ different classes of rankings and let $\Sigma_k$ be the set of rankings belonging to the class labeled by $k \in \{1, \dots, C\}$. Our minmax rank aggregation problem may be succinctly described as follows: Output a ranking that agrees in the minmax sense with the rankings belonging to the different classes. Rigorously, we seek to solve the following optimization problem:
$$\min_{\sigma} \max_{k \in \{1, \dots, C\}} \lambda_k D(\sigma, \Sigma_k),$$
where $\lambda_k$ represent the costs of violating the agreement with rankings in class $k$. In the above formulation, $D(\sigma, \Sigma_k)$ stands for a distance between a ranking or partial ranking $\sigma$ and a set of rankings $\Sigma_k$, and it may be chosen to be of the form of a median distance (which equals the total sum of distances between $\sigma$ and the elements of $\Sigma_k$) or a minimum distance (which equals the smallest distance between $\sigma$ and an element in $\Sigma_k$). The above described MinMax problem is motivated by a number of applications in which classes of rankings arise due to different ranking criteria or properties of the ranking entities (social platforms) or due to prior knowledge of different similarity degrees in groups of rankings (genome evolution). The minmax criterion is typically used when trying to ensure that the aggregate violates each vote (class of votes) to roughly the same extent.
We start our analysis with the MinMax problem with and under the median and minimum distance, and then proceed to study the problem for the case of arbitrary values of and , . For both the Kendall $\tau$ and the Spearman footrule, in the median and minimum distance settings, the MinMax problems may be shown to be NP-hard by using the corresponding results of [3]. In particular, the work in [3] outlines a general framework for proving NP-hardness results for the median, single class min-max-aggregation problem under different ranking distances. Nevertheless, only a handful of approximation algorithms were proposed even for this basic min-max-aggregation form: To the best of our knowledge, the only provable algorithm for the single class MinMax under the minimum distance measure was provided in [3]. The algorithm takes the form of the well studied “pick-a-permutation” method, and tends to perform poorly in practice.
The main results of our work include families of constant approximation algorithms for the new, general family of multiclass MinMax problems, both under the median and minimum class distance, evaluated using the Kendall $\tau$ and Spearman footrule distances. Furthermore, we illustrate the use of the new aggregation paradigm on the problem of finding an ancestral genome arrangement for mitochondrial DNA under the tandem duplication model for genomes [6].
II Mathematical Preliminaries
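Let $S$ denote a set of $n$ elements, which without loss of generality we set to $[n] = \{1, 2, \dots, n\}$. A ranking is an ordering of a subset of elements $Q$ of $[n]$ according to a predefined rule. When $Q = [n]$, the resulting order is referred to as a permutation. When the rankings include ties, they are referred to as partial rankings [10].
More precisely, a permutation $\sigma$ is a bijection $\sigma: [n] \to [n]$, and the set of permutations over $[n]$ forms the symmetric group of order $n!$, denoted by $S_n$. For any $\sigma \in S_n$ and $x \in [n]$, $\sigma(x)$ denotes the rank (position) of the element $x$ in $\sigma$. We say that $x$ is ranked higher than $y$ (ranked lower than $y$) iff $\sigma(x) < \sigma(y)$ ($\sigma(x) > \sigma(y)$). The inverse of a permutation $\sigma$ is denoted by $\sigma^{-1}$. Clearly, $\sigma^{-1}(i)$ represents the element ranked at position $i$ in $\sigma$. Similarly, partial rankings [10] represent a mapping $\sigma$ over $[n]$ in which there may exist two elements $x \neq y$ such that $\sigma(x) = \sigma(y)$. It is common to use $\sigma(x)$ to denote the position of the element $x$ in the partial ranking $\sigma$, and to define it as
$$\sigma(x) = |\{y \in [n] : y \text{ is ranked higher than } x\}| + \frac{|\{y \in [n] : y \text{ is tied with } x, \text{ including } y = x\}| + 1}{2}.$$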
A number of distance functions between rankings were proposed in the literature [7, 10, 14]. One distance function counts the number of adjacent transpositions needed to convert a permutation into another. Adjacent transpositions generate $S_n$, i.e., any permutation $\pi \in S_n$ can be converted into another permutation $\sigma \in S_n$ through a sequence of adjacent transpositions [14]. The smallest number of adjacent transpositions needed to convert a permutation $\pi$ into another permutation $\sigma$ is termed the Kendall $\tau$ distance, denoted by $d_\tau(\pi, \sigma)$. The Kendall $\tau$ distance between two permutations $\pi$ and $\sigma$ over $[n]$ also equals the number of pairwise inversions of elements of the two permutations:
$$d_\tau(\pi, \sigma) = |\{(x, y) : \pi(x) < \pi(y),\ \sigma(x) > \sigma(y)\}|.$$
Another positional distance measure is the Spearman footrule,
$$d_S(\pi, \sigma) = \sum_{x \in [n]} |\pi(x) - \sigma(x)|.$$
It can be shown that $d_\tau(\pi, \sigma) \leq d_S(\pi, \sigma) \leq 2\, d_\tau(\pi, \sigma)$ [7].
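The two permutation distances can be illustrated with a minimal sketch. It assumes a permutation is stored as a sequence `pos` in which `pos[x]` is the rank of element `x` (0-indexed here for convenience); the function names are illustrative and do not come from the paper.

```python
from itertools import combinations, permutations

def kendall_tau(pi, sigma):
    """Count the element pairs that pi and sigma order differently."""
    return sum(
        1
        for x, y in combinations(range(len(pi)), 2)
        if (pi[x] - pi[y]) * (sigma[x] - sigma[y]) < 0
    )

def spearman_footrule(pi, sigma):
    """Sum of absolute differences between the positions of each element."""
    return sum(abs(a - b) for a, b in zip(pi, sigma))

pi = (0, 1, 2, 3)
sigma = (1, 0, 3, 2)          # two adjacent swaps away from pi
print(kendall_tau(pi, sigma), spearman_footrule(pi, sigma))

# Numerically check that Kendall <= footrule <= 2 * Kendall on all of S_4.
for s in permutations(range(4)):
    k, f = kendall_tau(pi, s), spearman_footrule(pi, s)
    assert k <= f <= 2 * k
```

The quadratic-time pair count is chosen for readability; a merge-sort based inversion count would be the usual choice for long permutations.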
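One may similarly define a generalization of the Kendall $\tau$ distance for partial rankings $\pi$ and $\sigma$ over the set $[n]$. This distance is known as the Kemeny distance, and equals
$$d_K(\pi, \sigma) = |\{(x, y) : \pi(x) < \pi(y),\ \sigma(x) > \sigma(y)\}| + \frac{1}{2}\, |\{(x, y) : \pi(x) = \pi(y),\ \sigma(x) > \sigma(y) \ \text{or}\ \pi(x) > \pi(y),\ \sigma(x) = \sigma(y)\}|.$$
The Spearman footrule analogue for partial rankings [10] equals the sum of the absolute differences between “positions” of elements in the partial rankings,
$$d_S(\pi, \sigma) = \sum_{x \in [n]} |\pi(x) - \sigma(x)|,$$
where positions are as defined above for partial rankings. The Spearman footrule distance for partial rankings is a $2$-approximation for the Kemeny distance [10].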
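A small sketch of the partial-ranking versions follows. It rests on two assumptions that should be checked against [10]: a partial ranking is given as a list of buckets (tied groups, best first), an element's position is the number of elements ranked above it plus half of (its bucket size plus one), and a pair tied in exactly one of the two rankings contributes 1/2 to the Kemeny distance. All names are hypothetical.

```python
from itertools import combinations

def bucket_positions(buckets):
    """Position of x = (#elements ranked above x) + (size of x's bucket + 1) / 2."""
    pos, above = {}, 0
    for bucket in buckets:
        for x in bucket:
            pos[x] = above + (len(bucket) + 1) / 2
        above += len(bucket)
    return pos

def footrule_partial(b1, b2):
    """Spearman footrule on bucket positions."""
    p, q = bucket_positions(b1), bucket_positions(b2)
    return sum(abs(p[x] - q[x]) for x in p)

def kemeny(b1, b2):
    """1 per pair ordered oppositely, 1/2 per pair tied in exactly one ranking."""
    p, q = bucket_positions(b1), bucket_positions(b2)
    total = 0.0
    for x, y in combinations(sorted(p), 2):
        dp, dq = p[x] - p[y], q[x] - q[y]
        if dp * dq < 0:
            total += 1.0
        elif (dp == 0) != (dq == 0):
            total += 0.5
    return total

b1 = [[1], [2, 3]]        # 1 first, then 2 and 3 tied
b2 = [[2], [1], [3]]      # a full ranking written as singleton buckets
print(kemeny(b1, b2), footrule_partial(b1, b2))
```

On this toy pair the footrule equals twice the Kemeny distance, so the factor of two relating the two distances is attained.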
The notion of a distance between two rankings has an important extension in terms of a distance between a ranking and a set of rankings, which we refer to as rank-set distances. We focus our attention on two types of rank-set distances, defined below. For compactness, we use $d$ to denote an arbitrary distance on pairs of rankings, but focus our attention throughout the paper on the Kendall $\tau$ and Spearman footrule distances.
Definition II.1.
Suppose that $\sigma$ is a ranking and that $\Sigma$ is a set of rankings. Given a distance between two rankings $d$, the median-$d$ distance ($D_{\mathrm{med}}$) between $\sigma$ and $\Sigma$ equals
$$D_{\mathrm{med}}(\sigma, \Sigma) = \sum_{\pi \in \Sigma} d(\sigma, \pi).$$
Definition II.2.
Suppose that $\sigma$ is a ranking and that $\Sigma$ is a set of rankings. Given a distance between two rankings $d$, the min-$d$ distance ($D_{\min}$) between $\sigma$ and $\Sigma$ is defined as
$$D_{\min}(\sigma, \Sigma) = \min_{\pi \in \Sigma} d(\sigma, \pi).$$
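We recall that the focal problem of this work is to find constant approximation algorithms for the MinMax rank aggregation problem, which reads as
$$\min_{\sigma} \max_{k \in \{1, \dots, C\}} \lambda_k D(\sigma, \Sigma_k),$$
where $\Sigma_k$ is the set of rankings in class $k$, $\lambda_k$ is the cost associated with class $k$, and $D$ is a median-$d$ or min-$d$ distance. In our future analysis we use and . Furthermore, we let $\sigma^*$ denote the argument of the optimal solution of the MinMax problem and let .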
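To make the objective concrete, the sketch below evaluates the weighted minmax cost of a candidate permutation under either rank-set distance and finds an optimum by exhaustive search. It assumes permutations are position tuples, uses the Kendall $\tau$ distance as the pairwise distance, and treats the per-class costs as multiplicative weights; these modelling choices and all names are assumptions made for illustration. The search visits all $n!$ candidates and is only meant for toy instances.

```python
from itertools import combinations, permutations

def kendall_tau(pi, sigma):
    return sum(
        1
        for x, y in combinations(range(len(pi)), 2)
        if (pi[x] - pi[y]) * (sigma[x] - sigma[y]) < 0
    )

def d_median(sigma, cls, d):
    """Median rank-set distance: sum of distances to every ranking in the class."""
    return sum(d(sigma, pi) for pi in cls)

def d_min(sigma, cls, d):
    """Min rank-set distance: distance to the closest ranking in the class."""
    return min(d(sigma, pi) for pi in cls)

def minmax_cost(sigma, classes, weights, rank_set_dist, d):
    """Largest weighted rank-set distance over all classes."""
    return max(w * rank_set_dist(sigma, cls, d) for w, cls in zip(weights, classes))

def brute_force_minmax(n, classes, weights, rank_set_dist, d):
    """Exhaustive search over all n! permutations; feasible only for tiny n."""
    return min(
        (minmax_cost(s, classes, weights, rank_set_dist, d), s)
        for s in permutations(range(n))
    )

classes = [[(0, 1, 2)], [(2, 1, 0)]]   # two single-ranking classes, one the reverse of the other
weights = [1, 1]
best_cost, best_sigma = brute_force_minmax(3, classes, weights, d_median, kendall_tau)
print(best_cost, best_sigma)
```

With one ranking per class the median and min variants coincide, which makes this instance a convenient sanity check for both code paths.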
III Approximate MinMax Aggregation
As previously pointed out, the MinMax problem under both the and can be shown to be NP-hard using the results of [3], which established hardness for the special case and a pseudometric. We hence focus on devising approximation algorithms for the MinMax problem.
III-A Permutations
We first consider ordinal data of the form of permutations. We show that a simple algorithm, which we term Pick-Rnd-Perm, can achieve a -approximation in expectation for the case of the problem whenever is a pseudometric. Then, for , we describe two -approximation algorithms that use a combination of convex optimization and rounding procedures and offer significantly better empirical performance than random selection. Finally, we describe a -approximation algorithm for the problems when is a pseudometric. The selection algorithm essentially transforms the problem into a problem: Thus, the algorithms developed for approximating multiclass problems may be used to approximate corresponding instances of the problem.
The Pick-Rnd-Perm Algorithm. Pick a permutation from uniformly at random.
Theorem III.1.
For the distance, where is a pseudometric, the Pick-Rnd-Perm algorithm produces a -approximation of the problem.
Proof.
For a given ,
[TABLE]
By calculating the expectation, we obtain
[TABLE]
Clearly, random selection may be improved by picking the optimal permutation from instead. We term this approach Pick-Opt-Perm. Although the Pick-Rnd(Opt)-Perm algorithms are exceptionally simple and offer a -approximation to the optimal solution, they have a number of drawbacks: the aggregate is one of the given rankings from the clusters, which violates fairness rules of aggregates, and the empirical performance is typically very poor. To mitigate these problems, we propose more sophisticated aggregation algorithms for both the and problems.
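The two selection rules can be sketched in a few lines. The sketch assumes that the candidate pool is the union of all input classes, that the objective is the weighted median Kendall $\tau$ cost, and that permutations are position tuples; the function names and the toy data are illustrative only.

```python
import random
from itertools import combinations

def kendall_tau(pi, sigma):
    return sum(
        1
        for x, y in combinations(range(len(pi)), 2)
        if (pi[x] - pi[y]) * (sigma[x] - sigma[y]) < 0
    )

def objective(sigma, classes, weights):
    """Max over classes of weight times the summed Kendall tau distance to the class."""
    return max(
        w * sum(kendall_tau(sigma, pi) for pi in cls)
        for w, cls in zip(weights, classes)
    )

def pick_rnd_perm(classes, rng):
    """Return one of the input permutations, chosen uniformly at random."""
    pool = [pi for cls in classes for pi in cls]
    return rng.choice(pool)

def pick_opt_perm(classes, weights):
    """Return the input permutation with the smallest minmax objective."""
    pool = [pi for cls in classes for pi in cls]
    return min(pool, key=lambda s: objective(s, classes, weights))

classes = [[(0, 1, 2, 3), (1, 0, 2, 3)], [(3, 2, 1, 0)]]
weights = [1, 1]
rnd = pick_rnd_perm(classes, random.Random(0))
opt = pick_opt_perm(classes, weights)
print(objective(rnd, classes, weights), objective(opt, classes, weights))
```

By construction the selected optimum never costs more than the random pick, since both are drawn from the same pool.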
Case I: . For , a well known method termed random pivoting proposed by Ailon et al. [1, 2] offers a -approximation in expectation for both the permutation and partial rank aggregation problem. In random pivoting, at each step, one element in the ranking is chosen uniformly at random and the remaining elements are partitioned based on the pairwise comparison with the pivot element. However, for the case of the MinMax problem with , random pivoting may be inadequate: The difficulty lies in the fact that rankings in different classes may lead to widely disparate pairwise pivot comparisons. Another problem in this context is that while one may achieve a constant approximation in expectation for each class individually, the largest cost among classes may not be bounded due to the exchange of the expectation and maximization operators. Therefore, instead of pivoting, one must resort to a different approach to the problem. Our approach is to deterministically round the fractional solution of a specific convex optimization problem. The deterministic rounding procedure is motivated by ideas in [16].
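For reference, the random pivoting idea described above can be sketched for a single class as follows. This is a generic pivoting routine in the spirit of [2], not the rounding procedure proposed in this paper: it assumes rankings are position lists and resolves each comparison with the pivot by a majority vote over the input rankings, which is an illustrative choice.

```python
import random

def prefers(rankings, a, b):
    """True if at least half of the rankings place a before b."""
    votes = sum(1 for r in rankings if r[a] < r[b])
    return 2 * votes >= len(rankings)

def random_pivot(elements, rankings, rng):
    """Recursively order `elements` by comparing each one with a random pivot."""
    if len(elements) <= 1:
        return list(elements)
    pivot = rng.choice(elements)
    left = [x for x in elements if x != pivot and prefers(rankings, x, pivot)]
    right = [x for x in elements if x != pivot and not prefers(rankings, x, pivot)]
    return random_pivot(left, rankings, rng) + [pivot] + random_pivot(right, rankings, rng)

# r[x] is the position of element x; three voters with identical preferences.
rankings = [[2, 0, 1, 3]] * 3
order = random_pivot([0, 1, 2, 3], rankings, random.Random(7))
print(order)   # elements listed from the top of the aggregate to the bottom
```

When the classes disagree, a single majority vote of this kind no longer reflects each class separately, which is one way to see why pivoting alone is ill-suited to the multiclass setting.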
Let , where stands for the indicator function, and let for all . For a given ranking , also define the variables . The MinMax problem may be stated as
[TABLE]
Note that if the rankings are permutations, then which is a value that only depends on .
The above integer program may be relaxed to a linear program by allowing to take fractional values. Upon solving the linear program, one needs to round the values of . The next rounding procedure guarantees a -approximation.
Let if and if . Let be a pivoting element for the rounding procedure and use to denote the set of pairs of elements (excluding ) whose positions are determined by pivoting on . Define
[TABLE]
The rounding procedure makes iterative calls to the following routine.
Theorem III.2.
The iterative application of the mmKT-Conv algorithm outputs a permutation with at most twice the cost of the optimal solution of the linear program (4).
At each iteration of rounding, denotes the cost of rounding incurred by the class of rankings, while denotes the associated cost of the linear program for class . Hence, the goal is to prove that for the given choice of the pivot , we have for all . Suppose that is the index of the class that maximizes at the first step of mmKT-Conv. Then, it suffices to show that . This result is a corollary of the following lemma.
Lemma III.3.
, .
Proof.
To prove the claimed result, it suffices to prove that for any two distinct elements , one has
[TABLE]
and for any triple of distinct elements , one has
[TABLE]
where the summation is circular over all permutations of . Both summations are taken over all possible permutations of the two (three) elements in the argument.
The inequality (5) is easy to prove: Suppose that . Then the sum on the left hand side equals which is bounded by the right hand side expression. To prove the inequality (6), consider the six variables associated with , namely . These variables may be partitioned into two classes, and . There are at least three variables that are 0’s. Without loss of generality, suppose that the class contains at least two 0’s.
Case 1: Assume that . Then, the difference of the left and right hand side of the inequality under consideration equals
[TABLE]
[TABLE]
The claimed result then follows from observing that .
Case 2: Assume that . The left hand side equals which is clearly bounded from above by the right hand side expression as .
Case II: . When , the MinMax aggregation problem may be solved in polynomial time via weighted bipartite matching [9]. However, when , the problem is hard even if for all [3].
Step 1: If we remove the integral constraint on the position of elements in , the optimization problem of interest is convex and may be solved efficiently:
[TABLE]
where .
Step 2 (mmSP-Conv): We assign positions to elements according to the fractional solution as follows. If , we let for any two distinct elements with ties broken randomly.
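The rounding step can be sketched as a sort on the fractional positions. The sketch assumes the fractional solution is available as a list `x` with `x[i]` the fractional position of element `i` (how `x` is obtained from the convex program is not shown), and it returns 1-indexed integer positions; the function name is hypothetical.

```python
import random

def round_positions(x, rng):
    """Rank elements by increasing fractional position; ties are broken at random."""
    order = sorted(range(len(x)), key=lambda i: (x[i], rng.random()))
    pos = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        pos[i] = rank
    return pos

x = [2.5, 1.0, 2.5, 4.0]              # elements 0 and 2 are tied
pos = round_positions(x, random.Random(0))
print(pos)
```

Sorting preserves the relative order of any two elements with distinct fractional positions, which is the property the rounding argument relies on.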
Theorem III.4.
mmSP-Conv rounding increases the cost of the convex optimization problem (7) by at most a factor of two.
Proof.
First, we claim that the output of mmSP-Conv, denoted by , is in . This follows since for any ranking , if two elements satisfy and , we may transpose and in to obtain a smaller . Second, for an arbitrary permutation , we have
[TABLE]
The claim follows by setting .
Note that the integrality gap of the problems (4) and (7) is , as one may consider two equally weighted classes, each of which contains one single ranking, and , respectively. Hence, the best approximation constant via the use of cannot be less than , which implies that the proposed rounding is optimal. One may expect to achieve a smaller approximation constant by outputting the better of the two results produced by Pick-Rnd-Perm and mmKT(SP)-Conv. This approach will be discussed in the full version of the paper.
We introduce next the min-Pick-Perm algorithm for solving the problem.
Theorem III.5.
If is a pseudometric, then min-Pick-Perm is a -approximation algorithm for the problems.
Proof.
By the definition of the problem, each class contains at least one permutation, which without loss of generality we denote by , that satisfies . As is a pseudometric, we have
[TABLE]
Next, choose an arbitrary and let . Then,
[TABLE]
Moreover, the output of min-Pick-Perm satisfies
[TABLE]
The result follows by combining the above inequalities.
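One plausible reading of min-Pick-Perm can be sketched as follows, stated as an assumption because the algorithm listing is incomplete in this copy: every input ranking is scored by the largest weighted min-distance to any class, and the ranking with the smallest score is returned. The Kendall $\tau$ distance, the position-tuple encoding and all names are illustrative choices.

```python
from itertools import combinations

def kendall_tau(pi, sigma):
    return sum(
        1
        for x, y in combinations(range(len(pi)), 2)
        if (pi[x] - pi[y]) * (sigma[x] - sigma[y]) < 0
    )

def score(sigma, classes, weights):
    """Largest weighted distance from sigma to the closest ranking of each class."""
    return max(
        w * min(kendall_tau(sigma, pi) for pi in cls)
        for w, cls in zip(weights, classes)
    )

def min_pick_perm(classes, weights):
    """Assumed reading: scan every input ranking and keep the best-scoring one."""
    pool = [pi for cls in classes for pi in cls]
    return min(pool, key=lambda s: score(s, classes, weights))

classes = [[(0, 1, 2, 3), (3, 2, 1, 0)], [(1, 0, 2, 3)]]
weights = [1, 1]
best = min_pick_perm(classes, weights)
print(best, score(best, classes, weights))
```

The scan costs one distance evaluation per ordered pair of input rankings, so it stays polynomial in the input size.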
Remark III.1.
Let be the optimal indices generated by min-Pick-Perm. Define and let
[TABLE]
for . A approximate solution for the problem with input , denoted by , satisfies
[TABLE]
Hence, is a approximation for the original problem. Therefore, convex optimization and rounding can be used on the problem. We refer to these adapted algorithms as min-mmKT-Conv and min-mmSP-Conv.
III-B Partial rankings
All the algorithms proposed for permutation aggregation generalize to partial ranking aggregation. One may easily show that as long as the distance defined for partial rankings is a pseudometric (e.g., ), the -approximation guarantees for all previous methods hold. To get a fractional solution in the program of mmKT-Conv, we have to change the constraint (4) to
[TABLE]
[TABLE]
which does not depend on the type of output ranking. Also, note that for partial rankings does not satisfy the equality , although the triangle inequality still holds. As the proof of Theorem III.2 only requires the latter inequality, the same rounding procedure offers a -approximation. Also, in the optimization problem (7) one has to use the definition for partial rankings.
IV Simulations
We compare the performance of three families of algorithms: Convex optimization procedures with rounding (mmKT-Conv, mmSP-Conv, min-mmKT-Conv, min-mmSP-Conv), permutation selection (Pick-Rnd-Perm, Pick-Opt-Perm, min-Pick-Perm) and algorithms used for traditional min-median rank aggregation (FASLP-Pivot [2] and SP-Matching [9]). The comparison shows that algorithms based on convex optimization yield significantly better results than naive selection methods, and that traditional aggregation algorithms are poor candidates for solving MinMax problems.
First, we evaluate the proposed algorithms on synthetic data. The synthetic data is generated based on what we call a two-level Mallows model: First, we generate the permutations independently based on the Mallows distribution [13]. Then, for each class , we generate permutations independently according to the Mallows distribution . We set the number of classes to , fix and let each class contain permutations. To control the distance between different classes, we choose from . The objective function values for independent samples, obtained by different algorithms, are shown in Table I.
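A two-level sampler of this kind can be sketched with the repeated insertion method. The sketch assumes the Mallows model is parameterized by a dispersion `phi` in [0, 1], with probability proportional to `phi` raised to the Kendall $\tau$ distance from the center, and it draws orderings (lists of elements from top to bottom) rather than position vectors. The parameter names `phi_top` and `phi_class` and the sizes used below are placeholders, not the values used in the experiments.

```python
import random

def sample_mallows(center, phi, rng):
    """Draw an ordering from a Mallows model centered at `center` by repeated insertion."""
    out = []
    for i, item in enumerate(center, start=1):
        # Inserting at index j (0-based) creates i - 1 - j inversions w.r.t. the center.
        weights = [phi ** (i - 1 - j) for j in range(i)]
        j = rng.choices(range(i), weights=weights)[0]
        out.insert(j, item)
    return out

def two_level_mallows(n, num_classes, per_class, phi_top, phi_class, rng):
    """First draw one center per class, then draw each class around its center."""
    root = list(range(n))
    centers = [sample_mallows(root, phi_top, rng) for _ in range(num_classes)]
    return [
        [sample_mallows(c, phi_class, rng) for _ in range(per_class)]
        for c in centers
    ]

data = two_level_mallows(5, 3, 4, 0.7, 0.3, random.Random(1))
for cls in data:
    print(cls)
```

Setting `phi` to 1 yields uniformly random orderings, while setting it to 0 returns the center itself, so the first-level dispersion directly controls how far apart the class centers tend to be.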
Our next test example comes from evolutionary biology, and is concerned with mitochondrial DNA (mtDNA) genome aggregation. The aggregate in this case corresponds to an ancestral genome. The most commonly used rearrangement distance between two nuclear genomes is based on reversals [15], but mitochondrial DNA rearrangement studies have also involved the Kendall $\tau$ distance [6]. In the latter case, the authors only considered the median problem , although the min-max problem is equally relevant [3, 8]. In our experiment, we used the mtDNA dataset from [5]. The dataset contains metazoan genomes with gene-blocks in some arrangement. We removed the “signs” of gene orders and let each genome represent one class, so that and for all ; we fixed . Table II shows the results. Due to page limitations, we relegate the significantly more space-consuming empirical study of weighted multiclass mtDNA aggregation to the extended version of the paper.
References
- [1] Nir Ailon. Aggregation of partial rankings, p-ratings and top-m lists. Algorithmica, 57(2):284–300, 2010.
- [2] Nir Ailon, Moses Charikar, and Alantha Newman. Aggregating inconsistent information: ranking and clustering. Journal of the ACM (JACM), 55(5):23, 2008.
- [3] Christian Bachmaier, Franz J Brandenburg, Andreas Gleißner, and Andreas Hofmeier. On the hardness of maximum rank aggregation problems. Journal of Discrete Algorithms, 31:2–13, 2015.
- [4] John Bartholdi III, Craig A Tovey, and Michael A Trick. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6(2):157–165, 1989.
- [5] Guillaume Bourque and Pavel A Pevzner. Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Research, 12(1):26–36, 2002.
- [6] Kamalika Chaudhuri, Kevin Chen, Radu Mihaescu, and Satish Rao. On the tandem duplication-random loss model of genome rearrangement. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 564–570. Society for Industrial and Applied Mathematics, 2006.
- [7] Persi Diaconis and Ronald L Graham. Spearman’s footrule as a measure of disarray. Journal of the Royal Statistical Society. Series B (Methodological), pages 262–268, 1977.
- [8] Liviu P Dinu and Radu Ionescu. An efficient rank based approach for closest string and closest substring. PLoS ONE, 7(6):e37576, 2012.
