Vote'n'Rank: Revision of Benchmarking with Social Choice Theory
Mark Rofin, Vladislav Mikhailov, Mikhail Florinskiy, Andrey Kravchenko, Elena Tutubalina, Tatiana Shavrina, Daniel Karabekyan, Ekaterina Artemova

TL;DR
This paper introduces Vote'n'Rank, a framework based on social choice theory for ranking machine learning systems in multi-task benchmarks. It aims to give more robust rankings than the traditional approach of averaging task scores.
Contribution
It proposes a novel ranking method based on social choice theory, improving robustness and interpretability in multi-task benchmark evaluations.
Findings
Vote'n'Rank provides more reliable system rankings.
The framework handles missing data effectively.
It offers insights into system performance across ML sub-fields.
Abstract
The development of state-of-the-art systems in different applied areas of machine learning (ML) is driven by benchmarks, which have shaped the paradigm of evaluating generalisation capabilities from multiple perspectives. Although the paradigm is shifting towards more fine-grained evaluation across diverse tasks, the delicate question of how to aggregate the performances has received particular interest in the community. In general, benchmarks follow the unspoken utilitarian principles, where the systems are ranked based on their mean average score over task-specific metrics. Such an aggregation procedure has been viewed as a sub-optimal evaluation protocol, which may have created the illusion of progress. This paper proposes Vote'n'Rank, a framework for ranking systems in multi-task benchmarks under the principles of the social choice theory. We demonstrate that our approach can be…
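To make the contrast in the abstract concrete, here is a minimal sketch comparing mean-score aggregation with a Borda count, a classic rule from social choice theory. The Borda count is used purely as an illustration; this page does not state which voting rules Vote'n'Rank implements. The function names and the scores for systems A, B, and C are hypothetical, and each task is treated as a voter that ranks the systems by their score on that task.

```python
from typing import Dict, List

# Hypothetical per-task scores (higher is better); not taken from the paper.
scores: Dict[str, List[float]] = {
    "A": [0.99, 0.50, 0.50],
    "B": [0.60, 0.62, 0.61],
    "C": [0.55, 0.60, 0.60],
}


def mean_ranking(scores: Dict[str, List[float]]) -> List[str]:
    """Rank systems by the arithmetic mean of their task scores."""
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    # Sort by descending mean; ties are broken alphabetically by name.
    return sorted(means, key=lambda name: (-means[name], name))


def borda_ranking(scores: Dict[str, List[float]]) -> List[str]:
    """Rank systems by Borda points, treating each task as one voter."""
    systems = list(scores)
    n_tasks = len(next(iter(scores.values())))
    points = {name: 0 for name in systems}
    for task in range(n_tasks):
        for name in systems:
            # One point for every system this one strictly beats on the task.
            points[name] += sum(
                scores[name][task] > scores[other][task]
                for other in systems
                if other != name
            )
    # Sort by descending points; ties are broken alphabetically by name.
    return sorted(points, key=lambda name: (-points[name], name))


print("Mean: ", mean_ranking(scores))
print("Borda:", borda_ranking(scores))
```

In this toy data, system A wins on the mean because of one very high score, while B comes out ahead on two of the three tasks and therefore tops the Borda ranking. A and C tie on Borda points, and the sketch breaks that tie alphabetically, which is an arbitrary choice made only for determinism.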
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multi-Criteria Decision Making · Mobile Crowdsensing and Crowdsourcing
Methods: Linear Layer · LAMB · ALBERT · DeBERTa · ERNIE · Residual Connection · Weight Decay · Attention Dropout · Linear Warmup With Linear Decay
