Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Guanhua Zhang, Moritz Hardt

TL;DR
This paper explores the inherent trade-offs between diversity and stability in multi-task benchmarks in machine learning, revealing that increased diversity leads to greater sensitivity to irrelevant changes, with implications for benchmark design.
Contribution
It introduces a novel theoretical framework based on social choice theory, along with new quantitative measures and algorithms, to analyze and empirically demonstrate the diversity-stability trade-off in multi-task benchmarks.
Findings
A strong trade-off between diversity and stability: the more diverse a benchmark, the more sensitive it is to irrelevant changes.
Existing benchmarks are highly unstable under irrelevant changes, such as the inclusion of irrelevant models.
New measures and algorithms for assessing diversity and sensitivity.
Abstract
We examine multi-task benchmarks in machine learning through the lens of social choice theory. We draw an analogy between benchmarks and electoral systems, where models are candidates and tasks are voters. This suggests a distinction between cardinal and ordinal benchmark systems. The former aggregate numerical scores into one model ranking; the latter aggregate rankings for each task. We apply Arrow's impossibility theorem to ordinal benchmarks to highlight the inherent limitations of ordinal systems, particularly their sensitivity to the inclusion of irrelevant models. Inspired by Arrow's theorem, we empirically demonstrate a strong trade-off between diversity and sensitivity to irrelevant changes in existing multi-task benchmarks. Our result is based on new quantitative measures of diversity and sensitivity that we introduce. Sensitivity quantifies the impact that irrelevant changes…
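The cardinal/ordinal distinction in the abstract can be illustrated with a toy sketch. The code below is not the paper's method: the score matrix is invented, the mean is used as an assumed cardinal rule, and the Borda count is used as one assumed example of an ordinal rule. It shows how, under these assumptions, adding a third model that wins nowhere flips the ordinal ranking of the first two, while the mean-score ranking of those two is unchanged.

```python
import numpy as np

# Hypothetical scores: rows are models (A, B, C), columns are five tasks.
# These numbers are made up for illustration only.
scores = np.array([
    [0.90, 0.90, 0.90, 0.75, 0.75],  # model A
    [0.80, 0.80, 0.80, 0.85, 0.85],  # model B
    [0.70, 0.70, 0.70, 0.80, 0.80],  # model C (the "irrelevant" model)
])

def cardinal_ranking(s):
    """Cardinal rule (assumed here: mean score). Returns model indices, best first."""
    return np.argsort(-s.mean(axis=1), kind="stable").tolist()

def ordinal_ranking(s):
    """Ordinal rule (assumed here: Borda count). Returns model indices, best first.

    Within each task a model earns one point per model it beats.
    Assumes no tied scores within a task.
    """
    # argsort applied twice turns scores into within-task ranks (0 = worst).
    points = s.argsort(axis=0).argsort(axis=0)
    return np.argsort(-points.sum(axis=1), kind="stable").tolist()

# Rankings with only models A and B, then with model C added.
cardinal_ab = cardinal_ranking(scores[:2])
cardinal_abc = cardinal_ranking(scores)
ordinal_ab = ordinal_ranking(scores[:2])
ordinal_abc = ordinal_ranking(scores)

print("cardinal, A and B only:", cardinal_ab)
print("cardinal, with C added:", cardinal_abc)
print("ordinal,  A and B only:", ordinal_ab)
print("ordinal,  with C added:", ordinal_abc)
```

With only A and B, model A leads under both rules. Once C is included, the Borda totals become 6, 7, and 2 for A, B, and C, so B overtakes A even though no score of A or B changed. This is the kind of sensitivity to irrelevant models that the abstract attributes to ordinal systems.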
