TL;DR
This position paper reviews the state of Automatic Leaderboard Generation (ALG) in machine learning, proposes a unified conceptual framework and benchmarking guidelines, and discusses future challenges for automated leaderboard curation.
Contribution
It provides the first comprehensive overview of ALG research, introduces a standardized conceptual framework, and offers benchmarking guidelines to advance the field.
Findings
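Identified fundamental differences in assumptions, scope, and output formats across prior ALG approaches
Proposed a unified conceptual framework that standardizes how the ALG task is defined
Suggested benchmarking guidelines for fair evaluation of ALG methods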
Abstract
An important task in machine learning (ML) research is comparing prior work, which is often performed via ML leaderboards: tabular overviews of experiments with comparable conditions (e.g., same task, dataset, and metric). However, the growing volume of literature creates challenges in creating and maintaining these leaderboards. To ease this burden, researchers have developed methods to extract leaderboard entries from research papers for automated leaderboard curation. Yet, prior work varies in problem framing, complicating comparisons and limiting real-world applicability. In this position paper, we present the first overview of Automatic Leaderboard Generation (ALG) research, identifying fundamental differences in assumptions, scope, and output formats. We propose a unified conceptual framework for ALG to standardise how the ALG task is defined. We offer ALG benchmarking guidelines,…
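As a rough illustration of the leaderboard notion in the abstract (experiments grouped under the same task, dataset, and metric), the sketch below groups already-extracted result records and ranks each group. It is not the paper's framework or any published ALG method: the `Entry` record, its field names, the `build_leaderboards` helper, and the sample values are all hypothetical, and it assumes a higher score is better unless told otherwise.

```python
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    # Hypothetical record for one extracted result; field names are illustrative.
    task: str
    dataset: str
    metric: str
    model: str
    score: float


def build_leaderboards(entries, higher_is_better=True):
    """Group entries by (task, dataset, metric) and rank each group by score."""
    boards = defaultdict(list)
    for entry in entries:
        boards[(entry.task, entry.dataset, entry.metric)].append(entry)
    # Sort every group so the best-scoring entry comes first.
    return {
        key: sorted(rows, key=lambda e: e.score, reverse=higher_is_better)
        for key, rows in boards.items()
    }


# Made-up placeholder data, not results from any paper.
entries = [
    Entry("QA", "DatasetA", "F1", "model-x", 80.0),
    Entry("QA", "DatasetA", "F1", "model-y", 85.0),
    Entry("QA", "DatasetB", "F1", "model-x", 70.0),
]
boards = build_leaderboards(entries)
```

Keying on the full (task, dataset, metric) triple keeps results from different evaluation settings on separate leaderboards, which is the "comparable conditions" requirement the abstract describes.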
