Efficient Performance Tracking: Leveraging Large Language Models for Automated Construction of Scientific Leaderboards
Furkan Şahinuç, Thy Thy Tran, Yulia Grishina, Yufang Hou, Bei Chen, Iryna Gurevych

TL;DR
This paper introduces SciLead, a curated dataset for scientific leaderboards, and proposes an LLM-based framework to automate leaderboard construction across different real-world scenarios, addressing issues of incompleteness and inaccuracies in existing datasets.
Contribution
The work presents a new curated dataset and a comprehensive LLM-based framework for automated leaderboard construction in diverse scenarios, improving over previous methods.
Findings
LLMs can accurately identify TDM triples in publications.
LLMs often struggle to extract precise result values from papers.
The proposed framework effectively addresses incomplete and incorrect leaderboard data.
Abstract
Scientific leaderboards are standardized ranking systems that facilitate evaluating and comparing competitive methods. Typically, a leaderboard is defined by a task, dataset, and evaluation metric (TDM) triple, allowing objective performance assessment and fostering innovation through benchmarking. However, the exponential increase in publications has made it infeasible to construct and maintain these leaderboards manually. Automatic leaderboard construction has emerged as a solution to reduce manual labor. Existing datasets for this task are based on the community-contributed leaderboards without additional curation. Our analysis shows that a large portion of these leaderboards are incomplete, and some of them contain incorrect information. In this work, we present SciLead, a manually-curated Scientific Leaderboard dataset that overcomes the aforementioned problems. Building on this…
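To make the TDM definition in the abstract concrete, here is a minimal sketch of how results could be grouped into leaderboards keyed by a (task, dataset, metric) triple. The class names, the `build_leaderboards` helper, and the sample records are all hypothetical and made up for illustration; this is not the SciLead data format or the framework proposed in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TDM:
    """A (task, dataset, metric) triple; frozen so it can be a dict key."""
    task: str
    dataset: str
    metric: str


@dataclass
class Entry:
    """One reported result: which paper, and the score it reports."""
    paper: str
    score: float


def build_leaderboards(records, higher_is_better=True):
    """Group (tdm, paper, score) records by TDM triple and rank each group.

    Assumption: a single ranking direction applies to every metric, which
    is a simplification (error-rate metrics would need the opposite order).
    """
    boards = defaultdict(list)
    for tdm, paper, score in records:
        boards[tdm].append(Entry(paper, score))
    for entries in boards.values():
        entries.sort(key=lambda e: e.score, reverse=higher_is_better)
    return dict(boards)


# Hypothetical records, invented purely for illustration.
records = [
    (TDM("Task A", "Dataset X", "F1"), "Paper 1", 81.5),
    (TDM("Task A", "Dataset X", "F1"), "Paper 2", 84.0),
    (TDM("Task A", "Dataset Y", "Accuracy"), "Paper 1", 90.2),
]

boards = build_leaderboards(records)
for tdm, entries in boards.items():
    print(tdm.task, tdm.dataset, tdm.metric)
    for rank, entry in enumerate(entries, start=1):
        print(f"  {rank}. {entry.paper}: {entry.score}")
```

Each distinct triple yields its own ranking, so the same paper can appear on several leaderboards with different scores.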
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Online Learning and Analytics
