On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao, Abdul Ali Bangash, Filipe Roseiro Côgo, Bram Adams, Ahmed E. Hassan

TL;DR
This study explores the workflows and common issues in foundation model leaderboards, proposing improvements to enhance transparency, accountability, and collaboration in model evaluation practices.
Contribution
It systematically analyzes real-world leaderboard operations, identifies workflow patterns, and uncovers eight types of leaderboard smells to improve FM evaluation transparency.
Findings
Identified five distinct leaderboard workflow patterns.
Developed a domain model capturing key components of LBOps.
Discovered eight types of leaderboard smells affecting transparency.
Abstract
Foundation models (FMs), such as large language models (LLMs), are large-scale machine learning (ML) models that have demonstrated remarkable adaptability in various downstream software engineering (SE) tasks, such as code completion, code understanding, and software development. As a result, FM leaderboards have become essential tools for SE teams to compare and select the best third-party FMs for their specific products and purposes. However, the lack of standardized guidelines for FM evaluation and comparison threatens the transparency of FM leaderboards and limits stakeholders' ability to perform effective FM selection. As a first step towards addressing this challenge, our research focuses on understanding how these FM leaderboards operate in real-world scenarios ("leaderboard operations") and identifying potential pitfalls and areas for improvement ("leaderboard smells"). In this…
Taxonomy
Topics: Human Resource Development and Performance Evaluation · Organizational Learning and Leadership
