A Theory of Dynamic Benchmarks
Ali Shirali, Rediet Abebe, Moritz Hardt

TL;DR
This paper provides a theoretical foundation for dynamic benchmarks, analyzing their potential and limitations through models and simulations, and highlighting how data collection strategies impact model performance over iterative rounds.
Contribution
It introduces the first theoretical models of dynamic benchmarking, analyzing performance progression and limitations, and supports findings with simulations on real datasets.
Findings
Model performance improves initially but can stall after only a few rounds (as few as three in the sequential model).
Label noise exacerbates performance stagnation.
Hierarchical data collection designs can make more progress than the simpler sequential design, at the cost of added complexity.
Abstract
Dynamic benchmarks interweave model fitting and data collection in an attempt to mitigate the limitations of static benchmarks. In contrast to an extensive theoretical and empirical study of the static setting, the dynamic counterpart lags behind due to limited empirical studies and no apparent theoretical foundation to date. Responding to this deficit, we initiate a theoretical study of dynamic benchmarking. We examine two realizations, one capturing current practice and the other modeling more complex settings. In the first model, where data collection and model fitting alternate sequentially, we prove that model performance improves initially but can stall after only three rounds. Label noise arising from, for instance, annotator disagreement leads to even stronger negative results. Our second model generalizes the first to the case where data collection and model fitting have a…
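The sequential setting described in the abstract, where model fitting and data collection alternate, can be illustrated with a toy loop. The sketch below is not the paper's construction. It assumes a hypothetical one-dimensional task (`true_label`), a deliberately weak learner (`fit_stump`, a single-threshold classifier), and a collection step that simply adds every pool point the current model gets wrong to the training set. All names and parameters are illustrative assumptions.

```python
import random


def true_label(x):
    # Hypothetical ground truth: positive on the middle band of [0, 1).
    return 1 if 0.3 <= x < 0.7 else 0


def fit_stump(data):
    """Return the single-threshold classifier with the fewest errors on data."""
    best = None
    for t in sorted({x for x, _ in data}):
        for sign in (0, 1):
            # The stump predicts `sign` at or above t and `1 - sign` below it.
            errors = sum(1 for x, y in data if (sign if x >= t else 1 - sign) != y)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else 1 - sign


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def run_dynamic_benchmark(rounds=3, pool_size=400, seed=0):
    """Alternate model fitting and failure collection; return accuracy per round."""
    rng = random.Random(seed)
    pool = [(x, true_label(x)) for x in (rng.random() for _ in range(pool_size))]
    train = list(pool)
    history = []
    for _ in range(rounds):
        model = fit_stump(train)                # model-fitting step
        history.append(accuracy(model, pool))   # evaluate on the original pool
        # Data-collection step (assumed): gather the points the model gets wrong
        # and add them to the training set, which upweights those regions.
        failures = [(x, y) for x, y in pool if model(x) != y]
        train = train + failures
    return history


if __name__ == "__main__":
    for round_index, acc in enumerate(run_dynamic_benchmark()):
        print(f"round {round_index}: accuracy on original pool = {acc:.3f}")
```

Because a single threshold cannot represent the middle band, this toy learner can never reach perfect accuracy, so collecting more failures only shifts which region it gets wrong. This is only loosely analogous to the stalling behavior the paper proves and should not be read as a reproduction of its results.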
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Sports Analytics and Performance
