Personalized Benchmarking with the Ludwig Benchmarking Toolkit
Avanika Narayan, Piero Molino, Karan Goel, Willie Neiswanger, Christopher Ré (Department of Computer Science, Stanford University)

TL;DR
The paper introduces the Ludwig Benchmarking Toolkit (LBT), an open-source framework enabling personalized, multi-objective benchmarking of machine learning models across diverse tasks, datasets, and evaluation criteria.
Contribution
LBT provides a configurable, standardized platform for end-to-end benchmarking that controls confounding variables and supports multi-objective evaluation, addressing limitations of traditional benchmarks.
Findings
Demonstrated large-scale comparative analysis across models and datasets.
Explored trade-offs between inference latency and performance.
Analyzed effects of pretraining on convergence and robustness.
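The latency-versus-performance trade-off above is a two-objective comparison, and such comparisons are often summarized by a Pareto front: the set of runs that no other run beats on both objectives at once. The sketch below is a generic illustration, not LBT's API. The `Result` record, its field names, and the model names and numbers are hypothetical and made up for the example; they are not results from the paper.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One benchmark run (hypothetical fields, for illustration only)."""
    model: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    strictly_better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(results: list[Result]) -> list[Result]:
    """Keep only the runs that no other run dominates (a run never dominates itself)."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Made-up numbers, chosen only to show the mechanics.
runs = [
    Result("model_a", 0.91, 40.0),
    Result("model_b", 0.89, 12.0),
    Result("model_c", 0.88, 15.0),  # slower and less accurate than model_b
    Result("model_d", 0.93, 95.0),
]
print([r.model for r in pareto_front(runs)])
```

Here `model_c` is dropped because `model_b` is both faster and more accurate; the remaining three each represent a different point on the trade-off curve, and choosing among them depends on how much latency a user is willing to pay for accuracy.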
Abstract
The rapid proliferation of machine learning models across domains and deployment settings has given rise to various communities (e.g. industry practitioners) which seek to benchmark models across tasks and objectives of personal value. Unfortunately, these users cannot use standard benchmark results to perform such value-driven comparisons as traditional benchmarks evaluate models on a single objective (e.g. average accuracy) and fail to facilitate a standardized training framework that controls for confounding variables (e.g. computational budget), making fair comparisons difficult. To address these challenges, we introduce the open-source Ludwig Benchmarking Toolkit (LBT), a personalized benchmarking toolkit for running end-to-end benchmark studies (from hyperparameter optimization to evaluation) across an easily extensible set of tasks, deep learning models, datasets and evaluation…
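One common way to turn several objectives into a ranking that reflects "personal value" is weighted-sum scalarization: normalize each metric, flip the ones where lower is better, and combine them with user-chosen weights. The sketch below assumes min-max normalization and a plain weighted sum; it is a generic technique for illustration, not necessarily how LBT aggregates metrics. The function names, metric names, and numbers are hypothetical.

```python
def normalize(values: list[float], higher_is_better: bool) -> list[float]:
    """Min-max scale to [0, 1] so that 1.0 is always the best value in the set."""
    lo, hi = min(values), max(values)
    if hi == lo:  # all candidates tie on this metric
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def personalized_ranking(
    results: dict[str, dict[str, float]],  # model -> {metric: value}
    weights: dict[str, float],             # metric -> user-chosen importance
    directions: dict[str, bool],           # metric -> True if higher is better
) -> list[str]:
    """Rank models by the weighted sum of their normalized metrics, best first."""
    models = list(results)
    scores = {m: 0.0 for m in models}
    for metric, weight in weights.items():
        normed = normalize([results[m][metric] for m in models], directions[metric])
        for model, value in zip(models, normed):
            scores[model] += weight * value
    return sorted(models, key=lambda m: scores[m], reverse=True)


# Made-up numbers; two users with different priorities rank the same runs differently.
results = {
    "model_a": {"accuracy": 0.91, "latency_ms": 40.0},
    "model_b": {"accuracy": 0.89, "latency_ms": 12.0},
    "model_d": {"accuracy": 0.93, "latency_ms": 95.0},
}
directions = {"accuracy": True, "latency_ms": False}
print(personalized_ranking(results, {"accuracy": 0.9, "latency_ms": 0.1}, directions))
print(personalized_ranking(results, {"accuracy": 0.2, "latency_ms": 0.8}, directions))
```

An accuracy-focused user gets `model_d` first, while a latency-focused user gets `model_b` first, which is the kind of value-driven comparison a single-objective leaderboard cannot express. Note that min-max normalization is relative to the candidate set, so adding or removing a model can change the ranking.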
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
