Personalized Benchmarking with the Ludwig Benchmarking Toolkit

Avanika Narayan; Piero Molino; Karan Goel; Willie Neiswanger; Christopher Ré (Department of Computer Science, Stanford University)

arXiv:2111.04260 · cs.LG · November 9, 2021 · 1 citation

TL;DR

The paper introduces the Ludwig Benchmarking Toolkit (LBT), an open-source framework enabling personalized, multi-objective benchmarking of machine learning models across diverse tasks, datasets, and evaluation criteria.

Contribution

LBT provides a configurable, standardized platform for end-to-end benchmarking that controls confounding variables and supports multi-objective evaluation, addressing limitations of traditional benchmarks.

Findings

01. Demonstrated large-scale comparative analysis across models and datasets.

02. Explored trade-offs between inference latency and performance.

03. Analyzed effects of pretraining on convergence and robustness.
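
The latency-versus-performance trade-off mentioned in the findings can be made concrete with a Pareto comparison. The following is a minimal sketch of that general idea, not LBT's implementation or API: the `Result` type, the function names, the model labels, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str          # hypothetical model label
    accuracy: float     # higher is better
    latency_ms: float   # lower is better


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and better


def pareto_front(results: list[Result]) -> list[Result]:
    """Keep only the results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Made-up numbers for illustration only; these are not results from the paper.
results = [
    Result("model_a", accuracy=0.91, latency_ms=40.0),
    Result("model_b", accuracy=0.89, latency_ms=12.0),
    Result("model_c", accuracy=0.88, latency_ms=25.0),
    Result("model_d", accuracy=0.93, latency_ms=95.0),
]

front = pareto_front(results)
print([r.model for r in front])
```

In this toy data, `model_c` drops out because `model_b` is both more accurate and faster. The remaining models each represent a different trade-off, so which one is "best" depends on how a given user weighs the two objectives, which is the motivation for personalized benchmarking.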

Abstract

The rapid proliferation of machine learning models across domains and deployment settings has given rise to various communities (e.g. industry practitioners) which seek to benchmark models across tasks and objectives of personal value. Unfortunately, these users cannot use standard benchmark results to perform such value-driven comparisons as traditional benchmarks evaluate models on a single objective (e.g. average accuracy) and fail to facilitate a standardized training framework that controls for confounding variables (e.g. computational budget), making fair comparisons difficult. To address these challenges, we introduce the open-source Ludwig Benchmarking Toolkit (LBT), a personalized benchmarking toolkit for running end-to-end benchmark studies (from hyperparameter optimization to evaluation) across an easily extensible set of tasks, deep learning models, datasets and evaluation…

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)