Benchmark Data Repositories for Better Benchmarking

Rachel Longjohn; Markelle Kelly; Sameer Singh; Padhraic Smyth

arXiv:2410.24100 · cs.LG · November 1, 2024 · 5 cites

TL;DR

This paper analyzes benchmark data repositories in machine learning, highlighting their importance in improving benchmarking practices by addressing dataset quality, documentation, and reproducibility issues.

Contribution

The paper surveys the landscape of benchmark data repositories and sets out considerations for their design and use, with the aim of improving benchmarking in machine learning.

Findings

1. Identifies issues with the datasets themselves, such as representational harms and construct validity.

2. Highlights the overemphasis on a few datasets and metrics in evaluation.

3. Discusses the role of repositories in dataset documentation and in the reproducibility of evaluations.

Abstract

In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for, and levies criticisms at, data and benchmarking practices in machine learning, comparatively less attention has been paid to the data repositories where these datasets are stored, documented, and shared. In this paper, we analyze the landscape of these benchmark data repositories and the role they can play in improving benchmarking. This role includes addressing issues with both datasets themselves (e.g., representational harms, construct validity) and the manner in which evaluation is carried out using such datasets (e.g., overemphasis on a few datasets and metrics, lack of reproducibility). To this end, we identify and discuss a set of considerations surrounding the design and use of benchmark…

Videos

Benchmark Data Repositories for Better Benchmarking · SlidesLive

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Focus