BenchMake: Turn any scientific data set into a reproducible benchmark

Amanda S Barnard

arXiv:2506.23419·cs.LG·July 1, 2025

TL;DR

BenchMake is a tool that turns scientific data sets into reproducible benchmarks by identifying challenging edge cases and creating statistically significant test splits across various data modalities.

Contribution

The paper introduces a new method that uses non-negative matrix factorisation (NMF) to generate meaningful benchmark splits from diverse scientific data sets.
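The general idea of using NMF to pick a deterministic test set can be sketched as follows. This is a minimal illustration only, not BenchMake's code or API. It makes several assumptions: the number of NMF components equals the desired number of test instances, and each component claims the instance whose coefficients it dominates most. The names `nmf` and `select_test_indices` are hypothetical. Instances dominated by a single component tend to sit toward the extremes of the data, so they serve as a rough proxy for convex-hull edge cases, not a guarantee.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-10):
    """Factorise non-negative X (n x d) as W @ H, with W (n x k) and H (k x d)
    both non-negative, using Lee-Seung multiplicative updates (Frobenius loss)."""
    rng = np.random.default_rng(seed)  # fixed seed, so the result is deterministic
    n, d = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, d)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


def select_test_indices(X, test_fraction, seed=0):
    """Hypothetical NMF-based split: one test instance per NMF component,
    chosen as the unused instance most dominated by that component."""
    X = np.asarray(X, dtype=float)
    # Min-max scale each feature to [0, 1] so NMF's non-negativity holds.
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xs = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    n_test = max(1, int(round(test_fraction * len(Xs))))
    W, _ = nmf(Xs, k=n_test, seed=seed)
    # Fraction of each instance's NMF coefficients assigned to each component.
    shares = W / (W.sum(axis=1, keepdims=True) + 1e-12)
    chosen = set()
    for j in range(n_test):
        # Walk instances from most to least dominated by component j and
        # take the first one not already assigned to the test set.
        for i in np.argsort(-shares[:, j], kind="stable"):
            if int(i) not in chosen:
                chosen.add(int(i))
                break
    return sorted(chosen)
```

Because the initialisation is seeded, repeated calls on the same data return the same indices. This is the property that makes a split reproducible. A drop-in alternative would be `sklearn.decomposition.NMF` with `init="nndsvd"`, which is deterministic by construction. Real data would also need handling for categorical features and non-tabular modalities, which this sketch ignores.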

Findings

1. Effective in isolating challenging edge cases
2. Creates statistically significant test splits
3. Applicable across multiple data modalities

Abstract

Benchmark data sets are a cornerstone of machine learning development and applications, ensuring new methods are robust, reliable and competitive. The relative rarity of benchmark sets in computational science, due to the uniqueness of the problems and the pace of change in the associated domains, makes evaluating new innovations difficult for computational scientists. In this paper a new tool is developed and tested to potentially turn any of the increasing numbers of scientific data sets made openly available into a benchmark accessible to the community. BenchMake uses non-negative matrix factorisation to deterministically identify and isolate challenging edge cases on the convex hull (the smallest convex set that contains all existing data instances) and partitions a required fraction of matched data instances into a testing set that maximises divergence and statistical significance,…
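The abstract describes a test set that "maximises divergence" from the rest of the data. One common way to quantify that divergence per feature is the two-sample Kolmogorov-Smirnov (KS) statistic. This is the largest gap between the empirical distribution functions of the training and testing values. The truncated abstract does not say which measure or test BenchMake uses, so the KS statistic here is a swapped-in, widely used choice. `ks_statistic` and `split_divergence` are hypothetical helper names. The functions accept test indices from any splitter.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample KS statistic: max absolute gap between the empirical CDFs of a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the ECDFs only change at observed values
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def split_divergence(X, test_idx):
    """Per-feature KS statistic between the training rows and the test rows of X."""
    X = np.asarray(X, dtype=float)
    mask = np.zeros(len(X), dtype=bool)
    mask[test_idx] = True
    return [ks_statistic(X[~mask, j], X[mask, j]) for j in range(X.shape[1])]
```

Values range from 0 (identical distributions) to 1 (no overlap), so a split that isolates edge cases should score higher than a random split of the same size. To attach a significance level, `scipy.stats.ks_2samp(train_column, test_column)` returns the same statistic together with a p-value.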
