PMLB v1.0: An open source dataset collection for benchmarking machine learning methods

Joseph D. Romano, Trang T. Le, William La Cava, John T. Gregg, Daniel J. Goldberg, Natasha L. Ray, Praneel Chakraborty, Daniel Himmelstein, Weixuan Fu, and Jason H. Moore

arXiv:2012.00058·cs.LG·April 7, 2021

TL;DR

PMLB v1.0 offers a comprehensive, standardized collection of diverse benchmark datasets for machine learning, facilitating easier and more consistent evaluation of new methods across the data science community.

Contribution

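This paper introduces PMLB v1.0, which the authors describe as the largest collection of diverse, public benchmark datasets for evaluating machine learning methods aggregated in one location. The release adds improvements developed in discussion with the open-source community and provides Python and R interfaces for access.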

Findings

01. Largest collection of benchmark datasets available publicly

02. Enhanced user interface and integration with data science tools

03. Community-driven improvements following open-source discussions

Abstract

Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows.

Results: This release of PMLB provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community.

Availability: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively.
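
As an illustration of the Python interface mentioned under Availability, the following is a minimal sketch of loading one dataset. It assumes the `pmlb` package has been installed from the Python Package Index and that its `fetch_data` function is called with `return_X_y=True`. The wrapper name `load_benchmark` and its `fetch` parameter are hypothetical, introduced here only so the example can also run without network access.

```python
from typing import Any, Callable, Optional, Tuple


def load_benchmark(
    name: str,
    fetch: Optional[Callable[..., Any]] = None,
) -> Tuple[Any, Any]:
    """Return (features, target) for a single PMLB dataset.

    `load_benchmark` and `fetch` are hypothetical names used for this sketch.
    When `fetch` is not supplied, the function falls back to
    `pmlb.fetch_data`, which needs `pip install pmlb` and, on first use,
    network access to download the dataset.
    """
    if fetch is None:
        # Imported lazily so the sketch can be exercised without pmlb installed.
        from pmlb import fetch_data

        fetch = fetch_data
    # return_X_y=True asks for a (features, target) pair instead of one table.
    return fetch(name, return_X_y=True)
```

With `pmlb` installed, a call such as `X, y = load_benchmark("mushroom")` would be expected to download that dataset and return its feature matrix and target vector. The dataset name is given as an example and should be checked against the names listed in the repository.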
