PMLB v1.0: An open source dataset collection for benchmarking machine learning methods
Joseph D. Romano, Trang T. Le, William La Cava, John T. Gregg, Daniel J. Goldberg, Natasha L. Ray, Praneel Chakraborty, Daniel Himmelstein, Weixuan Fu, and Jason H. Moore

TL;DR
PMLB v1.0 offers a comprehensive, standardized collection of diverse benchmark datasets for machine learning, facilitating easier and more consistent evaluation of new methods across the data science community.
Contribution
This paper introduces PMLB v1.0, the largest open-source dataset collection for benchmarking machine learning, with improved features and community-driven updates for easier access and integration.
Findings
Largest collection of benchmark datasets available publicly
Enhanced user interface and integration with data science tools
Community-driven improvements following open-source discussions
Abstract
Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows.

Results: This release of PMLB provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. Version 1.0 introduces a number of critical improvements developed following discussions with the open-source community.

Availability: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively.
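To make the "standardized interface" concrete, here is a minimal sketch of a benchmarking loop built on the Python package, assuming it is installed from the Python Package Index as `pmlb` and exposes `fetch_data(name, return_X_y=True)`. The helper names (`fetch_with_pmlb`, `majority_baseline_accuracy`, `benchmark`) are hypothetical and are not part of PMLB. The majority-class baseline stands in for whatever method is actually being evaluated.

```python
from collections import Counter


def fetch_with_pmlb(name):
    """Fetch one dataset as (features, labels) using the pmlb package."""
    # Imported lazily so the rest of the sketch runs without pmlb installed.
    from pmlb import fetch_data

    # Assumption: return_X_y=True yields a feature array and a target array.
    return fetch_data(name, return_X_y=True)


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def benchmark(dataset_names, fetch=fetch_with_pmlb):
    """Map each dataset name to its majority-class baseline accuracy.

    `fetch` is any callable returning (features, labels) for a name, so the
    same loop works with PMLB or with a local stand-in.
    """
    results = {}
    for name in dataset_names:
        _, labels = fetch(name)
        results[name] = majority_baseline_accuracy(labels)
    return results
```

Calling `benchmark(["mushroom"])` would download that dataset on first use, assuming `mushroom` is among the classification datasets in the collection. Passing a custom `fetch` keeps the loop usable offline.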
