DMLR: Data-centric Machine Learning Research -- Past, Present and Future
Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William A Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease

TL;DR
This paper discusses the importance of community engagement and infrastructure in developing next-generation datasets to advance machine learning science, emphasizing collective efforts for sustainable progress.
Contribution
It highlights the role of community and infrastructure in creating and maintaining impactful datasets for future machine learning research.
Findings
Community engagement is vital for dataset development.
Infrastructure supports sustainable dataset creation.
Collective efforts can accelerate ML scientific progress.
Abstract
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning and Data Classification
