Elements of effective machine learning datasets in astronomy
Bernie Boscoe, Tuan Do, Evan Jones, Yunqi Li, Kevin Alfaro, Christy Ma

TL;DR
This paper discusses key elements for creating effective machine learning datasets in astronomy, emphasizing data quality, structure, and metadata to improve usability, reproducibility, and scientific insights.
Contribution
It defines the essential elements of effective astronomical machine learning datasets and offers practical suggestions for their design and construction.
Findings
Well-defined data points and structure are crucial for effective datasets.
Metadata enhances data usability and reproducibility.
Proper dataset design fosters reusable and transparent scientific practices.
Abstract
In this work, we identify elements of effective machine learning datasets in astronomy and present suggestions for their design and creation. Machine learning has become an increasingly important tool for analyzing and understanding the large-scale flood of data in astronomy. To take advantage of these tools, datasets are required for training and testing. However, building machine learning datasets for astronomy can be challenging. Astronomical data is collected from instruments built to explore science questions in a traditional fashion rather than to conduct machine learning. Thus, raw data, or even downstream processed data, is often not in a form amenable to machine learning. We explore the construction of machine learning datasets and ask: what elements define effective machine learning datasets? We define effective machine learning datasets in astronomy to…
Taxonomy
Topics: Data Analysis with R · Big Data and Business Intelligence · Scientific Computing and Data Management
