The Multimodal Universe: Enabling Large-Scale Machine Learning with 100TB of Astronomical Scientific Data
The Multimodal Universe Collaboration: Eirini Angeloudi, Jeroen Audenaert, Micah Bowles, Benjamin M. Boyd, David Chemaly, Brian Cherinka, Ioana Ciucă, Miles Cranmer, Aaron Do, Matthew Grayling, Erin E. Hayes, Tom Hehir, Shirley Ho, Marc Huertas-Company, Kartheik G. Iyer

TL;DR
The paper introduces the MULTIMODAL UNIVERSE, a 100 TB dataset of astronomical observations compiled to support large-scale multimodal machine learning research in astrophysics.
Contribution
It provides a 100 TB multimodal dataset (images, spectra, and time series with associated measurements and metadata) together with benchmark tasks, enabling the development of large multimodal models for scientific applications in astronomy.
Findings
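Compiles hundreds of millions of astronomical observations into a single dataset built for machine learning research.
Covers multi-channel and hyper-spectral images, spectra, and multivariate time series, along with associated scientific measurements and metadata.
Includes benchmark tasks representative of standard machine learning practice in astrophysics.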
Abstract
We present the MULTIMODAL UNIVERSE, a large-scale multimodal dataset of scientific astronomical data, compiled specifically to facilitate machine learning research. Overall, the MULTIMODAL UNIVERSE contains hundreds of millions of astronomical observations, constituting 100 TB of multi-channel and hyper-spectral images, spectra, multivariate time series, as well as a wide variety of associated scientific measurements and "metadata". In addition, we include a range of benchmark tasks representative of standard practices for machine learning methods in astrophysics. This massive dataset will enable the development of large multimodal models specifically targeted towards scientific applications. All code used to compile the MULTIMODAL UNIVERSE and a description of how to access the data are available at https://github.com/MultimodalUniverse/MultimodalUniverse
Taxonomy
Topics: Astronomy and Astrophysical Research · Astronomical Observations and Instrumentation
