Massive Datasets in Astronomy
Robert J. Brunner (1), S. George Djorgovski (1), Thomas A. Prince (2), and Alex S. Szalay (3) ((1) Astronomy, Caltech, (2) Space Radiation Laboratory, Caltech, (3) Department of Astronomy & Physics, Johns Hopkins University)

TL;DR
Astronomy has evolved into a data-intensive science with massive datasets from sky surveys and simulations, requiring advanced data management, mining, and virtual observatories to facilitate new research avenues.
Contribution
This paper provides an overview of large astronomical datasets, discusses data archiving techniques, and explores the future of data-driven astronomy with virtual observatories.
Findings
Multi-terabyte sky survey datasets already exist, and multi-petabyte datasets are on the horizon.
Development of virtual observatories as a new information infrastructure.
Data mining promises to make scientific analysis of these datasets more effective and to open new research directions.
Abstract
Astronomy has a long history of acquiring, systematizing, and interpreting large quantities of data. Starting from the earliest sky atlases through the first major photographic sky surveys of the 20th century, this tradition is continuing today, and at an ever increasing rate. Like many other fields, astronomy has become a very data-rich science, driven by the advances in telescope, detector, and computer technology. Numerous large digital sky surveys and archives already exist, with information content measured in multiple Terabytes, and even larger, multi-Petabyte data sets are on the horizon. Systematic observations of the sky, over a range of wavelengths, are becoming the primary source of astronomical data. Numerical simulations are also producing comparable volumes of information. Data mining promises to both make the scientific utilization of these data sets more effective and…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Scientific Computing and Data Management
