On the need for synthetic data and robust data simulators in the 2020s
Molly S. Peeples (STScI/JHU), Bjorn Emonts (NRAO), Mark Kyprianou (STScI), Matthew T. Penny (Ohio State), Gregory F. Snyder (STScI), Christopher C. Stark (STScI), Michael Troxel (Duke), Neil T. Zimmerman (GSFC), John ZuHone (Harvard-Smithsonian CfA)

TL;DR
The paper emphasizes the importance of synthetic data and robust simulators in astronomy for testing, comparison, prediction, and risk mitigation, advocating for increased funding and data sharing.
Contribution
It highlights the need for widespread adoption of synthetic data and calls for investment in data simulators and public archives to enhance research capabilities.
Findings
Synthetic data aids in testing measurement methods.
Synthetic data enables better model-observation comparisons.
Publicly available synthetic data lowers barriers to research.
Abstract
As observational datasets become larger and more complex, so too do the questions being asked of these data. Data simulations, i.e., synthetic data with properties (pixelization, noise, PSF, artifacts, etc.) akin to real data, are therefore increasingly required for several purposes, including: (1) testing complicated measurement methods, (2) comparing models and astrophysical simulations to observations in a manner that requires as few assumptions about the data as possible, (3) predicting observational results based on models and astrophysical simulations for, e.g., proposal planning, and (4) mitigating risk for future observatories and missions by effectively priming and testing pipelines. We advocate for the increased use of synthetic data to plan for and interpret real observations as a matter of routine. This will require funding for (1) facilities to provide robust data…
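As an illustration of purpose (1), the following is a minimal sketch of a data simulation used to test a measurement method. It is not taken from the paper: the function names (`gaussian_psf`, `simulate_image`, `measure_flux`), the Gaussian PSF, the Poisson-only noise model, and all parameter values are assumptions chosen for brevity. It builds a pixelized image of a single point source with a known flux, adds noise, and then checks whether a simple PSF-weighted estimator recovers the input.

```python
import numpy as np


def gaussian_psf(size, fwhm):
    """Normalized 2D Gaussian PSF sampled at pixel centers (an assumed PSF shape)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # Pixel coordinates measured from the center of the grid
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    psf = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return psf / psf.sum()


def simulate_image(true_flux, size=33, fwhm=3.0, background=10.0, seed=0):
    """Synthetic image of one centered point source with Poisson noise.

    Hypothetical simulator: pixelization comes from sampling the PSF on a
    grid, and noise is drawn from a Poisson distribution in each pixel.
    """
    rng = np.random.default_rng(seed)
    psf = gaussian_psf(size, fwhm)
    expected_counts = true_flux * psf + background  # noiseless image
    image = rng.poisson(expected_counts).astype(float)
    return image, psf


def measure_flux(image, psf, background):
    """PSF-weighted least-squares flux estimate, assuming a known background."""
    return np.sum((image - background) * psf) / np.sum(psf**2)


# Because the input flux is known, the estimator can be checked directly.
true_flux = 5000.0
background = 10.0
image, psf = simulate_image(true_flux, background=background)
estimate = measure_flux(image, psf, background)
print(f"input flux = {true_flux:.0f}, recovered flux = {estimate:.0f}")
```

The value of the exercise is that the truth is known: any bias in the recovered flux can be attributed to the measurement method rather than to the sky. A realistic simulator would add the instrument-specific effects the abstract lists, such as detector artifacts and a measured PSF.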
Taxonomy
Topics: Scientific Computing and Data Management · Advanced Data Storage Technologies · Research Data Management Practices
