Robust Machine Learning Applied to Terascale Astronomical Datasets
Nicholas M. Ball (1), Robert J. Brunner (1, 2), Adam D. Myers (1) ((1) Department of Astronomy, University of Illinois at Urbana-Champaign; (2) National Center for Supercomputing Applications, Urbana-Champaign)

TL;DR
This paper applies machine learning algorithms, particularly k-nearest neighbors, to terascale astronomical datasets using supercomputing cluster resources, yielding improved classifications and distance measures.
Contribution
It introduces a novel use of supercomputing for data mining in astronomy, extending machine learning methods to handle terascale datasets efficiently.
Findings
Improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey
Improved distance measures
Identification of infrastructure challenges for petascale data
Abstract
We present recent results from the LCDM (Laboratory for Cosmological Data Mining; http://lcdm.astro.uiuc.edu) collaboration between UIUC Astronomy and NCSA to deploy supercomputing cluster resources and machine learning algorithms for the mining of terascale astronomical datasets. This is a novel application in the field of astronomy, because we are using such resources for data mining, and not just performing simulations. Via a modified implementation of the NCSA cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey, improved distance measures, and a full exploitation of the simple but powerful k-nearest neighbor algorithm. A driving principle of this work is that our methods should be extensible from current terascale datasets to upcoming petascale datasets and beyond. We discuss…
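The abstract names the k-nearest neighbor algorithm without showing it, so the following is a minimal sketch of the basic idea: label an object by majority vote among its k closest labeled neighbors in feature space. This is not the authors' Data-to-Knowledge implementation. The function name `knn_classify`, the two-dimensional feature vectors, and the "star"/"galaxy" training values are all made up for illustration.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Brute force: Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy features (for example, two colors per object); not survey data.
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
                (1.0, 1.1), (1.2, 0.9), (0.9, 1.0)]
train_labels = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]

predicted = knn_classify(train_points, train_labels, (0.95, 1.05), k=3)
print(predicted)
```

The brute-force search above compares each query against every training point, which is only practical for small examples. At the dataset sizes described in the abstract, some form of spatial indexing or parallel partitioning would likely be needed.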
Taxonomy
Topics: Gamma-ray bursts and supernovae · Astronomical Observations and Instrumentation · Astronomy and Astrophysical Research
