The LSST Data Mining Research Agenda
K. D. Borne, J. Becla, I. Davidson, A. Szalay, J. A. Tyson

TL;DR
This paper outlines a comprehensive research agenda for data mining in the LSST astronomical survey, focusing on scalable algorithms, system robustness, and efficient data querying at petabyte scales.
Contribution
It sets out research directions for scalable, grid-enabled data mining, classification brokering, and multi-resolution exploration of LSST's petabyte-scale datasets.
Findings
Design of scalable machine learning algorithms for petabyte data
Development of a robust classification brokering system
Implementation of multi-resolution database exploration methods
Abstract
We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabyte scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; design of a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute, multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.
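As a rough illustration of what "multi-resolution methods for exploration" could mean in practice, the sketch below counts catalog entries per grid cell at successively finer resolutions, so that a coarse level can be scanned first and only interesting cells refined. It is not the method proposed by the authors; the function names `build_pyramid` and `densest_cell`, and the assumption that two attributes have been rescaled to the range [0, 1), are hypothetical choices made for this example.

```python
from collections import Counter


def build_pyramid(points, levels):
    """Count points per grid cell at resolutions 1x1, 2x2, 4x4, ...

    points: (x, y) pairs with 0 <= x, y < 1. These stand in for two
    rescaled catalog attributes (an assumption of this sketch).
    Returns a list of Counters, one per level, keyed by (col, row).
    """
    points = list(points)  # allow generators; we iterate once per level
    pyramid = []
    for level in range(levels):
        n = 2 ** level  # number of cells along each axis at this level
        counts = Counter((int(x * n), int(y * n)) for x, y in points)
        pyramid.append(counts)
    return pyramid


def densest_cell(pyramid, level):
    """Return ((col, row), count) for the most populated cell at a level."""
    return pyramid[level].most_common(1)[0]
```

A caller would typically inspect `pyramid[0]` or `pyramid[1]` to find dense or sparse regions, then query only those cells at a deeper level instead of scanning the full table.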
