Using Active Learning to Improve Quasar Identification for the DESI Spectra Processing Pipeline
Dylan Green, David Kirkby, J. Aguilar, S. Ahlen, D. M. Alexander, E. Armengaud, S. Bailey, A. Bault, D. Bianchi, A. Brodzeller, D. Brooks, T. Claybaugh, R. de Belsunce, A. de la Macorra, P. Doel, V. A. Fawcett, S. Ferraro, A. Font-Ribera, J. E. Forero-Romero, E. Gaztañaga

TL;DR
This paper introduces an active learning approach with outlier rejection to efficiently retrain QuasarNET for DESI spectra, significantly improving classification accuracy with less labeled data.
Contribution
We develop a novel active learning algorithm with outlier rejection for training QuasarNET on DESI data, reducing data requirements while enhancing classification performance.
Findings
Achieved comparable or better classification accuracy using less than 10% of the training data.
Discovered and addressed a systemic error in QuasarNET's redshift estimation.
Improved consistency of object classification on unlabeled data.
Abstract
The Dark Energy Spectroscopic Instrument (DESI) survey uses an automatic spectral classification pipeline to classify spectra. QuasarNET, a convolutional neural network used as part of this pipeline, was originally trained using data from the Baryon Oscillation Spectroscopic Survey (BOSS). In this paper we implement an active learning algorithm to optimally select spectra to use for training a new version of the QuasarNET weights file using only DESI data, specifically to improve classification accuracy. This active learning algorithm includes a novel outlier rejection step using a Self-Organizing Map to ensure we label spectra representative of the larger quasar sample observed in DESI. We perform two iterations of the active learning pipeline, assembling a final dataset of 5600 labeled spectra, a small subset of the approximately 1.3 million quasar targets in DESI's Data Release 1. When…
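The selection idea described in the abstract (pick informative spectra to label, but first discard outliers with a Self-Organizing Map) can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each spectrum is reduced to a feature vector, that the classifier exposes a quasar confidence in [0, 1], that "informative" means confidence closest to 0.5, and that an outlier is a sample with a large SOM quantization error. The function names (`train_som`, `quantization_error`, `select_for_labeling`), the grid size, and the 95th-percentile cut are all hypothetical choices made for illustration.

```python
import numpy as np


def train_som(data, grid=(4, 4), n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Fit a tiny Self-Organizing Map; returns node weights (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each node, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen training samples.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood radius shrinks
        x = data[rng.integers(0, len(data))]     # one random training sample
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))  # best-matching unit
        d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian neighbourhood on the grid
        weights += lr * h[:, None] * (x - weights)
    return weights


def quantization_error(data, weights):
    """Euclidean distance from each sample to its best-matching SOM node."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return dists.min(axis=1)


def select_for_labeling(features, p_quasar, n_select, outlier_pct=95.0):
    """Return indices to label and the boolean mask of non-outlier samples."""
    weights = train_som(features)
    qe = quantization_error(features, weights)
    # ASSUMPTION: samples above the chosen percentile of quantization error are outliers.
    keep = qe <= np.percentile(qe, outlier_pct)
    candidates = np.flatnonzero(keep)
    # ASSUMPTION: uncertainty sampling, i.e. confidence closest to 0.5 is labeled first.
    order = candidates[np.argsort(np.abs(p_quasar[candidates] - 0.5))]
    return order[:n_select], keep


# Synthetic stand-in data: 200 "typical" feature vectors plus 5 far-away ones.
rng = np.random.default_rng(42)
typical = rng.normal(0.0, 1.0, size=(200, 5))
unusual = 50.0 * np.eye(5)                        # one extreme point along each axis
features = np.vstack([typical, unusual])
p_quasar = rng.uniform(0.0, 1.0, size=len(features))
p_quasar[200:] = 0.5                              # make the unusual points maximally uncertain

chosen, keep = select_for_labeling(features, p_quasar, n_select=10)
print("selected indices:", chosen)
print("injected outliers rejected:", int((~keep[200:]).sum()), "of 5")
```

Without the rejection step, plain uncertainty sampling would rank the five injected points first, since their confidence is exactly 0.5. Filtering on quantization error before ranking is one simple way to keep the labeling budget on spectra that resemble the bulk of the sample.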
