Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification
Joseph W. Richards, Dan L. Starr, Henrik Brink, Adam A. Miller, Joshua S. Bloom, Nathaniel R. Butler, J. Berian James, James P. Long, John Rice

TL;DR
This paper demonstrates that active learning effectively mitigates sample selection bias in astronomical classification tasks, significantly improving accuracy and confidence in variable star classification compared to traditional methods.
Contribution
It introduces an active learning framework tailored for astronomical data, showing its superiority over other bias correction methods in variable star classification.
Findings
Active learning reduces the classification error rate by over 3%.
Active learning improves agreement with established catalogs from 65.5% to 79.5%.
Classifier confidence increases from 14.6% to 42.9% after active learning.
Abstract
Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL---where the…
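To make the active learning idea named in the abstract concrete, here is a minimal sketch of pool-based querying under sample selection bias. It is not the authors' method: the one-dimensional feature, the nearest-centroid classifier, the smallest-margin (uncertainty sampling) query rule, and the `oracle` function with its class boundary at 5.0 are all hypothetical choices made for illustration. The labeled set covers only part of the feature range, so the initial decision boundary is misplaced for the deeper pool.

```python
def class_centroids(labeled):
    """Mean feature value per class, from (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(x, cents):
    """Nearest-centroid prediction."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def margin(x, cents):
    """Gap between the two smallest centroid distances (small = uncertain)."""
    d = sorted(abs(x - c) for c in cents.values())
    return d[1] - d[0]


def oracle(x):
    """Hypothetical stand-in for an expert labeler (toy boundary at 5.0)."""
    return "A" if x < 5.0 else "B"


def active_learn(labeled, pool, n_queries):
    """Repeatedly label the pool object the current model is least sure of."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(n_queries):
        cents = class_centroids(labeled)
        i = min(range(len(pool)), key=lambda k: margin(pool[k], cents))
        x = pool.pop(i)
        labeled.append((x, oracle(x)))
    return labeled


def error_rate(labeled, objects):
    """Fraction of objects misclassified by a model fit on `labeled`."""
    cents = class_centroids(labeled)
    wrong = sum(predict(x, cents) != oracle(x) for x in objects)
    return wrong / len(objects)


# Biased training set: class A is sampled only far from the true boundary.
train = [(x, oracle(x)) for x in (0.0, 0.5, 1.0, 5.5, 6.0, 6.5)]
# Unlabeled pool standing in for the deeper survey: a grid on [0, 10].
pool = [0.05 * i for i in range(201)]

err_before = error_rate(train, pool)
augmented = active_learn(train, pool, n_queries=10)
err_after = error_rate(augmented, pool)
print(f"error before: {err_before:.3f}, after 10 queries: {err_after:.3f}")
```

In this toy setup each query lands near the current decision boundary, so the new labels pull the boundary toward the region the biased training set missed. A real application would replace the centroid model with a stronger classifier and the `oracle` with human labeling.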
