Active Learning to Overcome Sample Selection Bias: Application to Photometric Variable Star Classification

Joseph W. Richards; Dan L. Starr; Henrik Brink; Adam A. Miller; Joshua S. Bloom; Nathaniel R. Butler; J. Berian James; James P. Long; John Rice

arXiv:1106.2832 · astro-ph.IM · May 28, 2015

TL;DR

This paper demonstrates that active learning effectively mitigates sample selection bias in astronomical classification tasks, significantly improving accuracy and confidence in variable star classification compared to traditional methods.

Contribution

It introduces an active learning framework tailored for astronomical data, showing its superiority over other bias correction methods in variable star classification.

Findings

1. Active learning reduces classification error rate by over 3%.
2. Active learning improves agreement with established catalogs from 65.5% to 79.5%.
3. Classifier confidence increases from 14.6% to 42.9% after active learning.

Abstract

Despite the great promise of machine-learning algorithms to classify and predict astrophysical parameters for the vast numbers of astrophysical sources and transients observed in large-scale surveys, the peculiarities of the training data often manifest as strongly biased predictions on the data of interest. Typically, training sets are derived from historical surveys of brighter, more nearby objects than those from more extensive, deeper surveys (testing data). This sample selection bias can cause catastrophic errors in predictions on the testing data because a) standard assumptions for machine-learned model selection procedures break down and b) dense regions of testing space might be completely devoid of training data. We explore possible remedies to sample selection bias, including importance weighting (IW), co-training (CT), and active learning (AL). We argue that AL---where the…
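
To make the failure mode in point (b) concrete, here is a minimal sketch of pool-based active learning on a toy one-dimensional problem. Everything in it is an assumption made for illustration: the feature `x`, the classes "A", "B" and "C", the oracle `true_class`, the 1-nearest-neighbour classifier, and the farthest-from-training query rule are simple stand-ins, not the classifier or selection criterion used in the paper. The training set covers only small `x`, so class "C" never appears in it until the learner asks for labels in the uncovered region.

```python
def true_class(x):
    # Hypothetical oracle: stands in for an expert who labels a queried source.
    if x < 2.0:
        return "A"
    return "B" if x < 7.0 else "C"


def predict(x, labeled):
    # 1-nearest-neighbour prediction from a list of (feature, label) pairs.
    return min(labeled, key=lambda p: abs(p[0] - x))[1]


def error_rate(labeled, test):
    # Fraction of test points whose predicted class differs from the oracle's.
    wrong = sum(predict(x, labeled) != true_class(x) for x in test)
    return wrong / len(test)


def distance_to_training(x, labeled):
    # Distance from x to the closest labeled point (0 if x is already labeled).
    return min(abs(px - x) for px, _ in labeled)


def active_learn(labeled, pool, n_queries):
    # Work on copies so the caller's lists are left untouched.
    labeled, pool = list(labeled), list(pool)
    for _ in range(n_queries):
        # Query the pool point that the current training set covers worst.
        x = max(pool, key=lambda q: distance_to_training(q, labeled))
        pool.remove(x)
        labeled.append((x, true_class(x)))
    return labeled


# Biased training set: only small x, so class "C" is entirely missing.
train = [(0.0, "A"), (1.0, "A"), (2.0, "B"), (3.0, "B")]
# Testing data span a wider range of x than the training data.
test = [0.5 * i for i in range(21)]

err_before = error_rate(train, test)
augmented = active_learn(train, test, n_queries=4)
err_after = error_rate(augmented, test)
print(f"error before: {err_before:.3f}, after 4 queries: {err_after:.3f}")
```

In this toy setting, four queried labels are enough to introduce the missing class and move the decision boundary close to its true position. The error is measured on the full test grid, including the queried points, so it slightly flatters the augmented classifier.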
