On Machine-Learned Classification of Variable Stars with Sparse and Noisy Time-Series Data

Joseph W. Richards; Dan L. Starr; Nathaniel R. Butler; Joshua S. Bloom; John M. Brewer; Arien Crellin-Quick; Justin Higgins; Rachel Kennedy; Maxime Rischard

arXiv:1101.1959·astro-ph.IM·March 17, 2015

TL;DR

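This paper presents a machine-learning framework for classifying variable stars from sparse, noisy time-series data. Using a random forest classifier, it reaches a 22.8% overall classification error on a 25-class data set of 1542 well-studied variable stars, and it introduces hierarchical classification to reduce catastrophic errors further.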

Contribution

It introduces a methodology that combines light-curve feature extraction, tree-ensemble classifiers, and hierarchical classification for variable stars, with results evaluated by cross validation.

Findings

01

Achieved 22.8% overall classification error with random forest.

02

Reached 98.2% efficiency for pulsational variables at 95% purity.

03

Reduced catastrophic error rate to 7.8% with hierarchical classification.
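
The "efficiency at 95% purity" figure above can be read as follows: rank objects by the classifier's probability for the science class, select everything above a threshold, and report the fraction of true class members recovered (efficiency) for the largest selection in which at least 95% of selected objects are true members (purity). The sketch below illustrates that definition only. The function name `efficiency_at_purity` and the toy scores and labels are hypothetical, and the paper's own procedure may differ in detail.

```python
def efficiency_at_purity(scores, labels, target_purity=0.95):
    """Return (efficiency, threshold) for the best selection meeting target_purity.

    scores: classifier probabilities for the science class (hypothetical input)
    labels: True where the object really belongs to that class
    The threshold is None if no selection reaches the target purity.
    """
    n_true = sum(bool(x) for x in labels)
    if n_true == 0:
        raise ValueError("no true members of the science class")

    # Rank objects from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best = (0.0, None)
    true_pos = 0
    for n_selected, (score, is_member) in enumerate(ranked, start=1):
        true_pos += bool(is_member)
        # Objects with tied scores are selected together, so only evaluate
        # once the whole tied group has been counted.
        if n_selected < len(ranked) and ranked[n_selected][0] == score:
            continue
        purity = true_pos / n_selected      # fraction of selected that are real
        efficiency = true_pos / n_true      # fraction of real members selected
        if purity >= target_purity and efficiency > best[0]:
            best = (efficiency, score)
    return best


# Toy data, invented for illustration (not from the paper).
toy_scores = [0.99, 0.95, 0.9, 0.8, 0.7, 0.4, 0.2]
toy_labels = [True, True, True, False, True, False, False]

result_95 = efficiency_at_purity(toy_scores, toy_labels, target_purity=0.95)
result_75 = efficiency_at_purity(toy_scores, toy_labels, target_purity=0.75)
print(result_95, result_75)
```

On the toy data, relaxing the purity requirement from 0.95 to 0.75 lets the threshold drop far enough to recover every true member, which is the trade-off the efficiency and purity figures summarize.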

Abstract

With the coming data deluge from synoptic surveys, there is a growing need for frameworks that can quickly and automatically produce calibrated classification probabilities for newly-observed variables based on a small number of time-series measurements. In this paper, we introduce a methodology for variable-star classification, drawing from modern machine-learning techniques. We describe how to homogenize the information gleaned from light curves by selection and computation of real-numbered metrics ("features"), detail methods to robustly estimate periodic light-curve features, introduce tree-ensemble methods for accurate variable star classification, and show how to rigorously evaluate the classification results using cross validation. On a 25-class data set of 1542 well-studied variable stars, we achieve a 22.8% overall classification error using the random forest classifier; this…
