Real-bogus scores for active anomaly detection

T. A. Semenikhin; M. V. Kornilov; M. V. Pruzhinskaya; A. D. Lavrukhina; E. Russeil; E. Gangler; E. E. O. Ishida; V. S. Korolev; K. L. Malanchev; A. A. Volnova; S. Sreejith (The SNAD team)

arXiv:2409.10256·astro-ph.IM·December 23, 2024·Astron. Comput.

TL;DR

This paper demonstrates that incorporating machine learning-based real-bogus scores into active anomaly detection pipelines in astronomical surveys significantly reduces artifacts, enhancing the identification of genuine astrophysical objects.

Contribution

The study feeds real-bogus scores from machine learning classifiers into active anomaly detection, improving artifact rejection in large-scale astronomical data analysis.

Findings

01

Real-bogus classifiers achieve ROC-AUC scores of 0.93-0.95.

02

Including real-bogus scores reduces the number of artifacts in the detection pipeline's output from 27 to 3.

03

Active anomaly detection with real-bogus scores increases the yield of interesting astrophysical objects.
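
For readers unfamiliar with the metric in finding 01, the sketch below computes ROC-AUC directly from its pairwise definition: the fraction of (real, bogus) pairs in which the real object receives the higher score. It also shows one plausible way a score could enter an anomaly detection pipeline, by appending it as an extra feature column; this is an assumption made for illustration, not a mechanism confirmed by the summary above. The function names `roc_auc` and `append_score` and all numbers are hypothetical toy values, not data from the paper.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random real object (label 1)
    outscores a random bogus one (label 0); ties count as half a win."""
    real = [s for y, s in zip(labels, scores) if y == 1]
    bogus = [s for y, s in zip(labels, scores) if y == 0]
    if not real or not bogus:
        raise ValueError("need at least one real and one bogus example")
    wins = 0.0
    for r in real:
        for b in bogus:
            if r > b:
                wins += 1.0
            elif r == b:
                wins += 0.5
    return wins / (len(real) * len(bogus))


def append_score(features, scores):
    """Return new feature rows with the real-bogus score as a last column.
    Assumption: the score is used as an additional feature."""
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    return [list(row) + [s] for row, s in zip(features, scores)]


# Toy example: 3 real objects, 2 artifacts, hypothetical classifier scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
auc = roc_auc(labels, scores)  # 5 of the 6 real/bogus pairs are ordered correctly

# Hypothetical light-curve features, one row per object.
features = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.1], [3.0, 0.2], [0.7, 0.9]]
augmented = append_score(features, scores)
```

The pairwise loop is quadratic in the number of objects, which is fine for a toy check; a rank-based formulation would be the usual choice at survey scale.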

Abstract

In the task of anomaly detection in modern time-domain photometric surveys, the primary goal is to identify astrophysically interesting, rare, and unusual objects among a large volume of data. Unfortunately, artifacts -- such as plane or satellite tracks, bad columns on CCDs, and ghosts -- often constitute significant contaminants in results from anomaly detection analysis. In such contexts, the Active Anomaly Discovery (AAD) algorithm allows tailoring the output of anomaly detection pipelines according to what the expert judges to be scientifically interesting. We demonstrate how the introduction of real-bogus scores, obtained from a machine learning classifier, improves the results from AAD. Using labeled data from the SNAD ZTF knowledge database, we train four real-bogus classifiers: XGBoost, CatBoost, Random Forest, and Extremely Randomized Trees. All the models perform real-bogus…
