Real-bogus scores for active anomaly detection
T. A. Semenikhin, M. V. Kornilov, M. V. Pruzhinskaya, A. D. Lavrukhina, E. Russeil, E. Gangler, E. E. O. Ishida, V. S. Korolev, K. L. Malanchev, A. A. Volnova, S. Sreejith (The SNAD team)

TL;DR
This paper demonstrates that incorporating machine learning-based real-bogus scores into active anomaly detection pipelines in astronomical surveys significantly reduces artifacts, enhancing the identification of genuine astrophysical objects.
Contribution
The study feeds real-bogus scores from machine learning classifiers into active anomaly detection, improving artifact rejection in large-scale astronomical data analysis.
Findings
Real-bogus classifiers achieve ROC-AUC scores of 0.93-0.95.
Including real-bogus scores reduces the number of artifacts in the detection pipeline output from 27 to 3.
Active anomaly detection with real-bogus scores increases the yield of interesting astrophysical objects.
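The ROC-AUC values quoted in the findings can be read as the probability that a randomly chosen real object receives a higher score than a randomly chosen bogus one. The following is a minimal sketch of that pairwise definition in plain Python, not the authors' evaluation code. The function name `roc_auc`, the label convention (1 = real, 0 = bogus), and the toy scores are assumptions made for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison (Mann-Whitney U statistic).

    labels: 1 for real, 0 for bogus (a convention assumed here).
    scores: classifier output, higher meaning "more likely real".
    """
    real = [s for y, s in zip(labels, scores) if y == 1]
    bogus = [s for y, s in zip(labels, scores) if y == 0]
    if not real or not bogus:
        raise ValueError("need at least one real and one bogus example")

    # Count real/bogus pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for r in real:
        for b in bogus:
            if r > b:
                wins += 1.0
            elif r == b:
                wins += 0.5
    return wins / (len(real) * len(bogus))


# Toy labels and scores, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"ROC-AUC = {toy_auc:.3f}")
```

In the toy data, five of the six real/bogus pairs are ordered correctly, so the sketch reports 5/6. The quadratic pairwise loop is chosen for readability; a rank-based implementation would be preferable on survey-scale samples.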
Abstract
In the task of anomaly detection in modern time-domain photometric surveys, the primary goal is to identify astrophysically interesting, rare, and unusual objects among a large volume of data. Unfortunately, artifacts -- such as plane or satellite tracks, bad columns on CCDs, and ghosts -- often constitute significant contaminants in results from anomaly detection analysis. In such contexts, the Active Anomaly Discovery (AAD) algorithm allows tailoring the output of anomaly detection pipelines according to what the expert judges to be scientifically interesting. We demonstrate how the introduction of real-bogus scores, obtained from a machine learning classifier, improves the results from AAD. Using labeled data from the SNAD ZTF knowledge database, we train four real-bogus classifiers: XGBoost, CatBoost, Random Forest, and Extremely Randomized Trees. All the models perform real-bogus…
