# Systematic Serendipity: A Test of Unsupervised Machine Learning as a Method for Anomaly Detection

**Authors:** Daniel Giles, Lucianne Walkowicz

arXiv: 1812.07156 · 2019-01-09

## TL;DR

This paper introduces an unsupervised machine learning method that clusters astronomical data by density and flags points outside dense regions as anomalies. Demonstrated on Kepler long cadence lightcurves, the method recovers Boyajian's Star and aims to make the discovery of rare phenomena more systematic.

## Contribution

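The work presents a density-based clustering approach for anomaly detection in large astronomical datasets. As a proof of concept, it is tested on four quarters of Kepler long cadence lightcurves, with Boyajian's Star serving as a known anomaly and the remaining flagged objects reported as candidates.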
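The summary describes the method only as density-based clustering in which anomalies are points lying outside dense regions. As a minimal sketch of that idea, the code below implements DBSCAN, one standard density-based algorithm, in plain Python and treats its noise points as anomalies. DBSCAN is a stand-in chosen for illustration: the feature vectors, `eps`, and `min_pts` values are hypothetical and are not the paper's features or settings. (scikit-learn's `sklearn.cluster.DBSCAN` likewise labels noise points with `-1`.)

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no dense region


def region_query(points: Sequence[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Minimal DBSCAN: one cluster label per point, NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels  # type: ignore[return-value]


def find_anomalies(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Indices of points that DBSCAN leaves outside every dense region."""
    return [i for i, lab in enumerate(dbscan(points, eps, min_pts)) if lab == NOISE]


# Hypothetical 2-D feature vectors: two dense groups plus one isolated object.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, -10.0),
]
print(find_anomalies(features, eps=0.3, min_pts=3))  # [8]
```

Points in sparse regions never join a cluster, so loosening or tightening `eps` and `min_pts` directly controls what fraction of a dataset ends up flagged.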

## Key findings

- Successfully identified Boyajian's Star as an anomaly.
- Detected diverse outliers including rare phenomena and data artifacts.
- Fewer than 4% of the objects in each Kepler quarter are flagged as outliers.

## Abstract

Advances in astronomy are often driven by serendipitous discoveries. As survey astronomy continues to grow, the size and complexity of astronomical databases will increase, and the ability of astronomers to manually scour data and make such discoveries decreases. In this work, we introduce a machine learning-based method to identify anomalies in large datasets to facilitate such discoveries, and apply this method to long cadence lightcurves from NASA's Kepler Mission. Our method clusters data based on density, identifying anomalies as data that lie outside of dense regions. This work serves as a proof-of-concept case study and we test our method on four quarters of the Kepler long cadence lightcurves. We use Kepler's most notorious anomaly, Boyajian's Star (KIC 8462852), as a rare 'ground truth' for testing outlier identification to verify that objects of genuine scientific interest are included among the identified anomalies. We evaluate the method's ability to identify known anomalies by identifying unusual behavior in Boyajian's Star, we report the full list of identified anomalies for these quarters, and present a sample subset of identified outliers that includes unusual phenomena, objects that are rare in the Kepler field, and data artifacts. By identifying <4% of each quarter as outlying data, we demonstrate that this anomaly detection method can create a more targeted approach in searching for rare and novel phenomena.
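The abstract describes two checks on the method's output: whether the known anomaly Boyajian's Star (KIC 8462852) is among the flagged objects, and what fraction of each quarter is flagged. A minimal sketch of both checks follows, assuming the clustering step has already produced a set of flagged KIC IDs per quarter. The function names and every ID other than 8462852 are hypothetical placeholders, and the 4% threshold only mirrors the reported result rather than being a parameter of the method.

```python
from typing import Iterable, Set, Tuple

BOYAJIAN_KIC = 8462852  # Boyajian's Star, the 'ground truth' anomaly named in the abstract


def outlier_fraction(flagged: Set[int], all_ids: Iterable[int]) -> float:
    """Fraction of a quarter's objects that were flagged as outliers."""
    ids = set(all_ids)
    if not ids:
        raise ValueError("quarter contains no objects")
    return len(flagged & ids) / len(ids)


def evaluate_quarter(
    flagged: Set[int],
    all_ids: Iterable[int],
    ground_truth: int = BOYAJIAN_KIC,
    max_fraction: float = 0.04,  # mirrors the reported '<4%' result; an assumption here
) -> Tuple[float, bool, bool]:
    """Return (fraction flagged, ground truth recovered?, fraction below max_fraction?)."""
    frac = outlier_fraction(flagged, all_ids)
    return frac, ground_truth in flagged, frac < max_fraction


# Hypothetical quarter: 99 placeholder IDs plus Boyajian's Star.
quarter_ids = list(range(1, 100)) + [BOYAJIAN_KIC]
flagged_ids = {7, 42, BOYAJIAN_KIC}  # placeholder output of the clustering step
print(evaluate_quarter(flagged_ids, quarter_ids))  # (0.03, True, True)
```

Keeping the recovery check separate from the fraction check reflects the two goals stated in the abstract: flagged objects should include genuinely interesting targets, while the flagged set stays small enough to inspect by hand.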

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.07156/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1812.07156/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/1812.07156/full.md

---
Source: https://tomesphere.com/paper/1812.07156