Machine Learning for Exoplanet Detection: A Comparative Analysis Using Kepler Data

Reihaneh Karimi; Mahdiyar Mousavi-Sadr; Mohammad H. Zhoolideh Haghighi; and Fatemeh S. Tabatabaei

arXiv:2508.09689·astro-ph.EP·August 14, 2025

TL;DR

This study compares four supervised classifiers (Random Forest, k-Nearest Neighbors, Decision Tree, and Logistic Regression) for exoplanet detection using Kepler photometric data. It finds Random Forest to be the most accurate and robust method, especially when combined with SMOTE to address class imbalance.

Contribution

The paper provides a comprehensive comparison of ML classifiers for exoplanet detection, highlighting the effectiveness of ensemble methods such as Random Forest when combined with data balancing techniques.

Findings

01. Random Forest achieves 99.8% accuracy.

02. SMOTE improves model performance significantly.

03. Ensemble methods outperform simpler classifiers.
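To make the SMOTE finding concrete, here is a minimal sketch of the core idea behind SMOTE: synthetic minority-class samples are created by interpolating between a minority sample and one of its nearest minority-class neighbours. This is not the authors' implementation. The function name `smote_oversample`, the toy two-feature data, and the parameter values are assumptions made for illustration; a real pipeline would more likely use a full implementation such as the one in the imbalanced-learn library.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to it.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point somewhere on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 "planet host" feature vectors versus 40 non-hosts.
planet_hosts = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
non_host_count = 40

new_points = smote_oversample(
    planet_hosts, n_new=non_host_count - len(planet_hosts), k=3
)
balanced_minority = planet_hosts + new_points
print(len(balanced_minority), "minority samples after oversampling")
```

Because each synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic samples; whether the paper follows that practice is not stated in this summary.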

Abstract

The discovery of exoplanets has expanded our understanding of planetary systems and opened new avenues for astronomical research. In this study, we present a machine learning (ML) framework for exoplanet identification using a time-series photometric dataset from the Kepler Space Telescope, comprising 3,198 flux measurements across 5,074 stars. We investigate the performance of four supervised classification algorithms, namely Random Forest, k-Nearest Neighbors (KNN), Decision Tree, and Logistic Regression, using a comprehensive set of evaluation metrics such as accuracy, precision, recall, F1-score, Area Under the Receiver Operating Characteristic Curve (AUC-ROC), confusion matrices, and learning curves. Among the models, Random Forest achieves the highest accuracy (99.8%) and near-perfect F1-scores, demonstrating superior generalization and robustness. KNN also performs strongly,…
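The abstract lists accuracy, precision, recall, and F1-score among its evaluation metrics. The sketch below computes them from confusion-matrix counts and shows why accuracy alone can mislead on an imbalanced dataset. The function names and the label arrays are hypothetical and are not taken from the paper's data or results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 10 planet hosts (1) among 1,000 stars.
y_true = [1] * 10 + [0] * 990

# A trivial classifier that never predicts a planet.
always_negative = [0] * 1000

# A classifier that finds 8 of the 10 planets and raises 2 false alarms.
useful = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 988

trivial_metrics = classification_metrics(y_true, always_negative)
useful_metrics = classification_metrics(y_true, useful)
print("always negative:", trivial_metrics)
print("useful model:   ", useful_metrics)
```

In this toy setup the trivial classifier scores high accuracy while recovering no planets at all, which is why recall and F1 on the minority class matter when classes are heavily imbalanced.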
