Active Learning for Planet Habitability Classification under Extreme Class Imbalance
R. I. El-Kholy, Z. M. Hayman

TL;DR
This paper shows that pool-based active learning reduces the number of labels needed to classify potentially habitable exoplanets in highly imbalanced catalogs, which helps astronomical studies with limited follow-up resources.
Contribution
The paper introduces an active learning framework tailored to exoplanet habitability classification, emphasizing label efficiency and uncertainty-based prioritization.
Findings
Active learning reduces the amount of labeled data needed for accurate classification.
Ensemble predictions help identify promising habitability candidates.
Active learning supports conservative prioritization in resource-limited contexts.
Abstract
The increasing size and heterogeneity of exoplanet catalogs have made systematic habitability assessment challenging, particularly given the extreme scarcity of potentially habitable planets and the evolving nature of their labels. In this study, we explore the use of pool-based active learning to improve the efficiency of habitability classification under realistic observational constraints. We construct a unified dataset from the Habitable World Catalog and the NASA Exoplanet Archive and formulate habitability assessment as a binary classification problem. A supervised baseline based on gradient-boosted decision trees is established and optimized for recall in order to prioritize the identification of rare potentially habitable planets. This model is then embedded within an active learning framework, where uncertainty-based margin sampling is compared against random querying across…
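The query step the abstract describes, margin sampling compared against random querying, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the batch size, and the probability values are hypothetical, and the classifier itself is left out. In the paper, the probabilities would come from the gradient-boosted model. The recall helper shows why a recall-oriented metric matters when positives are rare.

```python
import random


def margin_scores(probs):
    """Binary margin |P(habitable) - P(not habitable)| = |2p - 1|.

    A small margin means the model is close to undecided.
    """
    return [abs(2.0 * p - 1.0) for p in probs]


def margin_query(probs, batch_size):
    """Return indices of the batch_size pool items with the smallest margin."""
    margins = margin_scores(probs)
    ranked = sorted(range(len(probs)), key=lambda i: margins[i])
    return ranked[:batch_size]


def random_query(pool_size, batch_size, seed=0):
    """Baseline: pick batch_size pool indices uniformly, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), batch_size)


def recall(y_true, y_pred):
    """Fraction of true positives (label 1) that the predictions recover."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical predicted probabilities of "potentially habitable"
# for six unlabeled planets in the pool (illustrative values only).
pool_probs = [0.02, 0.48, 0.97, 0.55, 0.10, 0.35]

# Margin sampling asks for labels where the model is least certain.
picked = margin_query(pool_probs, batch_size=2)
print(picked)  # [1, 3]

# Under imbalance, predicting "not habitable" everywhere looks accurate
# (4 of 6 correct here) but recovers none of the positives.
y_true = [0, 1, 0, 1, 0, 0]
all_negative = [0] * len(y_true)
print(recall(y_true, all_negative))  # 0.0
```

In a full loop, the queried items would be labeled, moved from the pool to the training set, and the model refit before the next query round. The random baseline follows the same loop with `random_query` in place of `margin_query`.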
