Star-Galaxy Classification in Deep LSST Data with Random Forest: A Pilot study on the Data Preview 1 Release

M. Gatto; V. Ripepi; M. Bellazzini; C. Tortora; M. Dall'Ora

arXiv:2603.25262·astro-ph.GA·May 6, 2026

TL;DR

This study evaluates machine learning methods, especially Random Forests, for star-galaxy classification in deep LSST data, emphasizing the importance of multi-band photometry and uncertainties for minimizing galaxy contamination.

Contribution

It demonstrates that LSST multi-band photometry and photometric uncertainties significantly improve star-galaxy separation at faint magnitudes compared to morphology alone.

Findings

1. Multi-band photometry outperforms morphology-based classification at faint magnitudes.
2. Colors involving the u-band are crucial for robust separation.
3. Including photometric uncertainties as input features yields the best classification performance.
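To make the compared feature sets concrete, here is a minimal sketch of how photometric features might be assembled for one source. It builds adjacent-band colors from the six LSST filters, so u-g is the only adjacent color that involves the u-band. Optionally, it also adds the per-band uncertainties and the color errors propagated in quadrature. The column names `mag_<band>` and `magerr_<band>` and the function `build_features` are hypothetical. The sketch also assumes independent Gaussian magnitude errors. It does not reproduce the authors' exact feature list.

```python
import math

# LSST filter order; adjacent pairs define the colors used below.
BANDS = ["u", "g", "r", "i", "z", "y"]


def build_features(source, include_errors=True):
    """Build adjacent-band colors (and optionally uncertainties) for one source.

    ASSUMPTION: `source` maps hypothetical keys such as "mag_u" and "magerr_u"
    to floats; these names are illustrative, not the LSST DP1 schema.
    """
    features = {}
    for blue, red in zip(BANDS[:-1], BANDS[1:]):
        name = f"{blue}-{red}"
        # Color = difference of magnitudes in two adjacent bands.
        features[name] = source[f"mag_{blue}"] - source[f"mag_{red}"]
        if include_errors:
            # ASSUMPTION: independent errors, combined in quadrature.
            features[f"err_{name}"] = math.hypot(
                source[f"magerr_{blue}"], source[f"magerr_{red}"]
            )
    if include_errors:
        # Keep the raw per-band uncertainties as features as well.
        for band in BANDS:
            features[f"magerr_{band}"] = source[f"magerr_{band}"]
    return features
```

If these dictionaries are stacked in a fixed key order, they form a feature matrix that a Random Forest implementation, such as scikit-learn's `RandomForestClassifier`, can consume. Toggling `include_errors` mimics the with/without-uncertainties comparison in spirit only.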

Abstract

The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will produce unprecedentedly deep and wide photometric catalogs, enabling transformative studies of faint stellar systems such as the search for ultra-faint dwarf galaxies (UFDs). A critical challenge for these studies is reliable star-galaxy separation at faint magnitudes, where compact background galaxies increasingly contaminate stellar samples. This work aims to assess the performance of supervised machine-learning techniques for star-galaxy separation in LSST-like data, to quantify the relative importance of morphological and photometric information, and to identify the combinations of input features that best minimize galaxy contamination while preserving stellar completeness in the faint regime relevant for UFD searches. We apply a Random Forest classifier to observations of the Extended Chandra Deep Field…
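The abstract trades off two quantities, which can be made concrete. Take stars as the positive class. Stellar completeness is the fraction of true stars that the classifier labels as stars. Galaxy contamination is the fraction of objects labeled as stars that are actually galaxies. The minimal sketch below computes both metrics overall and in magnitude bins, which is one common way to inspect behavior in the faint regime. The label encoding (1 = star, 0 = galaxy) and the function names are illustrative assumptions, not the authors' code.

```python
# ASSUMPTION: hypothetical label encoding, stars are the positive class.
STAR, GALAXY = 1, 0


def contamination_completeness(y_true, y_pred):
    """Return (completeness, contamination) of the predicted star sample."""
    # True stars correctly classified as stars.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == STAR and p == STAR)
    # Galaxies wrongly classified as stars (the contaminants).
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == GALAXY and p == STAR)
    n_true_stars = sum(1 for t in y_true if t == STAR)
    n_pred_stars = tp + fp
    completeness = tp / n_true_stars if n_true_stars else float("nan")
    contamination = fp / n_pred_stars if n_pred_stars else float("nan")
    return completeness, contamination


def binned_metrics(mags, y_true, y_pred, edges):
    """Compute both metrics in magnitude bins [edges[k], edges[k+1])."""
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, m in enumerate(mags) if lo <= m < hi]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        results.append(((lo, hi), *contamination_completeness(t, p)))
    return results
```

For example, with six objects where two of three true stars are recovered and one galaxy is misclassified as a star, completeness is 2/3 and contamination is 1/3. Empty bins return NaN rather than raising an error, so sparse faint bins are flagged instead of silently reported as zero.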
