Feature importance for machine learning redshifts applied to SDSS galaxies
Ben Hoyle, Markus Michael Rau, Roman Zitlau, Stella Seitz, Jochen Weller

TL;DR
This paper ranks the importance of photometric features for machine learning redshift estimation of SDSS galaxies. It uses decision trees with AdaBoost to identify the most predictive features, then uses neural networks to show that adding those features improves redshift accuracy and lowers the catastrophic outlier rate.
Contribution
It introduces a feature selection approach that enhances redshift prediction accuracy and reduces outliers, with a comparison of machine learning methods and SDSS photometric redshifts.
Findings
Adding the selected features to the standard magnitudes and colours improves the neural network redshift estimate by 18%
The same addition decreases the catastrophic outlier rate by 32%
Decision trees with AdaBoost outperform neural networks in speed and accuracy
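To make the percentages above concrete, here is a minimal Python sketch of how a catastrophic outlier rate and a relative reduction could be computed. The 0.15 threshold on |z_phot - z_spec|, the function names, and all numbers are illustrative assumptions, not values or definitions taken from the paper.

```python
def outlier_rate(z_phot, z_spec, threshold=0.15):
    """Fraction of galaxies whose redshift error exceeds `threshold`.

    The threshold value is an assumption chosen for illustration only.
    """
    n_outliers = sum(1 for p, s in zip(z_phot, z_spec) if abs(p - s) > threshold)
    return n_outliers / len(z_spec)


def relative_reduction(baseline, improved):
    """Fractional decrease of a metric relative to its baseline value."""
    return (baseline - improved) / baseline


# Made-up redshifts: two of the four estimates miss by more than 0.15.
demo_rate = outlier_rate([0.1, 0.5, 0.3, 0.9], [0.1, 0.2, 0.3, 0.4])

# Made-up outlier rates: going from 5.0% to 3.4% is a 32% relative reduction.
demo_reduction = relative_reduction(0.05, 0.034)
```

Note that a "32% decrease" in this sense is relative to the baseline rate, not a drop of 32 percentage points.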
Abstract
We present an analysis of importance feature selection applied to photometric redshift estimation using the machine learning architecture Decision Trees with the ensemble learning routine AdaBoost (hereafter RDF). We select a list of 85 easily measured (or derived) photometric quantities (or 'features') and spectroscopic redshifts for almost two million galaxies from the Sloan Digital Sky Survey Data Release 10. After identifying which features have the most predictive power, we use standard artificial Neural Networks (aNN) to show that the addition of these features, in combination with the standard magnitudes and colours, improves the machine learning redshift estimate by 18% and decreases the catastrophic outlier rate by 32%. We further compare the redshift estimate using RDF with those from two different aNNs, and with photometric redshifts available from the SDSS. We find that the…
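The feature ranking step described in the abstract can be sketched with scikit-learn, which is an assumed tool here; the abstract does not say which software the authors used. The data below are synthetic, the feature names `f0` to `f5` are hypothetical, and the tree depth and number of boosting rounds are arbitrary choices, so this illustrates the general technique rather than the paper's pipeline.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor


def rank_features(X, y, names, n_estimators=30, max_depth=4, seed=0):
    """Fit AdaBoost over decision trees and return (name, importance)
    pairs sorted from most to least important."""
    # The base tree is passed positionally so the call does not depend on
    # the keyword name, which differs between scikit-learn versions.
    model = AdaBoostRegressor(
        DecisionTreeRegressor(max_depth=max_depth),
        n_estimators=n_estimators,
        random_state=seed,
    )
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [(names[i], float(model.feature_importances_[i])) for i in order]


# Synthetic stand-in for a photometric catalogue: six hypothetical features,
# of which only the first two carry information about the target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 6))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.01, size=2000)
names = [f"f{i}" for i in range(6)]

ranking = rank_features(X, y, names)
top_two = {name for name, _ in ranking[:2]}
```

In a workflow like the one the abstract describes, the highest-ranked features would then be added to the standard magnitudes and colours as inputs to a separate redshift estimator.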
