Improving machine learning-derived photometric redshifts and physical property estimates using unlabelled observations
A. Humphrey, P.A.C. Cunha, A. Paulino-Afonso, S. Amarantidis, R. Carvajal, J.M. Gomes, I. Matute, P. Papaderos

TL;DR
This paper demonstrates that semi-supervised pseudo-labelling significantly improves the accuracy of machine learning models in estimating galaxy redshifts and physical properties from photometric data, especially for large upcoming surveys.
Contribution
It introduces and tests a pseudo-labelling semi-supervised approach for galaxy property estimation, showing notable improvements over traditional supervised methods.
Findings
Up to 15% reduction in absolute error for redshift and property estimates.
Significant decrease in catastrophic outlier fraction.
Gradient boosting methods benefit most from pseudo-labelling.
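The two headline metrics can be illustrated with a short sketch. The definitions here are assumptions, since this summary does not state them: absolute error is taken as the mean of |z_pred - z_true|, and a catastrophic outlier is taken as an object with |z_pred - z_true| / (1 + z_true) > 0.15, a threshold commonly used in the photometric redshift literature. The function name `redshift_metrics` and the sample values are hypothetical.

```python
def redshift_metrics(z_true, z_pred, outlier_threshold=0.15):
    """Return (mean absolute error, catastrophic outlier fraction).

    Assumed definitions (not taken from the paper summary):
      - absolute error: |z_pred - z_true|
      - catastrophic outlier: |z_pred - z_true| / (1 + z_true) > threshold
    """
    if len(z_true) != len(z_pred) or len(z_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    abs_err = [abs(p - t) for t, p in zip(z_true, z_pred)]
    norm_err = [abs(p - t) / (1.0 + t) for t, p in zip(z_true, z_pred)]

    mean_abs_err = sum(abs_err) / len(abs_err)
    outlier_frac = sum(e > outlier_threshold for e in norm_err) / len(norm_err)
    return mean_abs_err, outlier_frac


# Made-up values for illustration only; the third object is a gross misestimate.
z_true = [0.5, 1.0, 2.0, 0.2]
z_pred = [0.55, 1.1, 1.0, 0.21]
mae, frac = redshift_metrics(z_true, z_pred)
```

Computing both metrics for a supervised model and for a pseudo-labelled model on the same held-out set is one way to express the kind of relative improvement listed above.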
Abstract
In the era of huge astronomical surveys, machine learning offers promising solutions for the efficient estimation of galaxy properties. The traditional, 'supervised' paradigm for the application of machine learning involves training a model on labelled data, and using this model to predict the labels of previously unlabelled data. The semi-supervised 'pseudo-labelling' technique offers an alternative paradigm, allowing the model training algorithm to learn from both labelled data and as-yet unlabelled data. We test the pseudo-labelling method on the problems of estimating redshift, stellar mass, and star formation rate, using COSMOS2015 broad band photometry and one of several publicly available machine learning algorithms, and we obtain significant improvements compared to purely supervised learning. We find that the gradient-boosting tree methods CatBoost, XGBoost, and LightGBM…
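A minimal sketch of one round of pseudo-labelling, as described in the abstract, is given below. It is not the authors' implementation: the paper works with gradient-boosting libraries, whereas this sketch swaps in a small k-nearest-neighbour regressor so that it runs with the Python standard library alone. The function names, the one-dimensional toy feature, and the choice of k are all assumptions made for illustration.

```python
def knn_predict(train_x, train_y, query, k=2):
    """Predict a label as the mean label of the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


def pseudo_label_round(labelled_x, labelled_y, unlabelled_x, k=2):
    """One pseudo-labelling round.

    1. Fit on labelled data (k-NN needs no explicit fit step).
    2. Predict labels for the unlabelled data: these are the pseudo-labels.
    3. Return the labelled and pseudo-labelled sets merged for retraining.
    """
    pseudo_y = [knn_predict(labelled_x, labelled_y, x, k) for x in unlabelled_x]
    return labelled_x + unlabelled_x, labelled_y + pseudo_y, pseudo_y


# Toy data: one feature per object (e.g. a colour) and a label (e.g. redshift).
labelled_x = [[0.0], [1.0], [2.0], [3.0]]
labelled_y = [0.0, 0.5, 1.0, 1.5]
unlabelled_x = [[0.5], [1.5], [2.5]]

aug_x, aug_y, pseudo_y = pseudo_label_round(labelled_x, labelled_y, unlabelled_x)

# The retrained model predicts from the augmented training set.
prediction = knn_predict(aug_x, aug_y, [1.4])
```

On toy data this only shows the mechanics: predictions on unlabelled objects are fed back as training labels before the final model is fit. Whether that helps in practice depends on the base learner and the data, which is what the paper evaluates.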
Taxonomy
Topics: Advanced Statistical Methods and Models · Galaxies: Formation, Evolution, Phenomena
