Reproducible $k$-means clustering in galaxy feature data from the GAMA survey
Sebastian Turner, Lee S. Kelvin, Ivan K. Baldry, Paulo J. Lisboa, Steven N. Longmore, Chris A. Collins, Benne W. Holwerda, Andrew M. Hopkins, and Jochen Liske

TL;DR
This paper demonstrates a scalable, reproducible application of $k$-means clustering to galaxy feature data from the GAMA survey, revealing multiple sub-populations and evolutionary pathways in the local Universe.
Contribution
It introduces a robust, unsupervised clustering method for high-dimensional galaxy data, suitable for large future surveys, and identifies distinct galaxy sub-populations.
Findings
The galaxy sample partitions into 2, 3, 5, or 6 sub-populations, depending on the number of clusters $k$.
Local environment influences low-mass galaxy evolution.
Massive galaxies evolve more passively from blue to red.
Abstract
A fundamental bimodality of galaxies in the local Universe is apparent in many of the features used to describe them. Multiple sub-populations exist within this framework, each representing galaxies following distinct evolutionary pathways. Accurately identifying and characterising these sub-populations requires that a large number of galaxy features be analysed simultaneously. Future galaxy surveys such as LSST and Euclid will yield data volumes for which traditional approaches to galaxy classification will become unfeasible. To address this, we apply a robust $k$-means unsupervised clustering method to feature data derived from a sample of 7338 local-Universe galaxies selected from the Galaxy And Mass Assembly (GAMA) survey. This allows us to partition our sample into clusters without the need for training on pre-labelled data, facilitating a full census of our high dimensionality…
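As a rough illustration of the unsupervised partitioning the abstract describes, the following is a minimal sketch of generic $k$-means (Lloyd's algorithm) in plain Python. It is not the authors' pipeline: the function names, the two toy features, the synthetic "blue" and "red" groups, and the use of a fixed seed as the reproducibility mechanism are all assumptions made for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    """Generic Lloyd's algorithm; the fixed seed makes repeated runs identical."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

# Hypothetical toy data: two well-separated groups in a 2-feature space
# (stand-ins for standardised galaxy features; not GAMA data).
data_rng = random.Random(42)
blue = [(data_rng.gauss(0.0, 0.1), data_rng.gauss(0.0, 0.1)) for _ in range(50)]
red = [(data_rng.gauss(5.0, 0.1), data_rng.gauss(5.0, 0.1)) for _ in range(50)]
points = blue + red

labels, centroids, inertia = kmeans(points, k=2, seed=1)
labels_again, _, _ = kmeans(points, k=2, seed=1)
print("identical labels on rerun:", labels == labels_again)
```

No labels are supplied at any point; the two groups are recovered from the feature values alone. Fixing the seed is only the simplest way to make a run repeatable, and the robustness procedure the abstract alludes to is not reproduced here.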
