Revisiting the SFR-Mass relation at z=0 with detailed deep learning based morphologies
Helena Domínguez Sánchez, Mariangela Bernardi, Marc Huertas-Company

TL;DR
This paper presents a large deep learning-based morphological catalogue for 670,000 SDSS galaxies, enabling detailed analysis of galaxy morphology's impact on the star formation-stellar mass relation.
Contribution
It introduces the largest T-Type galaxy classification catalogue derived from deep learning on SDSS images, with uncertainties and multiple morphological labels.
Findings
The SFR-M* relation varies with galaxy morphology.
Deep learning effectively classifies galaxy morphologies at large scale.
The catalogue enables new insights into galaxy evolution studies.
(1) Centro de Estudios de Física del Cosmos de Aragón, Plaza San Juan 1, 44001, Teruel, Spain
(2) Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA
(3) Instituto de Astrofísica de Canarias, E-38200, La Laguna, Tenerife, Spain
(4) LERMA, Observatoire de Paris, PSL Research University, CNRS, Sorbonne Universités, UPMC Univ. Paris 06, F-75014 Paris, France
(5) University of Paris Denis Diderot, University of Paris Sorbonne Cité (PSC), 75205 Paris Cedex 13, France
H. Domínguez Sánchez (1), M. Bernardi (2) and M. Huertas-Company (3, 4, 5)
Abstract
Galaxy morphology is a key parameter in galaxy evolution studies. The enormous number of galaxies that current and future surveys will observe demands automated methods for morphological classification. Supervised learning techniques have been successfully used for the morphological classification of galaxies from different datasets, including the Sloan Digital Sky Survey (SDSS), Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) and the Dark Energy Survey (DES). With these proceedings, we release the morphological catalogue for a sample of 670,000 SDSS galaxies, based on deep learning models trained on SDSS RGB images with morphological labels from human-based classification catalogues. The released catalogue includes binary classifications (early-type versus late-type, elliptical versus lenticular, identification of edge-on and barred galaxies) plus a T-Type. The classifications also include k-fold-based uncertainties. This is, as of today, the largest catalogue including a T-Type classification. As an example of the scientific potential of this classification, we show how the location of galaxies in the star formation rate-stellar mass plane (SFR-M∗) depends on morphology. This is the first time the SFR-M∗ relation has been combined with T-Type information for such a large sample of galaxies.
keywords:
Galaxies: morphology – Methods – Machine Learning
1 Introduction
Galaxy morphology is one of the key parameters in galaxy evolution studies. While the existence of an ordered sequence of galaxy appearances has been well known since the beginning of the last century (Hubble, 1926), its origin is still highly debated. Galaxy morphology is strongly correlated with stellar populations, but its connection with mass assembly mechanisms and quenching events is still unclear (e.g., Hirschmann et al., 2015; Nelson et al., 2016; Rodriguez-Gomez et al., 2016). In order to shed light on the interplay between galaxy morphology and evolutionary paths, large samples of galaxies with robust morphological classifications at different cosmic epochs are needed. With the arrival of large imaging surveys, visual classification of galaxies becomes unfeasible and automated methods are required.
Supervised deep learning (DL) methods based on Convolutional Neural Networks (CNNs) that use galaxy images as input have proven very successful at classifying nearby, bright galaxies, for which large samples of previously labelled galaxies, such as Galaxy Zoo (Willett et al., 2013) or Nair & Abraham (2010), are available. In Domínguez Sánchez et al. (2018) we trained a CNN, paying special attention to the training sample selection (i.e., using only galaxies with large agreement among Galaxy Zoo classifiers), and we published what was, at the time, the largest DL-based morphological catalogue, including 670,000 SDSS DR7 (Abazajian et al., 2009) galaxies from the Meert et al. (2015) sample.
In Domínguez Sánchez et al. (2022) we presented an improved version of the classification obtained in Domínguez Sánchez et al. (2018). We used a vanilla CNN consisting of four convolutional layers with square filters of different sizes (6, 5, 2, 3), followed by dropout and 2×2 max-pooling. A fully connected layer returns one output value. The inputs are RGB SDSS-DR7 cutouts with a variable size proportional to the Petrosian radius R90 of the galaxy (5 × R90). The cutouts are resampled to 69×69 pixels before being fed to the CNN. The RGB images were normalized to the maximum of each band to avoid any dependence of the morphological classification on color information. We used the Nair & Abraham (2010) catalogue to train a regression model, which returns a T-Type (analogous to the Hubble sequence), and two binary models: one that separates early-type (ETG) from late-type galaxies (LTG), and another that separates ellipticals (Es) from lenticulars (S0s; this model is only meaningful for galaxies with T-Type ≤ 0). The low end of the T-Type range was better recovered than in the previous version (Figure 4 in Domínguez Sánchez et al. 2022). The separation between ETGs and LTGs complements the T-Type classification, especially at intermediate types (-1 < T-Type < 2), where the T-Type values are more uncertain. The Galaxy Zoo catalogue (Willett et al., 2013) was used to train two further binary models, one that identifies barred galaxies and another that identifies edge-on galaxies. k-fold-based uncertainties on the classifications were also provided. These models were applied to the final MaNGA (Bundy et al., 2015) DR17 sample (Abdurro’uf et al., 2021), comprising about 10,000 galaxies, and released as the MaNGA Deep Learning Morphological DR17 Value Added Catalogue (MDLM-VAC-DR17; https://www.sdss4.org/dr17/data_access/value-added-catalogs/?vac_id=manga-morphology-deep-learning-dr17-catalog).
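The input preparation and the k-fold summary described above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' pipeline: we read "5 × R90" as the side length of the cutout, use the SDSS pixel scale of 0.396 arcsec/pixel, resample with nearest-neighbour indexing (the interpolation scheme is not specified here), and summarize the k models by their mean and standard deviation. The function names `cutout_size_pixels`, `preprocess` and `kfold_summary` are hypothetical.

```python
import numpy as np

SDSS_PIXEL_SCALE = 0.396  # arcsec per pixel in SDSS imaging


def cutout_size_pixels(r90_arcsec, factor=5.0, pixel_scale=SDSS_PIXEL_SCALE):
    """Side length in pixels of a cutout spanning factor * R90 (assumed reading of 5 x R90)."""
    return int(round(factor * r90_arcsec / pixel_scale))


def preprocess(cutout, out_size=69):
    """Normalize each band to its own maximum, then resample to out_size x out_size.

    `cutout` has shape (ny, nx, 3) with the three colour channels last.
    Nearest-neighbour resampling is an assumption made for simplicity.
    """
    img = np.asarray(cutout, dtype=float)
    # Divide every band by its own maximum so that the overall flux ratio
    # between bands (the integrated colour) no longer reaches the CNN.
    band_max = img.max(axis=(0, 1), keepdims=True)
    img = img / np.where(band_max > 0, band_max, 1.0)
    # Pick evenly spaced source rows/columns to build a fixed-size grid.
    ny, nx = img.shape[:2]
    rows = (np.arange(out_size) * ny / out_size).astype(int)
    cols = (np.arange(out_size) * nx / out_size).astype(int)
    return img[np.ix_(rows, cols)]


def kfold_summary(predictions):
    """Combine outputs of k models, shape (k, n_galaxies).

    Returns the mean prediction and the standard deviation across models,
    the latter used here as the uncertainty (an assumption).
    """
    preds = np.asarray(predictions, dtype=float)
    return preds.mean(axis=0), preds.std(axis=0)
```

Because each band is divided by its own maximum, multiplying one band by a constant leaves the normalized input unchanged; spatial colour gradients within the galaxy, however, are not removed by this step.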
2 Updated SDSS Morphological catalogue
MaNGA is an Integral Field spectroscopic survey which provides resolved spectral information for each galaxy. However, the morphological classification is obtained by training the DL models with RGB SDSS images, meaning that MaNGA played no role at all in the construction of the morphological catalogue, except for the sample selection. We have now applied the DL models from Domínguez Sánchez et al. (2022) to the full Meert et al. (2015) sample and we take the opportunity to release this new catalogue with these proceedings. A detailed description of the construction of the models and the performance of the different classification tasks can be found in Domínguez Sánchez et al. (2022). Since the imaging data and the magnitude range of the MaNGA DR17 and the Meert et al. (2015) samples are similar, we do not expect significant differences in the results.
The catalogue provides binary classifications (ETG vs LTG, E vs S0, edge-on, bars) and a T-Type, and can be found at https://archive.cefca.es/ancillary_data/sdss_morphological_catalogues/sdss_morphological_catalogues.tar.gz. Its content is identical to that of the MDLM-VAC-DR17, except for the visual classification (VC) and visual flag (VF) columns, which are not included because visual inspection is unfeasible for a sample as large as the one presented here. We refer the reader to Table 4 of Domínguez Sánchez et al. (2022) for a detailed description of the catalogue columns.
We recommend the following criteria for selecting samples of Es, S0 and spirals (S):
- E: (PLTG < 0.5) and (T-Type ≤ 0) and (PS0 < 0.5)
- S0: (PLTG < 0.5) and (T-Type ≤ 0) and (PS0 > 0.5)
- S: (PLTG > 0.5) and (T-Type > 0)
where PLTG separates ETGs from LTGs and PS0 separates Es from S0s (only meaningful for ETGs). Note that these are the most restrictive criteria, as they combine the information of the LTG/ETG classification with the T-Type. The thresholds at PLTG = 0.5 and PS0 = 0.5 are a good compromise between completeness and purity (see Figures 5 and 7 in Domínguez Sánchez et al. 2022), but they can be modified in order to obtain a purer or more complete S0 sample, depending on the user's purpose. The above selection returns 18, 20 and 50% of Es, S0s and Ss, respectively, leaving 12% of the galaxies with an ambiguous classification (their PLTG and T-Type values are discordant). Alternatively, one can use the T-Type information only (which returns 18, 20 and 62% of Es, S0s and Ss) or PLTG only (which returns 18, 32 and 50%). S0s are the population most affected by the choice of selection criteria, as already discussed in Section 3.4.1 of Domínguez Sánchez et al. (2022).
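The recommended cuts, and the two alternatives, can be written as boolean masks. This sketch assumes the three catalogue columns have already been loaded into arrays called `p_ltg`, `ttype` and `p_s0` (hypothetical names; see Table 4 of Domínguez Sánchez et al. 2022 for the actual column names). Galaxies lying exactly on a probability threshold are left unclassified, matching the strict inequalities in the list.

```python
import numpy as np


def classify(p_ltg, ttype, p_s0, mode="combined", p_ltg_cut=0.5, p_s0_cut=0.5):
    """Label galaxies as 'E', 'S0', 'S' or 'ambiguous'.

    mode="combined": P_LTG and T-Type must agree (the most restrictive option).
    mode="ttype":    T-Type alone decides early vs late.
    mode="pltg":     P_LTG alone decides early vs late.
    In every mode P_S0 splits the early types into E and S0.
    """
    p_ltg, ttype, p_s0 = (np.asarray(a, dtype=float) for a in (p_ltg, ttype, p_s0))
    if mode == "combined":
        early = (p_ltg < p_ltg_cut) & (ttype <= 0)
        late = (p_ltg > p_ltg_cut) & (ttype > 0)
    elif mode == "ttype":
        early, late = ttype <= 0, ttype > 0
    elif mode == "pltg":
        early, late = p_ltg < p_ltg_cut, p_ltg > p_ltg_cut
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    labels = np.full(p_ltg.shape, "ambiguous", dtype=object)
    labels[early & (p_s0 < p_s0_cut)] = "E"
    labels[early & (p_s0 > p_s0_cut)] = "S0"
    labels[late] = "S"
    return labels


def fractions(labels):
    """Fraction of the sample assigned to each label."""
    values, counts = np.unique(labels, return_counts=True)
    return {str(v): c / len(labels) for v, c in zip(values, counts)}
```

With the full catalogue loaded, `fractions(classify(p_ltg, ttype, p_s0, mode=...))` can be compared against the percentages quoted above for each of the three selection options.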
3 Scientific application: the SFR-Mass plane
It is well known that galaxy morphology is related to other galaxy properties, in particular to stellar mass and star formation efficiency. As an example of the scientific return of the morphological classifications provided in the catalogue, in this Section we analyze how galaxies of different morphologies populate the SFR-M∗ plane.
In Figure 1 we show the SFR-M∗ plane color coded by two of the classifications reported in the catalogue: PLTG and T-Type. The SFR and M∗ values are retrieved from the MPA-JHU stellar mass catalogue (https://www.sdss4.org/dr17/spectro/galaxy_mpajhu; produced by the Max Planck Institute for Astrophysics and Johns Hopkins University groups), which provides galaxy properties for all DR8 galaxy spectra. Stellar masses are based on the galaxy photometry and are calculated using the Bayesian methodology and model grids described in Kauffmann et al. (2003). SFRs are computed within the galaxy fiber aperture (3 arcsec) using the nebular emission lines, as described in Brinchmann et al. (2004). SFRs outside the fiber are estimated from the galaxy photometry following Salim et al. (2007). For AGN and galaxies with weak emission lines, SFRs are estimated from the photometry. There are 653,543 galaxies with reliable M∗ and SFR estimates (97% of the galaxies in our morphological catalogue).
In the left panel of Figure 1, galaxies are color coded according to PLTG, i.e., the probability of a galaxy being an LTG rather than an ETG. As expected, LTGs are located on and above the main sequence (MS), while quenched galaxies show morphologies consistent with ETGs. A basic separation between ellipticals/S0s and spirals is the most common classification reported in morphological catalogues. (For example, Galaxy Zoo separates galaxies into ‘smooth’ or ‘features or disc’, which is usually used as a proxy for the separation between ETGs and LTGs. Note, however, that being ‘smooth’ is not equivalent to being an ETG, and the contamination of ‘smooth’ galaxies by LTGs can be significant; see Figure 15 of Domínguez Sánchez et al. 2022.) While PLTG provides a broad separation between two classes, the T-Type parameter, corresponding to the Hubble sequence (or de Vaucouleurs type, de Vaucouleurs 1963), gives a more detailed and complex representation of the SFR-M∗ plane (right panel of Figure 1).
Galaxies with the lowest T-Types (T-Type ≤ 0, reddish colors) populate the high-mass, low-SFR region (analogous to the red sequence in the color-magnitude diagram), and the opposite happens for galaxies with the largest T-Type values (T-Type > 4, dark blue colors). Galaxies with intermediate T-Types populate the green valley, but also the high-mass starburst region (above the MS) and the low-mass end of the quenched population. This is, to the best of our knowledge, the first time the SFR-M∗ plane has been combined with T-Type information for such a large sample of galaxies. Note that no smoothing is applied to the figure; hence, the underlying relation between mass, star formation efficiency and structure emerges naturally.
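The proceedings do not describe how Figure 1 was drawn. One simple way to colour-code the SFR-M∗ plane by a catalogue property without any smoothing kernel is to average that property in rectangular cells of the plane; the sketch below assumes this binned-mean approach, and the function name and bin edges are hypothetical.

```python
import numpy as np


def binned_mean_map(log_mass, log_sfr, value, mass_edges, sfr_edges):
    """Mean of `value` (e.g. P_LTG or T-Type) in each (log M*, log SFR) cell.

    Returns (mean, counts), both of shape (len(mass_edges)-1, len(sfr_edges)-1).
    Empty cells are NaN in `mean`; no smoothing kernel is applied.
    """
    bins = [mass_edges, sfr_edges]
    counts, _, _ = np.histogram2d(log_mass, log_sfr, bins=bins)
    sums, _, _ = np.histogram2d(log_mass, log_sfr, bins=bins, weights=value)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts  # 0/0 -> NaN for empty cells
    return mean, counts
```

The `mean` array can then be displayed with, for example, matplotlib's `pcolormesh`, transposing it so that stellar mass runs along the horizontal axis.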
To shed more light on how T-Type correlates with position in the SFR-M∗ plane, Figure 2 dissects the diagram in narrow T-Type bins. In addition, the upper panels separate elliptical (E) and lenticular (S0) galaxies according to their PS0; we remind the reader that, although PS0 is reported for all galaxies in the catalogue, it is only meaningful for galaxies with T-Type ≤ 0.
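A sketch of this dissection, assuming user-supplied T-Type bin edges (the edges in the test values are illustrative and not necessarily those of Figure 2) and a PS0 threshold of 0.5 for the E/S0 split. Only bins whose upper edge is ≤ 0 are split, since PS0 is meaningful only for T-Type ≤ 0. The function name `dissect` is hypothetical.

```python
import numpy as np


def dissect(ttype, p_s0, edges, p_s0_cut=0.5):
    """Group galaxy indices into T-Type bins [lo, hi).

    Bins with an upper edge <= 0 are split into E (P_S0 < cut) and S0 (P_S0 > cut),
    because P_S0 is only meaningful for T-Type <= 0.
    """
    ttype = np.asarray(ttype, dtype=float)
    p_s0 = np.asarray(p_s0, dtype=float)
    groups = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (ttype >= lo) & (ttype < hi)
        label = f"[{lo:g},{hi:g})"
        if hi <= 0:
            groups["E " + label] = np.flatnonzero(in_bin & (p_s0 < p_s0_cut))
            groups["S0 " + label] = np.flatnonzero(in_bin & (p_s0 > p_s0_cut))
        else:
            groups[label] = np.flatnonzero(in_bin)
    return groups
```

Each returned index array selects the galaxies for one panel, whose log M∗ and log SFR values can then be plotted or contoured separately.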
Several clear trends emerge from this novel representation. The four upper panels show the distribution of galaxies with T-Type ≤ 0 (corresponding to ETGs), divided into E (left) and S0 (right). These galaxies are the most massive and have the lowest SFRs, as expected. The contours are concentrated in a relatively narrow region (∼1 dex), which could be an analogue of the star-forming MS for quenched galaxies (a quenched main sequence, QMS). It is worth noting that Es with T-Type = [-1, 0] are less abundant than Es with T-Type = [-3, -1] and occupy a very narrow region of the SFR-M∗ plane, while S0s extend over a wider SFR range.
Galaxies with intermediate T-Types (T-Type = [0, 2], green colors) span a large SFR range (∼3 dex) and show a bimodal distribution, with galaxies with T-Type = [0, 1] being more abundant in the low-SFR region than galaxies with T-Type = [1, 2]. This could be interpreted as evidence for two distinct populations of galaxies with Sa/Sab morphologies or, alternatively, as an indication that these galaxies are being quenched and that we are witnessing their evolutionary tracks as they cross the green valley. More detailed studies of their ages and star formation histories are needed before this interpretation can go beyond speculation.
Finally, galaxies with T-Type > 2 (corresponding to Sb, Sc and Sd) are mostly located above the MS, with fewer and fewer galaxies below the MS as we move to larger T-Type values. There is also an evident shift towards lower masses and a narrowing of the distribution with increasing T-Type, which follows a slightly steeper slope than the MS. We remark that size, mass and color played no role in the morphological classification, which was based purely on SDSS imaging.
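The slope comparison above can be quantified, for example, with a per-bin straight-line fit of log SFR against log M∗. This sketch assumes an ordinary least-squares fit (`np.polyfit` with `deg=1`); the proceedings do not say how the slopes were compared, and OLS is sensitive to galaxies scattering below the MS, so a robust estimator may be preferable in practice. The function name is hypothetical, and the MS slope to compare against must be supplied separately.

```python
import numpy as np


def slope_per_bin(log_mass, log_sfr, ttype, edges, min_galaxies=2):
    """OLS slope of log SFR versus log M* for each T-Type bin [lo, hi).

    Bins with fewer than `min_galaxies` members are skipped.
    """
    log_mass, log_sfr, ttype = (np.asarray(a, dtype=float) for a in (log_mass, log_sfr, ttype))
    slopes = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (ttype >= lo) & (ttype < hi)
        if sel.sum() >= min_galaxies:
            # np.polyfit returns coefficients from highest to lowest degree.
            slope, _intercept = np.polyfit(log_mass[sel], log_sfr[sel], deg=1)
            slopes[(lo, hi)] = slope
    return slopes
```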
4 Towards the classification of high redshift galaxies
The success of DL for classifying large samples of galaxies is undeniable. However, one of the main drawbacks of supervised deep learning methods is that they need large samples of labelled galaxies. In addition, they are strongly affected by domain shifts, whether caused by instrumental effects or by differences in the distribution of galaxy properties. This is a big challenge for classifying high-redshift galaxies, which are usually much fainter than their lower-redshift counterparts.
One way to overcome the lack of a large training sample is ‘transfer learning’, i.e., using the weights learned by a model on a particular data set to initialize training on new data, rather than using a random initialization. In Domínguez Sánchez et al. (2019) we adapted the SDSS models to DES data, demonstrating that this approach reduces the required size of the training sample by one order of magnitude. In Vega-Ferrero et al. (2021), we were able to classify galaxies much fainter (mr ∼ 22) than those with available labels (mr < 17.7) by ‘emulating’ how local galaxies would look at higher redshifts, while keeping their original labels for training. The classifications were highly accurate (accuracy = 97%) and their performance was consistent throughout the whole magnitude range. The corresponding catalogue, including 27 million galaxies, was released with the paper and can be found at https://des.ncsa.illinois.edu/releases/y3a2/gal-morphology. Unfortunately, the image resolution was not sufficient to provide a T-Type classification and only allowed for a basic ETG/LTG separation and the identification of edge-on galaxies.
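The ‘emulation’ idea can be illustrated with a deliberately simplified toy: dim a bright galaxy image to a fainter apparent magnitude and add Gaussian noise, keeping its original label. This is not the procedure used by Vega-Ferrero et al. (2021); a realistic emulation also has to handle changes in angular size, PSF, cosmological surface-brightness dimming and k-corrections, all omitted here. The function name and the noise model are hypothetical.

```python
import numpy as np


def emulate_fainter(image, delta_mag, noise_sigma, rng=None):
    """Toy emulation: dim `image` by `delta_mag` magnitudes and add Gaussian noise.

    Ignores angular-size changes, PSF differences, cosmological surface-brightness
    dimming and k-corrections, all of which a realistic emulation must treat.
    """
    rng = np.random.default_rng() if rng is None else rng
    # A magnitude difference delta_mag corresponds to a flux ratio 10**(-0.4 * delta_mag).
    dimmed = np.asarray(image, dtype=float) * 10.0 ** (-0.4 * delta_mag)
    # Add independent Gaussian noise per pixel (a stand-in for sky/read noise).
    return dimmed + rng.normal(0.0, noise_sigma, size=dimmed.shape)
```

For example, dimming a galaxy from mr = 17.7 to mr = 22 corresponds to `delta_mag = 4.3`, i.e., a flux factor of 10^(-0.4 × 4.3) ≈ 0.019.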
Alternative methods that do not require labelled samples, such as self-supervised learning (e.g., Sarmiento et al. 2021) or Principal Component Analysis (e.g., Tous et al. 2022), also provide valuable insights into galaxy properties. Finally, some tasks still remain challenging for CNNs, for instance, the detection of low-surface-brightness structures such as tidal features (see Domínguez Sánchez et al. 2023).
5 Conclusions
With these proceedings we release the morphological catalogue for the Meert et al. (2015) sample, based on the models presented in Domínguez Sánchez et al. (2022). The catalogue provides binary classifications (ETG vs LTG, E vs S0, edge-on, bars) and a T-Type for 670,000 galaxies, making it the largest sample to date with such detailed morphological properties. The scientific potential of the catalogue is illustrated by dissecting the SFR-M∗ plane in narrow T-Type bins. The results highlight the strong dependence of SFR and mass on galaxy structure and suggest that the SFR main sequence depends on morphology. We leave a more robust statistical analysis of this evidence for forthcoming studies. Other important relations, such as the size-mass relation or the fundamental plane, should be revisited by dissecting galaxies according to their T-Types. Finally, studies of the role of bars in secular evolution will surely benefit from such a large sample of barred galaxies, while the identification of edge-on galaxies can be useful for several scientific purposes, from estimating dust attenuation (Masters et al., 2010) to probing self-interacting dark matter (Secco et al., 2018).
Acknowledgements.
HDS acknowledges financial support from the PID2020-115098RJ-I00 grant from MCIN/AEI/10.13039/501100011033 and from the Spanish Ministry of Science and Innovation and the European Union - NextGenerationEU through the Recovery and Resilience Facility project ICTS-MRR-2021-03-CEFCA. MHC acknowledges financial support from the State Research Agency (AEI-MCINN) of the Spanish Ministry of Science and Innovation under the grant “Galaxy Evolution with Artificial Intelligence” with reference PGC2018-100852-A-I00, from the ACIISI, Consejería de Economía, Conocimiento y Empleo del Gobierno de Canarias and the European Regional Development Fund (ERDF) under grant with reference PROID2020010057, and from IAC project P.301802, financed by the Ministry of Science and Innovation, through the State Budget, and by the Canary Islands Department of Economy, Knowledge and Employment, through the Regional Budget of the Autonomous Community. The authors gratefully acknowledge the computer resources at Artemisa, funded by the European Union ERDF and Comunitat Valenciana, as well as the technical support provided by the Instituto de Física Corpuscular, IFIC (CSIC-UV).
References
- Abazajian, K. N., Adelman-McCarthy, J. K., Agüeros, M. A., et al. 2009, ApJS, 182, 543
- Abdurro’uf, Accetta, K., Aerts, C., et al. 2021, arXiv e-prints, arXiv:2112.02026
- Brinchmann, J., Charlot, S., White, S. D. M., et al. 2004, MNRAS, 351, 1151
- Bundy, K., Bershady, M. A., Law, D. R., et al. 2015, ApJ, 798, 7
- de Vaucouleurs, G. 1963, ApJS, 8, 31
- Domínguez Sánchez, H., Bernardi, M., Brownstein, J. R., Drory, N., & Sheth, R. K. 2019, MNRAS, 489, 5612
- Domínguez Sánchez, H., Huertas-Company, M., Bernardi, M., Tuccillo, D., & Fischer, J. L. 2018, MNRAS, 476, 3661
- Domínguez Sánchez, H., Margalef, B., Bernardi, M., & Huertas-Company, M. 2022, MNRAS, 509, 4024
