Uncertain Photometric Redshifts with Deep Learning Methods

Antonio D'Isanto

arXiv:1703.01979·astro-ph.IM·June 14, 2017·Astroinformatics


TL;DR

This paper introduces a deep learning approach combining Mixture Density Networks and Deep Convolutional Networks to estimate accurate, multimodal photometric redshift probability density functions, improving efficiency over traditional spectroscopic methods.

Contribution

It presents a novel deep learning framework for photometric redshift estimation that models multimodal PDFs, outperforming traditional machine learning methods like Random Forests.

Findings

1. Deep learning models effectively estimate multimodal photo-z PDFs.
2. The proposed method outperforms Random Forests in accuracy.
3. The approach enhances efficiency in astronomical redshift estimation.

Abstract

Accurate photometric redshift estimation is of fundamental importance in astronomy, because redshift information must be obtained efficiently without spectroscopic analysis. We propose a method for determining accurate multimodal photo-z probability density functions (PDFs) using Mixture Density Networks (MDN) and Deep Convolutional Networks (DCN). A comparison with a Random Forest (RF) is performed.

Tables

Table 1: DCMDN architecture

#	Type	Size	Maps	Activation
1	input	28x28	/	/
2	Conv	3x3	256	tanh
3	Pool	2x2	256	tanh
4	Conv	2x2	512	tanh
5	Pool	2x2	512	tanh
6	Conv	3x3	512	ReLU
7	Conv	2x2	1024	ReLU
8	MDN	500	/	tanh
9	MDN	100	/	tanh
10	output	15	/	Eq. 1

Equations

$$\mu_{j} = z_{j}^{\mu}, \qquad \sigma_{j} = \exp(z_{j}^{\sigma}), \qquad \omega_{j} = \frac{\exp(z_{j}^{\omega})}{\sum_{i=1}^{n}\exp(z_{i}^{\omega})}. \tag{1}$$



Full text

Uncertain Photometric Redshifts with Deep Learning Methods

A. D’Isanto

Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany

The author gratefully acknowledges the support of the Klaus Tschira Foundation.

email: [email protected]

(2016)


Keywords: techniques: photometric; galaxies: distances and redshifts; methods: data analysis; surveys; (galaxies:) quasars: general

Journal: Astroinformatics 2016, volume 325

1 Introduction

The determination of distances to astronomical objects through redshift has become increasingly important in recent years and plays a fundamental role in cosmological research. Redshift is, in fact, a fundamental rung of the cosmic distance ladder. It is traditionally measured through spectroscopic analysis, but because of long integration times and costly instrumentation, it cannot be measured for all objects. A convenient alternative is the estimation of photometric redshifts, i.e. redshifts based purely on photometric measurements. However, the uncertainty of this approach is much higher than the measurement errors obtained from spectroscopy. For this reason, the astronomical community has focused on quantifying the uncertainty of redshift estimates through probability density functions (PDFs) instead of simple point estimates. In this work we propose two neural network models based on Mixture Density Networks (MDN) (Bishop 1994). The first architecture is a deep MDN designed to take photometric features as input and to generate PDFs. The second combines a Deep Convolutional Network (DCN) (LeCun et al. 1998) with an MDN in order to obtain photo-z PDFs from images as input. We will show that this approach achieves better predictions because image data, in contrast to pre-defined features, capture more details of the objects. We compare the results with a tool commonly used in the related literature, the Random Forest (RF) (Breiman 2001).

2 Deep learning algorithms

In the next two subsections we give a description of the deep learning algorithms used for the experiments.

2.1 Mixture Density Network

A Mixture Density Network (Bishop 1994) is the combination of a feed-forward neural network and a Gaussian mixture model. The outputs of the network parametrize the Gaussian mixture $p(\theta|x)=\sum_{j=1}^{n}\omega_{j}\mathcal{N}(\theta|\mu_{j},\sigma_{j})$, i.e. they define the means, variances, and weights. The MDN thus produces a PDF that is flexible enough to represent the multi-modal behavior of photo-z. The means, variances, and weights are obtained from the outputs $z$ of the network:

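$$\mu_{j} = z_{j}^{\mu}, \qquad \sigma_{j} = \exp(z_{j}^{\sigma}), \qquad \omega_{j} = \frac{\exp(z_{j}^{\omega})}{\sum_{i=1}^{n}\exp(z_{i}^{\omega})}. \tag{1}$$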
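As an illustration of how raw network outputs can be turned into mixture parameters, the following minimal NumPy sketch applies an identity map to the means, an exponential to the widths, and a softmax to the weights. The function names `mdn_parameters` and `mixture_pdf`, the ordering of the output vector, and the choice of 5 components (inferred from the 15 output units in Table 1) are assumptions made for this example, not details confirmed by the paper.

```python
import numpy as np


def mdn_parameters(z, n_components=5):
    """Map raw outputs z (length 3 * n_components) to mixture parameters.

    Assumed layout: [mean outputs | width outputs | weight outputs].
    The paper does not state the ordering; this is a choice for the sketch.
    """
    z = np.asarray(z, dtype=float)
    z_mu, z_sigma, z_omega = z.reshape(3, n_components)
    mu = z_mu                              # means: identity
    sigma = np.exp(z_sigma)                # widths: exponential keeps them positive
    shifted = z_omega - z_omega.max()      # shift for numerical stability
    omega = np.exp(shifted) / np.exp(shifted).sum()  # weights: softmax
    return mu, sigma, omega


def mixture_pdf(theta, mu, sigma, omega):
    """Evaluate the Gaussian mixture density at one or more values theta."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]
    comp = np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return (omega * comp).sum(axis=1)


# Example: an all-zero output vector gives equal weights and unit widths.
mu, sigma, omega = mdn_parameters(np.zeros(15))
density = mixture_pdf(np.linspace(-10.0, 10.0, 2001), mu, sigma, omega)
```

The softmax guarantees that the weights are positive and sum to one, so the result is always a valid density regardless of the raw outputs.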

Normally the MDN is trained with the negative log-likelihood as loss function, but in this work we use the continuous ranked probability score (CRPS) (Gneiting et al. 2005) instead. The aim is a trained MDN whose PDFs are both well calibrated and sharp, as measured by the CRPS and explained in detail in Polsterer et al. (2016).

2.2 Deep Convolutional Network

A Deep Convolutional Network is a model in which several convolutional and sub-sampling layers are coupled with a fully-connected network. This architecture is designed to learn from raw image data. In our case, we want to estimate redshifts directly from images, without extracting photometric features, so we couple a DCN with an MDN to produce photo-z PDFs directly from SDSS images. We alternate convolutional and pooling layers to generate feature maps that form a hierarchically compressed representation of the input data. The feature maps are thus extracted automatically by the network. The output of the convolutional part is then taken as input for the fully-connected MDN, which produces a multi-modal predictive density for photo-z. We choose a modified version of the LeNet-5 architecture (LeCun et al. 1998), coupled with the MDN presented in Section 2.1, and call the result a Deep Convolutional Mixture Density Network (DCMDN). Table 1 lists the architecture of the DCMDN used for the experiments, which is designed to run on GPUs; we use a cluster equipped with Nvidia Titan X cards.
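To get a feeling for how the layers in Table 1 compress a 28x28 input, the following sketch traces the spatial size of the feature maps through the convolutional part. It assumes unpadded ("valid") convolutions with stride 1 and non-overlapping pooling; the paper does not state padding or stride, so the resulting sizes are illustrative only. The names `LAYERS` and `trace_shapes` are hypothetical.

```python
# (layer type, kernel or pool size, number of feature maps), read off Table 1.
LAYERS = [
    ("conv", 3, 256),
    ("pool", 2, 256),
    ("conv", 2, 512),
    ("pool", 2, 512),
    ("conv", 3, 512),
    ("conv", 2, 1024),
]


def trace_shapes(size=28, layers=LAYERS):
    """Return (type, maps, height, width) after each layer of the conv part."""
    shapes = []
    for kind, k, maps in layers:
        if kind == "conv":
            size = size - k + 1   # assumed: no padding, stride 1
        else:
            size = size // k      # assumed: non-overlapping pooling
        shapes.append((kind, maps, size, size))
    return shapes


shapes = trace_shapes()
_, last_maps, height, width = shapes[-1]
flattened = last_maps * height * width  # input size of the first MDN layer

for kind, maps, h, w in shapes:
    print(f"{kind:4s} -> {maps} maps of {h}x{w}")
print("flattened features:", flattened)
```

Under these assumptions the fully-connected layers of 500 and 100 units, followed by the 15 output units, would operate on the flattened feature maps of the last convolutional layer.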

3 Experiments and Analysis

The data for the experiments are taken from the Sloan Digital Sky Survey Quasar Catalog V (Richards et al. 2010), based on the seventh data release of the Sloan Digital Sky Survey (SDSS). The catalog consists of $105,783$ spectroscopically confirmed quasars in a redshift range between $0.065$ and $5.46$. For the experiments we use a random subsample of $50,000$ patterns. For each pattern we take the five ugriz magnitudes as input features and the respective images in the same bands. Finally, we compare the performance of the MDN and the DCMDN with the widely used RF.

The RF, in its original form, is not meant to produce PDFs. To obtain a distribution, we first collect the predictions $z_{t,n}$ of each individual decision tree $t$ in the forest for every $n$-th data item. We use $T=256$ trees and define the PDF for the RF by fitting a mixture of 5 Gaussian components to these outputs, $p(\theta|x)=\sum_{j=1}^{5}\omega_{j}\mathcal{N}(\theta|\mu_{j},\sigma_{j})$, the same form described in Section 2.1 for the MDN.
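The step from per-tree predictions to a PDF can be sketched as follows. The paper does not say which fitting procedure is used, so this example substitutes a plain expectation-maximization loop for a one-dimensional Gaussian mixture, and the 256 per-tree predictions for a single object are simulated rather than taken from a trained forest. The name `fit_gmm_1d` and all of its parameters are hypothetical.

```python
import numpy as np


def fit_gmm_1d(samples, n_components=5, n_iter=200, min_sigma=1e-3, seed=0):
    """Fit a 1-D Gaussian mixture to samples with a basic EM loop."""
    x = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=n_components, replace=False)   # initial means
    sigma = np.full(n_components, max(x.std(), min_sigma))
    w = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        z = (x[:, None] - mu[None, :]) / sigma[None, :]
        log_p = np.log(w) - np.log(sigma) - 0.5 * z ** 2
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and widths.
        nk = r.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk
        sigma = np.sqrt(np.maximum(var, min_sigma ** 2))   # floor avoids collapse
    return w, mu, sigma


# Simulated predictions of T = 256 trees for one object (bimodal on purpose).
rng = np.random.default_rng(1)
tree_predictions = np.concatenate(
    [rng.normal(0.8, 0.05, size=180), rng.normal(2.1, 0.10, size=76)]
)
w, mu, sigma = fit_gmm_1d(tree_predictions)
```

With a real forest, `tree_predictions` would be replaced by the outputs of the individual decision trees for the object in question, and the fit would be repeated for every object in the test set.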

For the RF and the MDN we use as input the 5 magnitudes plus all possible color combinations, i.e. the 10 pairwise differences, obtaining a 15-dimensional feature vector. The training and test sets each contain $25,000$ patterns. The DCMDN is trained on images, which are obtained using the Hierarchical Progressive Surveys data partitioning format (Fernique et al. 2015) and cut out on the client side to the desired dimensions (28x28). Each pattern is originally a stack of 5 images in the ugriz filters, in which every pixel is converted from flux units to luptitudes (Lupton et al. 1999). As with the features, we additionally form color images from the ugriz images by taking all possible pairwise differences, obtaining a stack of 15 images; every pattern is thus represented by a tensor of dimensions 15x28x28. To make the network rotationally invariant, we perform data augmentation, taking rotations of each image by 0, 90, 180, and 270 degrees. In this way we obtain a training set of $100,000$ images, a validation set of $50,000$ images, and a test set of $50,000$ images. Dropout is also applied to limit overfitting.
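The construction of the 15 inputs and the rotation-based augmentation can be sketched in a few lines of NumPy. The same helper works for a vector of 5 magnitudes and for a stack of 5 images, because the pairwise differences are taken along the first axis. The helper names `add_colors` and `rotations`, the sign convention of the differences (bluer band minus redder band), and the random placeholder data are assumptions for this example.

```python
import itertools

import numpy as np

BANDS = "ugriz"


def add_colors(x):
    """Append all 10 pairwise band differences along axis 0 (5 -> 15 entries)."""
    x = np.asarray(x, dtype=float)
    pairs = itertools.combinations(range(len(BANDS)), 2)
    colors = [x[i] - x[j] for i, j in pairs]   # u-g, u-r, ..., i-z
    return np.concatenate([x, np.stack(colors)], axis=0)


def rotations(stack):
    """Return the stack rotated by 0, 90, 180, and 270 degrees in the image plane."""
    return [np.rot90(stack, k=k, axes=(1, 2)) for k in range(4)]


# Feature vector for the RF and the MDN: 5 magnitudes plus 10 colors.
magnitudes = np.array([19.0, 18.5, 18.2, 18.0, 17.9])   # placeholder values
features = add_colors(magnitudes)                        # shape (15,)

# Image tensor for the DCMDN: 5 band images plus 10 color images.
rng = np.random.default_rng(0)
band_images = rng.normal(size=(5, 28, 28))               # placeholder pixels
image_stack = add_colors(band_images)                    # shape (15, 28, 28)
augmented = rotations(image_stack)                       # 4 tensors per object
```

Four rotations per object turn the $50,000$ patterns into $200,000$ images, which matches the sum of the training, validation, and test set sizes given above.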

The results of the experiments are reported in Fig. 1. Following Polsterer et al. (2016), we use two statistical tools: the CRPS as a score function and the probability integral transform (PIT) histogram (Gneiting et al. 2005), which gives a visual assessment of the quality of the produced PDFs. In the RF experiment, the model reaches a CRPS of 0.20 and the PIT shows some overdispersion. The MDN performs slightly worse than the RF in terms of the CRPS, with a score of 0.21, but exhibits a better calibrated PIT. The DCMDN architecture achieves the best CRPS, with a score of 0.19. The resulting PIT is acceptable, although it still shows some underdispersion. The reason for the better overall performance of the DCMDN is that the feature-based approach uses only a fraction of the available information: a lot of information is lost in the process of feature extraction. Using images instead, the DCMDN is able to automatically determine thousands of features, leading to a better prediction of the photo-z PDFs.
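A PIT value is the predictive cumulative distribution evaluated at the true redshift, and the PIT histogram collects these values over the test set. The sketch below computes both for a Gaussian mixture; the function names and the toy data are assumptions for this example. In the toy check the "true" values are drawn from the predicted mixture itself, so the histogram should come out approximately flat.

```python
import math

import numpy as np


def mixture_cdf(y, mu, sigma, omega):
    """Cumulative distribution of a Gaussian mixture evaluated at y."""
    return sum(
        w * 0.5 * (1.0 + math.erf((y - m) / (s * math.sqrt(2.0))))
        for m, s, w in zip(mu, sigma, omega)
    )


def pit_histogram(pit_values, n_bins=10):
    """Relative frequency of PIT values in equal-width bins on [0, 1]."""
    counts, _ = np.histogram(pit_values, bins=n_bins, range=(0.0, 1.0))
    return counts / counts.sum()


# Toy check: draw "true" values from the predicted mixture (perfect calibration).
rng = np.random.default_rng(0)
mu, sigma, omega = [0.5, 2.0], [0.1, 0.3], [0.6, 0.4]
component = rng.choice(2, size=5000, p=omega)
z_true = rng.normal(np.take(mu, component), np.take(sigma, component))
pit = np.array([mixture_cdf(z, mu, sigma, omega) for z in z_true])
hist = pit_histogram(pit)
```

A flat histogram indicates well calibrated PDFs. A U-shaped histogram points to PDFs that are too narrow (underdispersion), while a histogram peaked in the center points to PDFs that are too broad (overdispersion).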

4 Conclusions

The main purpose of this work is to present a method for producing photo-z PDFs with deep learning architectures. We generate probabilistic predictions based on features or images as input, producing a Gaussian mixture model as output. In the comparison with an RF-based method, the DCMDN achieves the best CRPS (0.19 versus 0.20), while the feature-based MDN is slightly worse in terms of the CRPS (0.21) but better calibrated. The DCMDN gives the best performance because it is able to use the entire information contained in the images. As shown by the PIT analysis, the calibration can still be optimized in order to deal with the remaining dispersion effects. We firmly believe that the presented method needs only minor improvements to become a standard in predicting photo-z PDFs. As regression problems are very common in astronomy, this approach can easily be applied to many other scientific topics.

Bibliography

1. [Bishop 1994] C. M. Bishop. Mixture density networks. Technical report, 1994.
2. [LeCun et al. 1998] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, November 1998.
3. [Breiman 2001] L. Breiman. Random forests. Mach. Learn., 45(1):5-32, October 2001.
4. [Gneiting et al. 2005] T. Gneiting, A. E. Raftery, A. H. Westveld, and T. Goldman. Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation. Monthly Weather Review, 133:1098, 2005.
5. [Polsterer et al. 2016] K. L. Polsterer, A. D’Isanto, and F. Gieseke. Uncertain photometric redshifts. 2016.
6. [Richards et al. 2010] G. T. Richards, P. B. Hall, D. P. Schneider, et al. VizieR Online Data Catalog: The SDSS-DR7 quasar catalog (Schneider+, 2010). VizieR Online Data Catalog, 7260, May 2010.
7. [Fernique et al. 2015] P. Fernique, M. G. Allen, et al. Hierarchical progressive surveys. Multi-resolution HEALPix data structures for astronomical images, catalogues, and 3-dimensional data cubes. A&A, 578:A114, June 2015.
8. [Lupton et al. 1999] R. H. Lupton, J. E. Gunn, and A. S. Szalay. A Modified Magnitude System that Produces Well-Behaved Magnitudes, Colors, and Errors Even for Low Signal-to-Noise Ratio Measurements. AJ, 118:1406-1410, September 1999.