X-ray Study of Spatial Structures in Tycho's Supernova Remnant Using Unsupervised Deep Learning
Hiroyoshi Iwasaki, Yuto Ichinohe, and Yasunobu Uchiyama

TL;DR
This paper demonstrates an unsupervised deep learning approach using a variational autoencoder and Gaussian mixture model to automatically identify spatial structures in X-ray data from Tycho's supernova remnant, reducing manual analysis effort.
Contribution
It introduces a novel unsupervised machine learning method combining VAE and GMM for spatial structure recognition in astronomical X-ray data, enabling efficient analysis without detailed spectral examination.
Findings
Successfully recognized characteristic structures like the iron knot using spectral data
Reduced human effort in analyzing complex diffuse astronomical objects
Showed potential for selecting regions for detailed spectral analysis
Abstract
Recent rapid development of deep learning algorithms, which can implicitly capture structures in high-dimensional data, opens a new chapter in astronomical data analysis. We report here a new implementation of deep learning techniques for X-ray analysis. We apply a variational autoencoder (VAE) using a deep neural network for spatio-spectral analysis of data obtained by Chandra X-ray Observatory from Tycho's supernova remnant (SNR). We established an unsupervised learning method combining the VAE and a Gaussian mixture model (GMM), where the dimensions of the observed spectral data are reduced by the VAE, and clustering in feature space is performed by the GMM. We found that some characteristic spatial structures, such as the iron knot on the eastern rim, can be automatically recognised by this method, which uses only spectral properties. This result shows that unsupervised machine…
| Category No. | Location | Feature |
|---|---|---|
| 0 | outside of SNR | very dark, background |
| 1 | inside of SNR | dark region, between CD and FS |
| 2–4 | rim of ejecta | bright ejecta |
| 5 | NW rim | weak Fe emission |
| 6 | blobs in the Fe knot | strong Fe line emission |
| 7 | rim, filaments | power-law radiation dominant |
| Region | ||||||||
| 0a | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |
| Fe K lines | ||||||||
| Centroid (keV) | – | |||||||
| Width (keV) | – | |||||||
| Normalisationb | – | |||||||
| Line Flux Ratio (%) | ||||||||
| Si He/Si He | ||||||||
| Si Ly/Si He | ||||||||
| S He/S He | ||||||||
| S Ly/S He | ||||||||
| K Line Flux Ratio (%) | ||||||||
| S He/Si He | ||||||||
| Ar He/Si He | ||||||||
| Ca He/Si He | ||||||||
| Fe K/Si He | – | |||||||
| Region | |||||
| 1 | 2 | 4 | 6 | 7 | |
| Power Law | |||||
| Photon Index | |||||
| Surface Brightness a | |||||
| Fe K Lines | |||||
| Centroid (keV) | – | ||||
| Width (keV) | – | ||||
| Surface Brightness a | – | ||||
| K Line Flux Ratio (%) | |||||
| S He/Si He | |||||
| Ar He/Si He | |||||
| Ca He/Si He | |||||
| Fe K/Si He | – | ||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
X-ray Study of Spatial Structures in Tycho’s Supernova Remnant Using Unsupervised Deep Learning
Hiroyoshi Iwasaki,1,2 Yuto Ichinohe,1,2 and Yasunobu Uchiyama,1,2
1Department of Physics, Rikkyo University, 3-34-1 Nishi Ikebukuro, Toshima-ku, Tokyo 171-8501, Japan
2Bluish AI Laboratory, 7-11-3 Ginza, Chuo-ku, Tokyo 104-0061, Japan E-mail: [email protected] (HI)
(Accepted XXX. Received YYY; in original form ZZZ)
Abstract
Recent rapid development of deep learning algorithms, which can implicitly capture structures in high-dimensional data, opens a new chapter in astronomical data analysis. We report here a new implementation of deep learning techniques for X-ray analysis. We apply a variational autoencoder (VAE) using a deep neural network for spatio-spectral analysis of data obtained by Chandra X-ray Observatory from Tycho’s supernova remnant (SNR). We established an unsupervised learning method combining the VAE and a Gaussian mixture model (GMM), where the dimensions of the observed spectral data are reduced by the VAE, and clustering in feature space is performed by the GMM. We found that some characteristic spatial structures, such as the iron knot on the eastern rim, can be automatically recognised by this method, which uses only spectral properties. This result shows that unsupervised machine learning can be useful for extracting characteristic spatial structures from spectral information in observational data (without detailed spectral analysis), which would reduce human-intensive preprocessing costs for understanding fine structures in diffuse astronomical objects, e.g., SNRs or clusters of galaxies. Such data-driven analysis can be used to select regions from which to extract spectra for detailed analysis and help us make the best use of the large amount of spectral data available currently and arriving in the coming decades.
keywords:
ISM: individual (SN 1572–Tycho’s SNR) – ISM: supernova remnants – X-rays: ISM – method: statistical
††pubyear: 2019††pagerange: X-ray Study of Spatial Structures in Tycho’s Supernova Remnant Using Unsupervised Deep Learning–X-ray Study of Spatial Structures in Tycho’s Supernova Remnant Using Unsupervised Deep Learning
1 Introduction
In the past decade, machine learning, especially deep learning, has occupied an important position in data science because of its rapid development and high versatility. It has the potential to assist in analysis of astronomical data and to extract important information from rich astronomical data without human bias.
Astronomical observations produce complex multidimensional data that include spatial, temporal, and spectral information. For example, X-ray data obtained from a single observation of a diffuse source, e.g., a supernova remnant (SNR), may contain all these types of information. Because of this complexity, conventional analyses can be prone to human bias and oversight. In the near future, a dramatic improvement in the energy resolution of X-ray observations is expected; e.g., XRISM will have an energy resolution of several electron volts (Tashiro et al., 2018), and Athena will have a spectral resolution of 2.5 eV up to 7 keV at a spatial resolution of 5 arcsec with 4000 pixels (Barret et al., 2018). Detailed analysis of such data may require excessive human resources. Thus, automatic and unbiased methods to discover features and pre-analyse the data are required to exploit the full potential of upcoming instruments.
To take advantage of the rich data contained in SNR observations and to extract essential information without human bias, various machine learning techniques have been explored. Warren et al. (2005) and Warren (2006) demonstrated the separation of mostly featureless and line-dominated emission from Tycho’s SNR using a linear dimensionality reduction method, principal component analysis (PCA). They extracted 12 new axes from 12 broad spectral channels and found that the image of the first principal axis (PC1) corresponds to the contrast between Si- and Fe-rich emission and the hard continuum emission. Sato & Hughes (2017) also performed PCA of the narrow-band spectra of Tycho’s SNR, separating the Si He band into 18 bins, and found that the first three PCs correspond to the line equivalent width, line energy centroid, and line energy width, respectively. Burkey et al. (2013) demonstrated clustering of four-band line fluxes extracted from 5000 spatial regions of Kepler’s SNR using a Gaussian mixture model (GMM) and identified the shocked circumstellar medium (CSM) region.
Most previous applications of machine learning techniques to analysis of SNR data have been limited to linear methods. However, the value of each spectral bin depends nonlinearly on the underlying physical parameters; e.g., the bremsstrahlung continuum emission is exponentially affected by the plasma temperature. In other words, the data space of X-ray spectra is not flat. Although PCA, which linearly transforms the data into another orthogonal expression, might provide approximate results in some cases, it is reasonable to choose a model capable of expressing nonlinear relations when the problem exhibits such features. Deep neural networks (DNNs) are likely to obtain more effective expressions from data spaces than linear methods because of their ability to handle nonlinear relations. In this research, we examine the potential of DNNs to extract features embedded in nonlinear relations.
An artificial neural network (ANN), which consists of several layers of multiple formal neurons, mimics the functioning of animal brains (e.g., LeCun et al., 2015) and can implicitly capture features embedded in high-dimensional data. Each neuron transforms its input as , where and are tuneable weight parameters, and is a nonlinear function, and the processing layers learn representations of data with multiple levels of abstraction. Multilayer ANN architectures, such as the multilayer perceptron, can reveal complex, nonlinear relations.
Recent computational advances have made it possible to train DNNs at a reasonable time and cost, and such techniques have become very popular in many areas, including analyses of astrophysical images (Hezaveh et al., 2017; Kimura et al., 2017), spectra (Ichinohe et al., 2018; Ichinohe & Yamada, 2019), light curves (Charnock & Moss, 2017; Shallue & Vanderburg, 2018), and telescope events (Shilon et al., 2018).
This paper is organised as follows. In Section 2, we describe our machine learning method. Section 3 presents our machine learning results and interpretation of the features classified by the method. In Section 4, we analyse the two regions selected by our method in detail. Section 5 discusses the method’s efficiency and concludes our paper.
2 Machine Learning Methods
In this work, we aim to categorise individual spatial bins of an X-ray image using only the spectral properties. We developed a method that allows us to combine two unsupervised learning methods, the variational autoencoder (VAE; Kingma & Welling, 2013) and the GMM, for spatially resolved spectroscopy of the X-ray data obtained by Chandra from Tycho’s SNR.
Similar methods in which dimensionality reduction and unsupervised clustering are integrated have already been applied in several astronomical analyses, e.g., classification of galaxy spectra (PCA & GMM; Hurley et al., 2012), classification of supernova (autoencoder (AE), isomap & k-means; Ishida et al., 2017) or separation of galactic and extragalactic objects (AE & support vector machine (SVM); Khramtsov & Akhmetov, 2018). In this paper, we explore the method of combining nonlinear dimensionality reduction by a VAE and clustering by a GMM for automatic investigation of the spatial structures of a diffuse object for the first time.
2.1 Variational Autoencoder
Although an X-ray spectrum of, e.g., an SNR, may have a large number of energy bins, they are not completely independent of each other because of the finite instrumental energy resolution and underlying emission process. The number of essential parameters generating the spectrum might be small (e.g., APEC, an emission model for plasma in collisional ionization equilibrium, has only four parameters). Therefore, before classifying the spectrum, we reduced the dimensionality by compressing the information in the raw input spectra. To capture nonlinear relationships between the essential parameters, we employed an unsupervised DNN architecture, namely, a VAE.
An AE is a DNN architecture connecting an encoder and a decoder. The encoder is trained to encode the input data as latent variables by reducing the input dimensions. At the same time, the decoder is trained to reproduce the original input from the latent variables . The dimension of the latent space is smaller than that of the original space. One can thus obtain a compressed latent expression of the original data if the AE is successfully trained. The latent space is expected to capture nonlinear relationships in the input data because of the capabilities of DNNs.
The VAE is a variant of the AE. In the VAE, a multidimensional Gaussian distribution is assumed for the latent variables. Unlike a normal AE, which computes the latent variables directly, the encoder of a VAE computes the means and variances . A set of latent variables is sampled from a multidimensional Gaussian whose means and variances are calculated by the encoder. The decoder decompresses the set of latent variables .
VAE models have some advantages over normal AEs, e.g., more stable training and a better latent manifold structure (Tolstikhin et al., 2017). Before the study, we compared normal AEs to the VAE and found that training of the normal AE is sometimes unstable and that in many cases, some latent variables do not represent any features. Therefore, we chose the VAE for this study.
Figure 1 shows a diagram of the VAE architecture. The encoder and decoder were constructed from two fully connected (FC) neural network layers with 256 nodes per layer and rectified linear unit (ReLU; Nair & Hinton, 2010) activation.
The encoder branches after the FC layers and connects to a layer of and , which have the same number of nodes as . The latent variables are sampled as 111Here, is the operator of the element-wise product. using the reparametrization trick (Kingma & Welling, 2013; Jimenez Rezende et al., 2014). Here is a vector consisting of random numbers sampled from a Gaussian distribution defined by hyperparameters; the mean is 0, and the variance is 1. The decoder consists of two layers; the output layer has sigmoid activation and the same dimensions as the input layer.
We used the deep learning framework Keras 2.0.7 (Chollet et al., 2015) with the Tensorflow 1.3.0 (Abadi et al., 2015) backend. The sum of the binary cross-entropy and KL divergence was used as the loss function. Nesterov-accelerated adaptive moment estimation (Nadam; Dozat, 2016) (which, according to our tests, provides the fastest convergence of the optimisers) was used for optimisation. The training was performed for 100 epochs with a batch size of 100.
Training typically takes 16–20 min on a workstation with an eight-core Intel Xeon E5 CPU. We also tested the training of a model using GPU computations on an NVIDIA GeForce GTX 1080Ti graphics card. Training on the GPU with a batch size of 4096 typically ran for 40 s and required approximately 300 MiB of GPU memory.
2.2 Gaussian Mixture Model
After the spectral information was compressed into a several-dimensional latent expression using the VAE as explained in the previous section, we classified it using the GMM.
There are two major types of clustering methods: (1) hard clustering methods, in which each data point is assigned to only one category (e.g., k-means, SVM), and (2) soft clustering methods, which assign each point to all the categories with different weights (e.g., the GMM).
A soft clustering method is appropriate for this study for the following reasons. (1) In the latent variable coded by the VAE, the characteristic distribution is not necessarily separate, and components with apparently different trends (see Fig. 4) overlap considerably, particularly around the value 0. It is generally difficult to draw clear boundaries when multiple components overlap. (2) The physical conditions change continuously throughout the SNR.
The GMM is a well-known soft clustering method that has become popular for astrophysical data analysis (e.g., Davoodi et al., 2006; Hurley et al., 2012; Burkey et al., 2013). It describes the data distribution as multidimensional Gaussians; each Gaussian represents a clustering category. Every data point is represented by a weighted superposition of all the categories. The probability that a data point belongs to a certain category, which is also referred to as the responsibility, is represented by the ratio of the value of the Gaussian corresponding to the category to the sum of the values of all the Gaussians for this data point. We used the GMM GaussianMixture in scikit-learn 0.19.0 (Pedregosa et al., 2011), which is a Python library providing a machine learning framework.
3 Demonstration
3.1 Demonstration Target
X-ray spectra from SNRs are known to contain rich multidimensional information. We chose Tycho’s SNR as a target for application of our machine learning method because this is one of the best studied SNRs.
Supernovae (SNe) explosively eject elements synthesised in the progenitor materials (so-called ejecta), forming blast waves. The resulting bright X-ray-emitting structures are called SNRs. X-ray observations of SNRs allow us to investigate both the chemical evolution of the Universe and the mechanisms of cosmic-ray acceleration.
SNe have long been assumed to supply heavy elements synthesised during the explosion. The ejecta, that is, the X-ray-emitting hot plasma in SNRs, reveals the nuclear burning regimes of elemental synthesis.
Tycho’s SNR is the remnant of SN 1572, which is known to be a type Ia explosion from the light-echo spectrum (Krause et al., 2008). In X-ray spectra of Tycho’s SNR, line emission from intermediate-mass elements (IMEs; e.g., Si, S, Ar, and Ca) and Fe synthesised during the supernova explosion are clearly seen. In addition, secondary Fe-peak elements (e.g., Cr, Mn, and Ni, which are synthesised together with Fe) have been detected (e.g., Tamagawa et al., 2009). The global morphology of Tycho’s SNR features radial gradation of the plasma ionization state and the electron or ion temperature, which are caused by reverse shock (RS) heating. The gradation features appear as differences in the peak radii of the emission lines and are seen especially clearly in the northwestern (NW) projected ejecta limb.
X-ray imaging using ASCA showed that the Fe K emission clearly peaks at a smaller radius than the Fe L and IME line emission and that the Fe-K-emitting plasma was hotter and less ionized (Hwang & Gotthelf, 1997; Hwang et al., 1998). Warren et al. (2005) measured the averaged RS radius as 183 arcsec using the Fe K lines from a Chandra observation. Yamaguchi et al. (2014) showed electron heating at the RS on the NW limb using the Fe K and K lines. They also measured the RS radius as 158 arcsec using the Fe K lines of immediate postshock, low-ionization ejecta, which peak at a smaller radius than Fe K emission from a relatively highly ionized component.
Lu et al. (2015) used Chandra observations to show a systematic increase in the S-to-Si line flux ratio with increasing radius resulting from RS propagation in the ejecta and reported the elapsed ionization time since the ejecta was shock-heated. Sato & Hughes (2017) also found a gradual increase in the line centroids of Fe K beyond the radius and interpreted it as a difference in the elapsed ionization time.
By contrast, the eastern region exhibits an unusual morphological structure called the Fe knot, where several clumps outrun the forward shock (FS). Detailed analysis of Suzaku and Chandra data suggests that the Fe knot did not originate in the deep, dense core of the progenitor white dwarf but was instead synthesised under incomplete Si burning or the -rich freeze-out regime (Yamaguchi et al., 2017).
In addition, galactic SNRs are widely believed to supply cosmic rays up to the ‘knee’ energy of the cosmic-ray spectrum at eV, accelerating particles to relativistic energies in their blast waves by diffusive shock acceleration. The accelerated electrons emit the nonthermal X-ray synchrotron emission observed from the limbs of young SNRs (e.g., Koyama et al., 1995; Eriksen et al., 2011).
The X-ray synchrotron emission from electrons accelerated at the FS was observed from the limb of Tycho’s SNR (Hwang et al., 2002). Cosmic-ray proton acceleration at the FS was also reported (Warren et al., 2005). Eriksen et al. (2011) found nonthermal stripes in the projected interior of the remnants and interpreted them as evidence for particle acceleration to the ‘knee’ energy in regions of enhanced magnetic turbulence.
Tycho’s SNR, which is one of the brightest SNRs in the X-ray band and has various interior structures, is one of the best benchmark objects for testing a new analysis method. High spatial–spectral-resolution data from Tycho’s SNR were obtained by Chandra. In this research, we apply our method to the X-ray data from Tycho’s SNR to investigate the morphological structures without human bias by automatic classification of each spatial point based only on the physical features reflected in the spectrum.
3.2 Data Set
Tycho’s SNR was observed by the Advanced CCD Imaging Spectrometer (ACIS)-I of Chandra for 145.6, 142.1 (two obsIDs), 734.1 (nine obsIDs), and 146.98 ks in 2003, 2007, 2009, and 2015, respectively. In 2007 and 2009, there were one and five observations, respectively, with exposure times exceeding 80 ks. We performed X-ray analysis using CIAO (version 4.9) and CalDB (version 4.7.6) provided by the Chandra X-ray Center222Available at http://cxc.harvard.edu.
Finer spectral binning is expected to preserve more information such as the line width, line-centroid shift, and composition of weak lines. However, finer binning results in lower counts in each spectral bin. We employed an objective method of spectral binning to achieve fine binning and adequate photon statistics in each bin at the same time. As shown in Fig. 2, the spectrum of the entire Tycho’s SNR was created in the 0.5–7 keV band and was divided into 37 narrower energy bins such that each of them had a count rate of more than 100 , including the background. The low-energy side (0.5–2.6 keV), with a high count rate, is divided into rather narrow bands (bandwidth/band centroid energy 8%). For example, the Si He emission line (1.69–2.01 keV) is divided into 11 bins. On the other hand, the high-energy part of the spectrum (2.6–7.0 keV) is divided into four wider bands because the statistics are not as good as at lower energies. The flux values in the 37 narrow energy bands corresponding to a single spatial bin are combined, and the resulting 37-dimensional vectors are used as the input data set.
The flux image of each band was created with the coordinate ranges set to omit regions outside of the SNR. The spatial bin size was set to 3.94 arcsec, resulting in an image size of spatial bins. We did not subtract the backgrounds from the images because most of Tycho’s SNR is sufficiently bright that we can safely ignore the contributions from the non-X-ray background and cosmic X-ray background between 0.5 and 7.0 keV.
The averaged expansion velocity of Tycho’s SNR is approximately (Katsuda et al., 2010), which is significant for the entire data set. Thus, in our analysis we do not mix observations from different years. Eight individual observations with exposure times exceeding 80 ks in 2003, 2007, 2009, and 2015 were used for training. In addition, we also used the shorter observation taken in 2007 by co-adding it with the longer one taken in the same year. Eighty percent of the spatial bins in each flux image were chosen randomly and used as training data, and the rest were used as evaluation data. The actual size of the training and evaluation data sets were 150,781 and 36,808, respectively, excluding the spatial bins with zero flux in all the narrow energy bands (i.e., a 37-dimensional zero vector). All the observations from 2009 (a year which has the longest total exposure) were summed and used for the post-training analysis.
3.3 Unsupervised Dimension Reduction and Clustering
We extracted the latent expressions from the data of Tycho’s SNR observed in 2009 using the encoder of the trained VAE. Each panel in Fig. 3 shows the four-dimensional coordinates of the latent parameters that are obtained by the VAE from a merged set of all the observations from 2009. The images for each axis of have the same colour scale. We determined the latent dimension as follows. We prepared models with latent dimensions ranging from 2 to 10 and trained them for 100 epochs. We selected the latent dimension whose corresponding loss value for the validation data was the lowest.
We applied GMM soft clustering to the obtained latent expressions. The optimal number was determined by checking the Bayesian information criterion for three to nine of the used clusters. There was little difference between cases with seven to nine clusters; thus, for the analysis we used eight clusters. To visualise the features extracted by the VAE, we plot clusters in different colours in the scatter plot shown in Fig. 4.
The scatter plots in Fig. 4 show the distributions of the latent variables, which are colour-coded according to the category assigned by GMM clustering, projected onto all six (=, where 4 is the latent dimension) different two-dimensional planes passing through the origin. In the latent space, the compressed expressions form a radial distribution consisting of several branches around 0. The main thick branch is classified as categories 1–4, whereas the three branches extending in different directions are classified as categories 5, 6, and 7, respectively. The data points around 0 are classified as category 0.
Each panel of Fig. 5 shows the responsibility of each GMM category. The middle panel of Fig. 6 shows the division of Tycho’s SNR into GMM categories. The colours of the spatial bins correspond to the highest responsibility category, as obtained by the method when the merged data from 2009 were used. The spatial bins of each category have a spatially coherent distribution.
The right panel of Fig. 6 shows the same image as the middle panel of Fig. 6, but the spatial bins whose assigned category has a responsibility of 90% are masked. Thus, the spatial bins that remain coloured in the right panel of Fig. 6 are robustly assigned to some category, and thus are expected to have some spectral features distinct from those of the other categories.
For comparison, a traditional RGB image is shown in the left panel of Fig. 6. Some regions that appear similar in the RGB image are assigned to different categories. For example, the blob on the eastern rim (region marked ‘c’ in Fig. 6) and the annular layer to the northwest (inner layer region marked ‘a’ in Fig. 6) both appear reddish in the RGB image (Fig. 6, left panel), although they are assigned to different categories in the GMM image (categories 3 and 6; see the middle panel of Fig. 6).
We also note that the clusters corresponding to the layered structure in the NW part of the SNR (for detailed analysis, see Section 4.1) are revealed by clustering for any number of Gaussians between seven and nine. On the other hand, the regions dominated by featureless emission and the Fe knot described in Section 4.2 are separated into two clusters only when eight or nine categories are assumed.
3.4 Detailed Results of Clustering
On the basis of the GMM classification, we extracted the representative spectra of each category by combining all the spatial bins assigned to a certain category with responsibilities above 90%. The combined spectra are shown in Fig. 7. The background was extracted from an annular region surrounding the SNR and subtracted from the spectra.
Table 1 summarises the physical interpretation of each category. Category 0 is localised mainly outside of Tycho’s SNR and also contains some dark regions inside the SNR. Categories 1–5, the main regions of which coincide spatially with the ejecta, form a layered structure, and the assigned category numbers, 1–5, change from the inner side of the SNR to the outer side. Category 1 represents faint regions inside of the SNR, which mainly include the unshocked ejecta in projection and the swept-up interstellar medium (ISM)/CSM between the FS and the contact discontinuity (CD). Most of categories 2–5 appears as layered structure, which is most clearly seen in the northern part of the SNR (region marked ‘a’ in Fig. 6), especially category 5, which has a responsibility above 90% and is located only in the NW region of the SNR. Two blobs of categories 4 and 5 close to the SNR centre (marked ‘b’ in Fig. 6) coincide with regions with a measured blue shift (Sato & Hughes, 2017). Thus, categories 4 and 5 are interpretable as ejecta limbs.
Category 6 is localised at the edge in the eastern part of the SNR (region ‘c’ in Fig. 6), which is associated with the reddish region in the left panel of Fig. 6. This region is a substructure in the Fe knot analysed by Yamaguchi et al. (2017) in detail. The spectrum of category 6 has strong Fe line emission, although the IME emission is weaker in Fig. 7. As shown in the bottom panel of Fig. 8, category 6 covers a portion of the ejecta having the strongest Fe K line in the SNR.
Category 7 corresponds spatially to the FS at the edge of the SNR, the filament and stripe structure at ‘d’ inside the SNR on the western side, and the bright arc at ‘e’ in the SNR on the southeast in Fig. 6. These structures are associated with bluish regions in the left panel of Fig. 6. In the spectrum of category 7 in Fig. 7, continuum emission is dominant, although weak contamination by line emission (Si, S, and so on) appears.
The medium- and high-energy parts of the spectra in Fig. 7 have clear line emission features. To investigate the spectral properties quantitatively, we fitted the spectra with a model of an absorbed power law for the continuum emission plus Gaussians for the emission lines of He, Ly, He, He, and Ly of Si, S, Ar, and Ca (excluding Ca Ly) and Fe K. The line centroids, widths, and intensities of weak lines such as Lyman lines and He were tied to other strong lines following Hayato et al. (2010). We assumed a hydrogen column density of (Cassam-Chenaï et al., 2007) and standard ISM abundances (Wilms et al., 2000). The results of the model fitting are shown in Table 2 and Fig. 8. In category 7, continuum emission is dominant; thus, the Fe K line cannot be detected.
In the top and middle panels of Fig. 8, the centroid energy of Fe K is higher, and the Ly/He, He/He line flux ratios of Si and S are higher, except for category 6. These trends correspond spatially to higher ionization and temperature in the outer side of the SNR. Especially in the NW region of the SNR, in which categories 1–5 form a layered structure, the physical parameters of ionization and the temperature appear to change the from inner to outer the SNR. We analyse the NW region in detail in Section 4.2. On the other hand, category 6, which is located on the rim in the eastern region of the SNR, has different characteristics. For category 6, the centroid energy of Fe K is highest among all the categories, but the Ly/He, He/He line flux ratios of Si and S are lower than the others. We analyse the Fe knot region, including category 6, in detail in Section 4.1.
In the bottom panel of Fig. 8, the line flux ratios of He of S, Ar, or Ca to Si increase gradually from the inner side toward the outer side of the SNR. On the other hand, the Fe K/Si He line flux ratios show a different trend. These ratios decrease toward the outer side of the main SNR shell and are lowest in the region of category 5, although the ratio for category 6, which is located the outer edge of the SNR, is only an order of magnitude higher than the others. Category 6 clearly has different spectral features from the other regions.
Although it is difficult to fully understand how each feature in the raw data (i.e., spectral structure in this case) affects the latent expressions, we think the low-energy sides of the spectra also contribute significantly to the clustering because the low-energy part is divided into finer bins than the high-energy part is. For example, whereas the Fe K blend consists of only 1 energy bin in our binning method, the energy band of the Fe L blend between 0.75 and 1.31 keV is divided into 10 bins.
The Fe xvii line () structure at 0.83 keV is divided into 4 bins and appears strongly in the spectra of categories 1–4. In addition, the Mg K line at 1.35 keV appears strongly only in categories 1 and 2, and the O K line structure at 0.65 keV appears strongly in the spectra of categories 0, 1, and 7. We think that these structures contribute to distinguishing these categories from others.
4 Detailed Analysis of the Regions Suggested by Machine Learning
As shown in the previous section, the unsupervised machine learning method can discover spatial structures. In this section, we choose two regions of the revealed structure and analyse them in detail.
4.1 Spectral Analysis of the Fe Knot
The Fe knot located along the eastern rim of Tycho’s SNR represents unusual morphological features in which several iron-rich clumps outrun the FS. The Fe knot can be divided into substructures, and Yamaguchi et al. (2017) analysed in detail these fine regions. The regions defined by Yamaguchi et al. (2017) have the following counterparts in our analysis: ‘A’ and ‘E’, category 6; ‘B’, category 4; ‘C’ and ‘D’, category 2; ‘X’ and ‘Y’, category 7.
We extracted spectra from the substructure representative of each category with responsibilities above 90%. The results of fitting by the model described in the previous section are summarised in Table 3.
Figure 10 shows the spectra of the regions of categories 4 and 6 in the Fe knot, which correspond to the regions marked ‘GMM4’ and ‘GMM6’ in Fig. 9, respectively, with responsibilities above 90%. The spectrum of category 4 has enhanced IME line radiation, although one of the category 6 features has weaker IME lines and stronger Fe K emission. The regions of the Fe knot corresponding to categories 4 and 6 are characterised by the lowest and highest ratios of the Fe K and Si He line fluxes, respectively. The Fe/Si flux ratio in category 6 is also quite high, as shown in the bottom panel of Fig. 8; thus, the region has a spectral feature unique to not only the entire SNR but also the Fe knot. The machine learning extracted a characteristic structure in which the Fe emission is much stronger.
The He line flux ratios of S, Ar, or Ca to Si in categories 1, 2, 4, and 6 are approximately constant in the Fe knot. This implies a constant electron temperature in the knot, in agreement with Yamaguchi et al. (2017).
The S He/Si He line flux ratios in categories 1 and 2 in the Fe knot are higher than those in the entire SNR (Table 2). This reflects a higher ionization state in the Fe knot, which is located on the SNR rim (i.e., at a large radius), than that typical of the shocked and unshocked ejecta in the inner part of the SNR, where categories 1 and 2 are most common.
The Fe K/Si He line flux ratios in categories 1 and 2 in the Fe knot are higher than those in the entire SNR (Table 2). However, this ratio for category 4 in the Fe knot is comparable to that in the entire SNR. As shown in the left image of Fig. 9, the clump emitting Fe coincides with the blob emitting IME in the category 4 region.
4.2 Spectral Analysis of NW Ejecta
In the NW region of Tycho’s SNR, categories 1–5 and 7 are layered. We divided the NW part of the SNR into annular regions along the layered features representing the categories, as shown in Fig. 11. The centre of the annuli was determined to be R.A. = 00h25m17754, +64°08′06549 so that those annuli align with the layered structure. The regions were labelled NW1 to NW10 from the inner to the outer side. We extracted the spectra from these regions, adopting a background spectrum extracted from a box region outside of the SNR. We performed model fitting in the 1.7–9 keV band using the same models as in Section 3.4 to investigate the line emission of IMEs and Fe.
The surface brightness peak of the Fe K line forms the innermost layer, as compared to the He lines of Si, S, Ar, and Ca, in the top panel of Fig. 12. This trend was previously seen in ASCA observations (Hwang & Gotthelf, 1997) and XMM-Newton observations (Decourchelle et al., 2001).
The upper middle panel of Fig. 12 shows the Fe K/Si He or Fe K/S He flux ratios, which peak between regions NW5 and NW6 (156.7–177.3 arcsec). They reflect the difference in the peak positions of the Fe K and He of Si or S line emission.
In the lower middle panel of Fig. 12, the line flux ratios of Ly/He and He/He of Si and S increase from the inner region NW5 to the outer one NW10 (156.7–227.3 arcsec). These trends correspond to a radial gradient of temperature or ionization, which is higher on the outer side of the SNR. A temperature gradient in the ejecta is suggested, as it is hottest near the RS and cooler near the CD, because the Fe K emission peak is located interior to those of Fe L and Si (Hwang & Gotthelf, 1997; Decourchelle et al., 2001). Furthermore, Yamaguchi et al. (2014) showed electron heating near the RS of Tycho’s SNR. If we attribute our results to a temperature gradient, our findings imply an opposite trend to that inferred in these works. On the other hand, a variation in the Fe ionization state near the RS (Yamaguchi et al., 2014) was reported. Moreover, a radial gradient of the ionization age was suggested by the Si He/S He flux ratio (Lu et al., 2015) and Fe K centroids (Sato & Hughes, 2017). Thus, our results are consistent with these works if we interpret the radial dependence of the line flux as arising from the ionization age gradient induced by RS propagation.
Then we fit the spectra with models in the 4.2–10 keV band to investigate the K and K lines of Fe and the K lines of secondary Fe-peak elements (Cr and Mn). The Gaussian widths of the Fe K, Cr K, and Mn K lines are linked to those of Fe K.
The centroid energies of the Fe K lines shown in the bottom panel of Fig. 12 are flat between NW1 and NW5 (90.0–167.0 arcsec) and begin to increase at NW5 around 162 arcsec. By contrast, the line width of Fe K (except in the outermost region, NW10) and the centroid energies of Fe K do not change significantly. The peak of the surface brightness of Fe K is in the NW6 region between 177.3 and 189.4 arcsec, and the peak of Fe K is in the inner region NW5 between 156.7 and 167.0 arcsec, which is consistent with the results of Yamaguchi et al. (2014).
Warren et al. (2005) estimated the averaged RS radius to be 183 arcsec. The RS radius is located in region NW7, which coincides with the turning point of the surface brightness of IME He and Fe K in the top panel of Fig. 12. By contrast, Yamaguchi et al. (2014) estimated the RS radius as 158 arcsec in the NW quadrant, which is interior to and more realistic than the former one. The RS radius is located near the boundary of regions NW4 and NW5. It coincides with the turning point of the centroid of Fe K in the bottom panel of Fig. 12. Thus, it seems that categories 3–5 are ejecta shocked by the RS, and most of categories 1 and 2 are located inside the RS in projection.
In Fig. 12, the transitions of the flux ratios of line emission or the line centroid energy described above appear at the RS position. Thus, the coincidences suggest that the features reflect the plasma state of the ejecta caused by RS heating.
We also note the detection of Fe K in regions NW3–NW8 excluding NW7, Cr K in regions NW6 and NW8, and Mn K in region NW8 with or more.
5 Discussion and Conclusions
We implemented an unsupervised machine learning method combining the VAE and GMM, where the dimensions of the observed data are reduced by the VAE, and clustering in feature space is done by the GMM, and applied the method to Tycho’s SNR, one of the best-known SNRs.
Our unsupervised machine learning method automatically revealed spatial structures which have been discussed in the literature (see, e.g., Yamaguchi et al., 2017). This demonstration shows that the method is a powerful tool for data analyses that makes it possible to exploit the rich information contained in data obtained by X-ray observations of SNRs. It may be possible to discover SNR physics by post-training analyses using the results of machine learning.
It is also worth noting that the method discovered the spatial structures automatically, although no spatial information was used in the model. This means that the method can extract physical feature based only on the spectral information.
As demonstrated in Sections 3 and 4, the VAE extracts features using the relative intensities of lines as well as the properties of the continuum spectrum. These characteristics of thermal X-ray spectra reflect the plasma conditions (e.g., temperature, ionization, elemental abundances, and electron or ion densities). When the data distribution in feature space is categorized by the GMM, the entire region is divided into a small number of clusters. As shown by our analysis, clustering can reveal both sharp, knot-like features and continuous changes in physical parameters. Sharp structures are classified as a single category. For example, the Fe-rich blob in the Fe knot on the eastern rim of the SNR, shown in Section 4.1, is assigned to category 6. By contrast, if physical parameters change gradually, clustering may result in a layered spatial structure like that seen in the NW regions of the SNR (see Section 4.2 for details).
The reason that each individual spectrum is classified in a certain category is not yet clear from the network outputs but needs to be investigated and interpreted by human experts, as we mentioned in Section 3.4. It would be useful if the network itself provided the reason, e.g., by highlighting the spectral features that cause the spectrum to be assigned to a particular category. Unveiling the reasoning process of the network is a significant problem and is beyond the scope of the current work (see, e.g., Smilkov et al., 2017, for recent reviews).
Developing such methods is important for making the best use of the currently available data and for addressing the growing quality and quantity of future observational data. Model fitting of spectra is generally time-intensive; thus, the difficulty of spectral analysis is expected to increase steeply with successful implementation of an X-ray microcalorimeter (such as Athena; Barret et al., 2018). The suggested unsupervised method can reveal characteristic features directly from raw observational data without spectral model fitting. It can be an efficient tool to define regions for spectral extraction.
Our method implemented in this work is not limited to SNRs and can be applied to other classes of sources such as galaxy clusters. The method is equally applicable to temporally and spatially variable data, because the training uses only spectral information. Furthermore, our method can also be applied to other energy bands; e.g., it is expected to have good applicability to radio observational data, which contain spatial, temporal, and velocity information (i.e., they have the same dimensions as X-ray data: spatial, temporal, and spectral information).
The deep learning architecture can be improved. In this method, it is a problem that the VAE used to reduce the dimensions of the data tends to form a single peak distribution around 0 in feature space; thus, the boundaries of the extracted data distribution are not clear. Using the architecture of the Wasserstein autoencoder (WAE; Tolstikhin et al., 2017) or Gaussian Mixture VAE (GMMVAE; Dilokthanakul et al., 2016) may improve the structure of the latent manifold. A model using convolutional layers, e.g., a convolutional VAE, can be applied to use the spatial information in a data set.
Acknowledgements
We thank Dr Hiroya Yamaguchi, Masato Taki, and Dmitry Khangulyan for useful information and discussion. This work was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grants (Number 18H03722 and 18H05463). This work was supported by the Astro-AI working group in the RIKEN iTHEMS.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Abadi et al. (2015) Abadi M., et al., 2015, Tensor Flow: Large-Scale Machine Learning on Heterogeneous Systems, https://www.tensorflow.org/
- 2Barret et al. (2018) Barret D., et al., 2018, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series. p. 106991 G ( ar Xiv:1807.06092 ), doi:10.1117/12.2312409 · doi ↗
- 3Burkey et al. (2013) Burkey M. T., Reynolds S. P., Borkowski K. J., Blondin J. M., 2013, Ap J , 764, 63 · doi ↗
- 4Cassam-Chenaï et al. (2007) Cassam-Chenaï G., Hughes J. P., Ballet J., Decourchelle A., 2007, Ap J , 665, 315 · doi ↗
- 5Charnock & Moss (2017) Charnock T., Moss A., 2017, Ap J , 837, L 28 · doi ↗
- 6Chollet et al. (2015) Chollet F., et al., 2015, Keras, https://github.com/keras-team/keras
- 7Davoodi et al. (2006) Davoodi P., et al., 2006, AJ , 132, 1818 · doi ↗
- 8Decourchelle et al. (2001) Decourchelle A., et al., 2001, A&A , 365, L 218 · doi ↗
