Accelerating the search for Axion-Like Particles with machine learning

Francesca Day; Sven Krippendorf

arXiv:1907.07642·astro-ph.HE·March 24, 2020


TL;DR

This paper demonstrates how machine learning can be used to efficiently analyze X-ray spectra to place bounds on axion-like particles, showing comparable results to traditional methods with potential for future telescope data.

Contribution

The study introduces ML algorithms for analyzing astrophysical spectra to constrain ALP-photon couplings, offering a novel approach that improves bounds on an individual source basis.

Findings

01

ML methods achieve bounds similar to traditional techniques.

02

ML provides improved bounds for specific sources.

03

Potential for enhanced ALP searches with future high-resolution telescopes.

Abstract

Machine learning (ML) techniques have been applied with tremendous success in many areas of physics. In this work, we use ML to place bounds on the coupling between photons and axion-like particles (ALPs). This coupling causes ALPs and photons to interconvert in the presence of a background magnetic field. This would lead to modulations in the spectra of point sources shining through the magnetic fields of galaxy clusters. This effect has already been used to place world-leading bounds on the ALP-photon coupling using conventional statistical methods. We train ML classification algorithms on simulated spectra from the Chandra X-ray telescope for a range of point sources and ALP-photon couplings. We then use the response of these algorithms to the real Chandra spectra to place bounds on ALP-photon interactions. We obtain bounds at a similar level to those based on other techniques, but…

Tables

Table 1: Bounds on $g$ in units of $10^{-12}\,{\rm GeV}^{-1}$ obtained using machine learning classification algorithms.

Source and data	ABC	DTC	GaussianNB	QDA	RFC
A1367 residuals	1.9	none	none	none	none
A1367 upscaled residuals	2.0	none	1.9	none	none
A1795 Quasar residuals	none	none	1.7	none	1.4
A1795 Quasar upscaled residuals	none	none	none	none	none
A1795 Sy1 residuals	1.0	0.8	1.2	1.1	0.7
A1795 Sy1 upscaled residuals	1.1	1.1	1.1	1.0	0.8

Equations

\mathcal{L}=\frac{1}{2}\partial_{\mu}a\,\partial^{\mu}a-\frac{1}{2}m_{a}^{2}a^{2}+g_{a\gamma\gamma}\,a\,{\bf E}\cdot{\bf B},

\left(\omega+\left(\begin{array}{ccc}\Delta_{\gamma}&0&\Delta_{\gamma ax}\\ 0&\Delta_{\gamma}&\Delta_{\gamma ay}\\ \Delta_{\gamma ax}&\Delta_{\gamma ay}&\Delta_{a}\end{array}\right)-i\partial_{z}\right)\left(\begin{array}{c}|\gamma_{x}\rangle\\ |\gamma_{y}\rangle\\ |a\rangle\end{array}\right)=0,


Full text

Accelerating the search for Axion-Like Particles with machine learning

Francesca Day

Sven Krippendorf

Abstract

Machine learning (ML) techniques have been applied with tremendous success in many areas of physics. In this work, we use ML to place bounds on the coupling between photons and axion-like particles (ALPs). This coupling causes ALPs and photons to interconvert in the presence of a background magnetic field. This would lead to modulations in the spectra of point sources shining through the magnetic fields of galaxy clusters. This effect has already been used to place world-leading bounds on the ALP-photon coupling using conventional statistical methods. We train ML classification algorithms on simulated spectra from the Chandra X-ray telescope for a range of point sources and ALP-photon couplings. We then use the response of these algorithms to the real Chandra spectra to place bounds on ALP-photon interactions. We obtain bounds at a similar level to those based on other techniques, but find improvements on an individual source basis. We expect such search techniques to become increasingly important for ALP searches with future telescopes that will offer substantially higher energy resolution.

LMU-ASC 29/19

1 Introduction

Axion-like particles (ALPs) are a very well motivated extension of the Standard Model. Utilising the fact that ALPs and photons interconvert in background magnetic fields [1], there has been a long-standing experimental quest for such particles [2]. The ALP-photon Lagrangian is:

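$$\mathcal{L}=\frac{1}{2}\partial_{\mu}a\,\partial^{\mu}a-\frac{1}{2}m_{a}^{2}a^{2}+g_{a\gamma\gamma}\,a\,{\bf E}\cdot{\bf B},\qquad(1.1)$$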

where $a$ is the ALP field, $m_{a}$ is the ALP mass, $g_{a\gamma\gamma}$ is the coupling between ALPs and photons, ${\bf E}$ is the electric field and ${\bf B}$ is the magnetic field. Linearising the resulting Euler-Lagrange equations for an ALP/photon of frequency $\omega$ , we obtain the equation of motion for ALP-photon interconversion in a background magnetic field:

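$$\left(\omega+\left(\begin{array}{ccc}\Delta_{\gamma}&0&\Delta_{\gamma ax}\\ 0&\Delta_{\gamma}&\Delta_{\gamma ay}\\ \Delta_{\gamma ax}&\Delta_{\gamma ay}&\Delta_{a}\end{array}\right)-i\partial_{z}\right)\left(\begin{array}{c}|\gamma_{x}\rangle\\ |\gamma_{y}\rangle\\ |a\rangle\end{array}\right)=0,\qquad(1.2)$$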

where $\Delta_{\gamma}=\frac{-\omega_{pl}^{2}}{2\omega}$ , $\Delta_{a}=\frac{-m_{a}^{2}}{\omega}$ and $\Delta_{\gamma ai}=g_{a\gamma\gamma}\frac{B_{i}}{2}$ . The effective photon mass is given by the plasma frequency $\omega_{pl}=\left(4\pi\alpha\frac{n_{e}}{m_{e}}\right)^{\frac{1}{2}}$ , where $\alpha$ is the fine structure constant, $n_{e}$ is the electron density and $m_{e}$ is the electron mass. In this work, we will neglect the ALP mass, setting $m_{a}=0$ . This approximation is valid for ALP masses below the effective photon mass in astrophysical plasmas, $m_{a}\lesssim 10^{-12}\,{\rm eV}$ . Note that in this work we consider generic ALPs, rather than the QCD axion. $m_{a}$ and $g_{a\gamma\gamma}$ are therefore independent parameters. Equation 1.2 may be solved analytically in certain regimes, but in general requires numerical solution. In either case, we find that the conversion probability is pseudo-sinusoidal in $\frac{1}{\omega}$ .
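As a rough illustration of the numerical solution mentioned above, the sketch below propagates the three-component state through a sequence of cells with constant magnetic field, setting $m_{a}=0$ so that $\Delta_{a}=0$. Within each cell the solution of the equation of motion is a matrix exponential, evaluated here by eigen-decomposition. The function name, argument layout and rounded unit conversions are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Unit conversions in natural units (hbar = c = 1); rounded standard values.
EV2_PER_GAUSS = 1.95e-2          # 1 G     ~ 1.95e-2 eV^2
INV_EV_PER_KPC = 1.564e26        # 1 kpc   ~ 1.564e26 eV^-1
EV3_PER_CM3 = (1.973e-5) ** 3    # 1 cm^-3 expressed in eV^3
ALPHA = 1.0 / 137.036            # fine structure constant
M_E_EV = 5.11e5                  # electron mass in eV


def survival_probability(energy_ev, g_per_gev, b_transverse_gauss,
                         n_e_cm3, lengths_kpc):
    """Unpolarised photon survival probability for m_a = 0 (hypothetical helper).

    b_transverse_gauss has shape (n_cells, 2): (B_x, B_y) in each cell.
    n_e_cm3 and lengths_kpc have shape (n_cells,).
    """
    g_per_ev = g_per_gev * 1e-9
    transfer = np.eye(3, dtype=complex)
    for (b_x, b_y), n_e, length in zip(b_transverse_gauss, n_e_cm3, lengths_kpc):
        omega_pl_sq = 4.0 * np.pi * ALPHA * n_e * EV3_PER_CM3 / M_E_EV  # eV^2
        # Mixing-matrix entries, converted from eV to kpc^-1.
        d_gamma = -omega_pl_sq / (2.0 * energy_ev) * INV_EV_PER_KPC
        d_ax = 0.5 * g_per_ev * b_x * EV2_PER_GAUSS * INV_EV_PER_KPC
        d_ay = 0.5 * g_per_ev * b_y * EV2_PER_GAUSS * INV_EV_PER_KPC
        mixing = np.array([[d_gamma, 0.0, d_ax],
                           [0.0, d_gamma, d_ay],
                           [d_ax, d_ay, 0.0]])   # Delta_a = 0 because m_a = 0
        eigvals, eigvecs = np.linalg.eigh(mixing)
        # exp(-i * mixing * length), built from the eigen-decomposition.
        step = (eigvecs * np.exp(-1j * eigvals * length)) @ eigvecs.T
        transfer = step @ transfer
    # Average over the two initial photon polarisations.
    p_to_alp = 0.5 * (abs(transfer[2, 0]) ** 2 + abs(transfer[2, 1]) ** 2)
    return 1.0 - p_to_alp
```

Evaluating this function on a grid of energies for a fixed field configuration gives a survival probability curve of the kind the text describes; the common phase $\omega$ is dropped because it does not affect probabilities.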

Here we focus on ALP-photon interconversion in the magnetic fields of galaxy clusters. The presence of ultra-light ALPs ($m_{a}\leq 10^{-12}\,{\rm eV}$) leads to spectral distortions of point sources shining through galaxy clusters at X-ray energies [3]. The search for modulations in the spectra of X-ray sources located in or behind galaxy clusters has led to world-leading bounds on $g_{a\gamma\gamma}$ [4, 5, 6, 7]. Future X-ray telescopes such as Athena and IXPE will lead to an improvement in these bounds [8, 9].

The search for spectral modulations (reviewed in Section 2) has so far used relatively simple statistical methods and it seems very plausible that search strategies adapted for ALP-like signals might provide higher sensitivity. Machine learning has been used with great success in many areas of physics. In particular, it is known that machine learning approaches are well suited to classification problems. Classifiers are able to sort input data based on potentially subtle or hard to define features, which may be obscured by noise. Famously, classifiers may be trained to recognise faces, based on many training examples, but without needing to be told the features of a face. In physics, machine learning has been successfully used in classification of galaxy morphology [10] and jets in particle colliders [11], to name but two. Machine learning techniques have also been proposed for anomaly detection in X-ray spectra [12].

We will focus in this work on a supervised learning approach, in which we train our classifiers with labelled sample data, in this case spectra simulated with and without the effects of ALPs. We note in passing that unsupervised learning, in which the classifiers are not given labelled training data, may also have potential for physics discovery [13, 14, 15].

2 The Problem

We search for ALP induced modulations in the spectra of point sources in or behind galaxy clusters, as observed by the Chandra X-ray telescope [16]. We process each observation using CIAO 4.8.1 [17], stacking observations from the same source and subtracting the cluster background. We consider the energy range $1-5$ keV (the only exception being NGC1275, where we consider $0.8-5$ keV), where Chandra has a high effective area. Our spectra are significantly impacted by Chandra’s energy resolution of $150$ eV (FWHM). In effect, we observe the true spectrum convolved with a Gaussian of FWHM $150$ eV. This will partially blur any ALP induced features. Furthermore, our spectra will suffer Poisson noise, with amplitude determined by the observation time. ALP-induced oscillations could potentially hide within this Poisson noise. We model both these effects directly by simulating fake data using the Sherpa software [18], as described below.
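To make the two instrumental effects concrete, the following toy sketch blurs an expected count spectrum with a Gaussian of FWHM $150$ eV and then draws Poisson counts. It is only a stand-in for the full instrument response handled by Sherpa in the paper; the function name, the 10 eV binning and the single narrow feature are illustrative assumptions.

```python
import numpy as np


def observe(energies_kev, expected_counts, fwhm_kev=0.150, rng=None):
    """Blur expected counts with a Gaussian response, then add Poisson noise."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = fwhm_kev / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    # kernel[i, j]: probability that a photon from true bin i is recorded in bin j.
    offsets = energies_kev[None, :] - energies_kev[:, None]
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum(axis=1, keepdims=True)   # each row sums to 1
    blurred = expected_counts @ kernel            # total counts are conserved
    return blurred, rng.poisson(blurred)


energies = np.linspace(1.0, 5.0, 401)             # 10 eV bins (illustrative)
line = np.zeros_like(energies)
line[200] = 1000.0                                # narrow feature at 3 keV
blurred, noisy = observe(energies, line, rng=np.random.default_rng(0))
```

The narrow feature is spread over roughly fifteen 10 eV bins, which shows why oscillations with a period well below the resolution are washed out.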

We seek to distinguish between two models for our observed flux - $F(E)=A(E)$ and $F(E)=A(E)P_{\gamma\to\gamma}(E,g_{a\gamma\gamma},{\bf B})$ . $A(E)$ is the point source’s spectrum assuming standard astrophysics with no ALPs, described in more detail below. $P_{\gamma\to\gamma}(E,g_{a\gamma\gamma},{\bf B})$ is the photon survival probability induced by the presence of ALPs with a coupling $g_{a\gamma\gamma}$ to the photon and the magnetic field $\bf B$ along the line of sight to the source. An example of such a photon survival probability is shown in Figure 1. Note that oscillations in a spectrum could also result from a different mechanism, such as mis-modelling atomic lines, or instrumental effects such as pileup.

Figure 2 shows the observed spectrum of the Seyfert galaxy 2E3140 in the galaxy cluster A1795, and its simulated spectrum assuming the existence of ALPs with $g_{a\gamma\gamma}=5\times 10^{-12}\,{\rm GeV}^{-1}$ and a particular realisation of the A1795 magnetic field. We see that ALPs induce characteristic oscillations in the residuals, with larger wavelengths than those from Poisson fluctuations alone. The magnitude and power spectrum of B for a particular galaxy cluster is inferred from observations of Faraday rotation measures and synchrotron emission. However, the specific configuration of B along the line of sight to the point source is unknown, and represents a large set of nuisance parameters in our attempts to constrain $g_{a\gamma\gamma}$ . It is important to realise that the form of $P_{\gamma\to\gamma}(E,g_{a\gamma\gamma},{\bf B})$ depends heavily on the precise form of ${\bf B}$ . For example, for a different magnetic field configuration, the peaks and troughs of $P_{\gamma\to\gamma}(E,g_{a\gamma\gamma},{\bf B})$ would occur at different energies. However, some features of the spectrum, such as the increasing wavelength of the oscillations with increasing energy, are generic. This fact makes it in principle possible to distinguish characteristically ALP-like features. This has been explored by considering the spectra in Fourier space in [19]. This work also considers using ML to place bounds on ALPs. Here we extend this effort, and find that ML can accelerate the search for ALPs.

Previous work has also searched for ALP induced oscillations in point source spectra, placing bounds on ALPs relying solely on the fact that these oscillations would make the spectra a bad fit to the astrophysics only model. Such search strategies have already placed leading bounds on low mass ALPs. A recent analysis of Chandra High-Energy Transmission Grating observations, which offer a higher energy resolution, achieves $g_{a\gamma\gamma}\lesssim 6-8\times 10^{-13}\,{\rm GeV}^{-1}$ [7]. Several studies of data taken without the grating yield $g_{a\gamma\gamma}\lesssim 1.5\times 10^{-12}\,{\rm GeV}^{-1}$ [4, 5, 6, 20].

However, searches based on a $\chi^{2}$ test or similar do not take into account the distinctive characteristics of ALP induced oscillations. We can therefore hope that our ALP searches will be improved by using machine learning to seek out ALP-like features in the spectra of point sources shining through galaxy clusters.

3 Astrophysical Systems

In this work, we will use Chandra observations of a number of point sources located in or behind galaxy clusters as a test bed for the potential of machine learning in searching for ALPs. Our observations were all taken without the High Energy Transmission Grating. We use the point sources considered in [4, 6]. These are:

• The AGN NGC1275 at the centre of Perseus.

• The quasars B1256+281 and SDSS J130001.48+275120.6 shining through Coma.

• The AGN NGC3862 in A1367.

• The AGN IC4374 at the centre of A3581.

• The bright Sy1 galaxy 2E3140 within A1795.

• The quasar CXOU J134905.8+263752 behind A1795.

• The central AGN UGC9799 of the cluster A2052.

These sources were chosen based on their brightness and observation time with Chandra. To simulate the ALP-photon interconversion probability for these sources, we require estimates for the magnetic fields in their host clusters. We use the electron density and magnetic field estimates from [4, 6]. These are taken from published estimates derived from thermal emission and Faraday rotation measures respectively [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34] and from extrapolation from similar clusters when no such estimates are available. We only find competitive bounds from the AGN in A1367, the quasar behind A1795 and the Seyfert 1 galaxy in A1795. These are the same sources for which competitive bounds are obtained using conventional statistical methods in [6]. The constraining ability of different sources is driven primarily by the number of photon counts available for each source. In the case of NGC1275, a large number of photon counts are available, but the spectrum displays highly significant anomalies [4], which limit its constraining power. These are probably a result of instrumental effects. For the rest of the main body of the paper, we will restrict our discussion to the three sources for which competitive bounds are found. We will return to discuss the potential of NGC1275 in appendix A.

4 Datasets

To train our classifiers, we require many simulated data sets, based on the measured spectrum of each point source, simulated both with and without the effects of ALPs. In the former case, the simulated data sets will differ in the realisation of the magnetic field along the line of sight to the source, and in the realisation of the Poisson errors in each bin. In the latter case, only the Poisson errors will differ between data sets. To this end, we generated $1000$ magnetic field realisations for each ALP coupling from $0.1-2.0\times 10^{-12}\,{\rm GeV}^{-1}$ in steps of $0.1\times 10^{-12}\,{\rm GeV}^{-1}$. Crucially, we generate a different set of magnetic fields for each $g_{a\gamma\gamma}$. Each magnetic field realisation is composed of $\mathcal{O}(100)$ cells (chosen to match the physical size of the cluster), with cell sizes drawn from a power law distribution. Within each cell, the magnetic field is constant with a randomly chosen direction, and an amplitude set by its distance from the cluster centre. The distribution of cell sizes and the radial fall off is different for each cluster, as described in [6]. For each such magnetic field, we simulate the photon survival probability as a function of energy assuming the presence of ALPs with coupling $g_{a\gamma\gamma}$ to the photon, by numerical solution of equation 1.2.
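A single magnetic field realisation of the kind described above could be drawn as in the sketch below. Cell sizes follow a truncated power law sampled by inverting its cumulative distribution, directions are isotropic, and the amplitude is supplied as a function of distance from the cluster centre. All numerical values, the radial profile and the assumption that the line of sight starts at the cluster centre are hypothetical placeholders; the cluster-specific choices are those of [6] and are not reproduced here.

```python
import numpy as np


def sample_cell_sizes(path_kpc, l_min, l_max, index, rng):
    """Draw cell sizes with pdf ~ L**(-index) on [l_min, l_max] (index != 1)."""
    a, b = l_min ** (1.0 - index), l_max ** (1.0 - index)
    sizes = []
    while sum(sizes) < path_kpc:
        u = rng.random()
        sizes.append((a + u * (b - a)) ** (1.0 / (1.0 - index)))
    sizes = np.array(sizes)
    sizes[-1] -= sizes.sum() - path_kpc       # trim the last cell to fit the path
    return sizes


def field_realisation(path_kpc, l_min, l_max, index, amplitude, rng):
    """Return cell lengths and transverse field components (B_x, B_y) per cell."""
    sizes = sample_cell_sizes(path_kpc, l_min, l_max, index, rng)
    centres = np.cumsum(sizes) - 0.5 * sizes  # distance of each cell from the centre
    directions = rng.normal(size=(len(sizes), 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)  # isotropic
    field = amplitude(centres)[:, None] * directions
    return sizes, field[:, :2]                # only the transverse part mixes


def toy_amplitude(r_kpc):
    """Hypothetical radial fall-off, in Gauss; not taken from the paper."""
    return 5e-6 / (1.0 + (r_kpc / 100.0) ** 2)


rng = np.random.default_rng(3)
sizes, b_transverse = field_realisation(500.0, 2.0, 35.0, 2.0, toy_amplitude, rng)
```

With these placeholder numbers the path contains of order a hundred cells; repeating the call with fresh random draws gives the independent realisations needed for each coupling.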

Using Sherpa, we fit the data in the range $1-5$ keV from each point source to a power law model with absorption from neutral hydrogen, also allowing a soft thermal component where this improves the fit. This gives us the astrophysics only model $A(E)$ for that source. From these best fit models, we generate fake data sets using the Sherpa fake data function, in the cases of no ALPs (source spectrum $A(E)$) and for each of the photon survival probabilities simulated (source spectrum $A(E)P_{\gamma\to\gamma}(E,g_{a\gamma\gamma},{\bf B})$). For each case, we generate $10^{4}$ different fake data sets, which differ from each other in the realisation of Poisson errors in each bin (and underlying magnetic fields). The level of this noise is set by the simulated exposure time, which we set to be the same as that of the actual observation. Generating our fake data in this way takes into account the instrumental response of the telescope, in particular including its energy resolution.

We now seek to compare the spectra simulated with and without the presence of ALPs. In particular, we seek to build a classifier that can distinguish between the ALP and no ALP cases. There are three main differences between the cases with and without ALPs:

• The spectra with ALPs have overall lower flux.

• The spectra with ALPs have a higher decrease in flux as we increase energy.

• The spectra with ALPs display oscillations about the power law model, greater than would be expected from Poisson fluctuations alone, and with increasing wavelength for higher energies.

We may only use the last of these differences in trying to distinguish ALP and non-ALP spectra, as we do not know the intrinsic amplitude or power law index of the source. We therefore cannot train classifiers with our raw simulated spectra and then hope to use them meaningfully on real data. We consider three approaches to this problem. Firstly, we refit each spectrum (both ALPs and no ALPs) to an absorbed power law model, and train our classifiers on the residuals of this fit. Secondly, we rescale our simulated data to remove the first two effects. We do this using the following ‘upscaling’ procedure:

1. For each coupling $g_{a\gamma\gamma}$, we calculate the average photon survival probability per bin, averaging over each magnetic field realisation. This gives us a function $P^{\rm av}_{\gamma\to\gamma}(E,g_{a\gamma\gamma})$, evaluated at each energy bin.

2. We find the inverse of the average photon survival probability $P^{\rm av^{-1}}_{\gamma\to\gamma}(E,g_{a\gamma\gamma})$. In practice, this is found by simply taking the inverse of the value of $P^{\rm av}_{\gamma\to\gamma}(E,g_{a\gamma\gamma})$ in each energy bin.

3. We generate fake data with ALPs using the source model $A(E)P_{\gamma\to\gamma}(E,g_{a\gamma\gamma},{\bf B})P^{\rm av^{-1}}_{\gamma\to\gamma}(E,g_{a\gamma\gamma})$, rather than simply $A(E)P_{\gamma\to\gamma}(E,g_{a\gamma\gamma},{\bf B})$.

In this way, the average, large scale effects of ALP-photon conversion (the overall decrease, and the increased suppression at high energies) are removed, but the local features that cannot be modeled by standard astrophysics are retained. Thirdly, we use both techniques simultaneously, first performing the upscaling procedure described above and then refitting to an absorbed power law and training our classifiers with the residuals.
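The upscaling procedure amounts to dividing each simulated survival probability by the bin-wise average over magnetic field realisations. A minimal NumPy sketch, assuming the survival probabilities for one coupling are stored as an array of shape (realisations, energy bins), is given below; the placeholder probabilities are random numbers, not a physical simulation.

```python
import numpy as np


def upscale(a_model, p_survival):
    """Source models A(E) * P(E, g, B) / P_av(E, g) for one coupling g.

    a_model has shape (n_bins,); p_survival has shape (n_realisations, n_bins).
    """
    p_av = p_survival.mean(axis=0)             # average survival probability per bin
    p_av_inv = 1.0 / p_av                      # bin-wise inverse
    return a_model * p_survival * p_av_inv     # upscaled source models


rng = np.random.default_rng(1)
energies = np.linspace(1.0, 5.0, 50)
a_model = 100.0 * energies ** -1.8                 # illustrative power law
p_survival = 1.0 - 0.3 * rng.random((200, 50))     # placeholder values only
upscaled = upscale(a_model, p_survival)
```

By construction, the upscaled models average back to $A(E)$ in every bin, so only realisation-to-realisation structure remains to distinguish them from the no-ALP model.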

5 Classifiers

We report results for the following ML classifiers [35, 36], trained on labelled spectra simulated with and without the presence of ALPs, as described above:

• Gaussian Naive Bayes (GaussianNB)

• Quadratic Discriminant Analysis (QDA)

• Random Forest Classifier (RFC)

• Ada Boost Classifier (ABC)

• Gaussian Process Classifier (GPC)

• Decision Tree Classifier (DTC)

• K Neighbours Classifier (KNC)

• Support Vector Machine (SVM)

Each classifier $\mathcal{C}_{g}$ is trained to distinguish between spectra with no ALPs and those with ALPs with coupling $g$. (One could also build multi-category classifiers trained with a range of couplings. In this proof-of-principle paper, we restrict ourselves to the simpler two-category classifiers, which are generally more straightforward to train and assess.) We note also that the methods presented here are only appropriate to setting bounds on the ALP parameters, and not for making a discovery. One could certainly use these classifiers as a discovery tool, but the look elsewhere effect would need to be carefully taken into account. We split our dataset of simulated data samples into training and test sets. We have performed numerical experiments using varying sizes of training sets $N=(3600,4500,6400,8000)$ for the classifiers, and we check that the results do not vary significantly with $N$. For the GPC, KNC and SVM, we did not obtain competitive bounds. We present our results for the other five classifiers below.
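The idea of one two-category classifier per coupling can be illustrated with a toy example. To keep the sketch dependency-light, it implements Gaussian naive Bayes directly in NumPy rather than calling the library implementations cited in the text, and it trains on synthetic residuals: unit Gaussian noise, plus, for the ALP class, an oscillation in $1/E$ with random phase whose amplitude grows with $g$. The class `TinyGaussianNB`, the helper `toy_residuals` and all numerical values are hypothetical and are not the paper's simulated spectra.

```python
import numpy as np


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes: independent Gaussian per bin and class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        # Log-likelihood of every sample under every class, summed over bins.
        diff = X[:, None, :] - self.mean_[None]
        log_like = -0.5 * (np.log(2.0 * np.pi * self.var_)[None]
                           + diff ** 2 / self.var_[None]).sum(axis=2)
        return self.classes_[np.argmax(log_like + self.log_prior_, axis=1)]


def toy_residuals(g, n_samples, rng, n_bins=40):
    """Synthetic residuals; g is in units of 1e-12 GeV^-1, g = 0 means no ALPs."""
    energies = np.linspace(1.0, 5.0, n_bins)
    noise = rng.normal(size=(n_samples, n_bins))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1))
    return noise + 3.0 * g * np.sin(12.0 / energies + phase)


def train_classifier(g, n_train, rng):
    X = np.vstack([toy_residuals(0.0, n_train, rng), toy_residuals(g, n_train, rng)])
    y = np.repeat([0, 1], n_train)            # 0 = no ALPs, 1 = ALPs
    return TinyGaussianNB().fit(X, y)


rng = np.random.default_rng(2)
classifiers = {g: train_classifier(g, 2000, rng) for g in (0.5, 1.0, 2.0)}
```

Because the phase is random, the class means coincide and the classifier can only use the larger per-bin variance of the ALP class, which loosely mirrors the situation where the magnetic field configuration is an unknown nuisance parameter.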

We can understand the inefficacy of the GPC, KNC and SVM classifiers as follows. The Gaussian Process Classifier assumes that the classifier function can be modelled as a Gaussian Process, an assumption that is not well justified in this case. The performance of the Support Vector Machine is highly dependent on the choice of kernel. As our other classifiers give good performance already, we have not optimized the SVM kernel. The K-Neighbours Classifier is a relatively simple classifier that does not generally track higher order features of the data. Such higher order features are essential for characterising ALP-photon oscillations, so it is unsurprising that no bounds are obtained. In general, the performance of a classifier is dependent on the choice of hyper-parameters. Such a choice of hyper-parameters corresponds to a different choice of bias on the model space. This biasing provides a reason for the observed difference in performance in the classifiers. To see the general viability of using classifiers, we would like to stress that we observe reasonable performance across various classifiers and, as previously explained, the expected failure of several classifiers.

6 Classifier Performance

We have a set of classifiers $\mathcal{C}_{g}$ trained to classify spectra as containing ALP-induced oscillations or no ALP-induced oscillations. We have trained these classifiers either on residuals when data is fit with a power law, or by using the ‘upscaled’ ALP data as described above, or using the residuals from upscaled data. For classifier $\mathcal{C}_{g}$, the training data with ALPs was generated assuming an ALP-photon coupling $g$, and using a range of randomly generated magnetic fields. We can test the performance of these classifiers using separate sets of test data, generated without ALPs or with ALPs at a range of $g$ values. For example, all classifiers should show the same behaviour for data simulated with very low values of $g$ as for data simulated without ALPs.

Figures 3 and 4 show the performance of our classifiers $\mathcal{C}_{g}$ when queried with data simulated with different ALP-photon couplings $g_{\rm query}$ for the Sy1 galaxy 2E3140 within A1795. We see that for classifiers trained with sufficiently high $g$ values, data simulated with ALPs is mostly classified as such, while data simulated without ALPs is also mostly classified correctly. For all three data processing methods, we see a clear separation between ALP and no ALP data. The performance of the other classification algorithms follows a similar pattern. Interestingly, we also find that when classifiers trained with a relatively low value of $g$ ($\mathcal{O}(10^{-13})\,{\rm GeV}^{-1}$) are queried with data simulated with a high value of $g_{\rm query}$ ($\mathcal{O}(10^{-12})\,{\rm GeV}^{-1}$), the result is usually ‘No ALPs’. This suggests that there are significant qualitative differences between the high and low $g$ regimes that the classifiers are picking up on. This behaviour does not affect our test statistic, defined below, and therefore does not affect our bounds.

Figures 4 and 5 (right) show the performance of our RFC classifiers for very low values of $g_{\rm query}$. We expect this to be the same as their performance for $g_{\rm query}=0$. We see that this is true for the classifiers trained with residuals and upscaled residuals, but not for the classifiers trained with upscaled data. This is also the case for the other classification algorithms. The cause of this bias in the upscaled classifiers is not known. We therefore do not use the upscaled classifiers for setting bounds. (The bounds obtained from the upscaled classifiers are significantly better than those from the residual classifiers.)

We can also use our classifiers on real data, and hence obtain a bound on $g_{a\gamma\gamma}$ . We input our real data to each of our classifiers $\mathcal{C}_{g}$ . The output of each $\mathcal{C}_{g}$ will be either ALPs or No ALPs. For very high values of $g$ , assuming the data does not contain such ALPs, we expect the classifier to return No ALPs a very high proportion of the time. At intermediate values of $g$ , on the boundary of what would be detectable, we might expect the classifier to return No ALPs the majority of the time. Furthermore, if ALPs actually are present in the data with coupling $g=g_{a\gamma\gamma}$ , we would expect classifiers trained with couplings close to $g_{a\gamma\gamma}$ to return ALPs most of the time. This is the effect we want to use to place bounds on $g_{a\gamma\gamma}$ .

7 Bounds

Having established that our classifiers can distinguish between observations simulated with and without ALPs, we now seek to use them to place bounds on the ALP-photon coupling.

We define a test statistic for a data set $\mathcal{D}$ :

${\rm TS}_{\mathcal{D}}=$ highest value of $g$ such that $\mathcal{C}_{g}$ classifies $\mathcal{D}$ as ALPs

For example, more noisy data, in which it is easier for ALPs to hide, and hence harder to distinguish the ALP and no ALP cases, will have a higher ${\rm TS}_{\mathcal{D}}$ . To place bounds on ALPs we consider the null hypothesis:

$H_{0}$ : ALPs exist with $g=g_{\rm null}$ .

We now find the null distribution of ${\rm TS}_{\mathcal{D}}$ by Monte Carlo. We generate $2000$ fake data sets (i.e. spectra) $\{\mathcal{D}^{i}(g_{\rm null})\}$ assuming ALPs with $g=g_{\rm null}$ with different magnetic field realisations. We find ${\rm TS}_{\mathcal{D}^{i}(g_{\rm null})}$ for each fake data set in $\{\mathcal{D}^{i}(g_{\rm null})\}$ . If 95 % of the ${\rm TS}_{\mathcal{D}^{i}(g_{\rm null})}$ are higher than the test statistic for the real data, $g_{\rm null}$ is excluded at the 95 % confidence level.

For each data set, we check if the null distribution of ${\rm TS}_{\mathcal{D}}$ has an approximately Gaussian form. There are two circumstances in which this might not happen:

• If the training data is so noisy that no value of $g$ (or no tested value of $g$) has a significant effect on the data, then ${\rm TS}_{\mathcal{D}}$ will just take a random value for each fake data set, whatever the value of $g_{\rm null}$. The resulting distribution will clearly not be Gaussian. The training data has the same noise level as the real data we intend to classify. Physically, in this situation the data is too noisy to place any bounds on $g$.

• We have trained classifiers with values of $g$ so high that the conversion probability has become saturated. When $g_{\rm null}$ is also high enough that the conversion probability is saturated, ${\rm TS}_{\mathcal{D}}$ will just be the highest value of $g$ we happened to use for training. Physically, this is because it is not possible to distinguish between different values of $g$ that both saturate the conversion probability. In this case, we can still place upper bounds on $g$ using the lower tail of the null distribution.

Figure 6 shows a null distribution plot for the bright Sy1 galaxy 2E3140 within A1795 for no ALPs, low couplings (indistinguishable from no ALPs), and larger couplings.

In detail, our bounds procedure for residual classifiers is as follows. The bounds procedure for classifiers using the upscaled residuals is analogous.

1. Choose a set of $g$ values with which to build classifiers. For example, $g_{C}=\{1-20\}\times 10^{-13}\,{\rm GeV}^{-1}$.

2. Simulate photon survival probabilities $P^{\rm train}_{j}(E,g_{C})$ for each value of $g_{C}$ considered and for 800 different magnetic field configurations $\{B^{\rm train}_{j}\}$.

3. Fit the real data with an absorbed power law model, giving a best fit power law $F_{\rm fit}(E)$.

4. For each simulated photon survival probability $P^{\rm train}_{j}(E,g_{C})$, simulate $10$ fake data sets using the exposure and background from the real data and a source spectrum $P^{\rm train}_{j}(E,g_{C})\times F_{\rm fit}(E)$. We therefore have $8000$ fake data sets for each $g_{C}$.

5. Fit each such fake data set with an absorbed power law, allowing the parameters to vary freely again. Save the residuals from each fit $R^{\rm train}_{j}(E,g_{C})$.

6. Simulate $8000$ fake data sets with no ALPs, i.e. with source spectrum $F_{\rm fit}(E)$. Fit each of these fake data sets with an absorbed power law, again allowing the parameters to vary freely. Save the residuals from each fit $R^{\rm train}_{j}(E,0)$.

7. For each $g_{C}$, train a classifier $\mathcal{C}_{g_{C}}$ to distinguish $R^{\rm train}_{j}(E,g_{C})$ from $R^{\rm train}_{j}(E,0)$, i.e. to distinguish residuals from data including ALPs with coupling $g_{C}$ from residuals from data containing no ALPs.

8. Now choose a value of $g$, $g_{\rm null}$, to attempt to exclude. It is not necessary for $g_{\rm null}$ to be equal to any of the $g_{C}$.

9. Simulate photon survival probabilities $P_{j}(E,g_{\rm null})$ for $200$ different magnetic field configurations $\{B_{j}\}$. These must be different from the magnetic field configurations used for the training data.

10. For each simulated photon survival probability $P_{j}(E,g_{\rm null})$, simulate $10$ fake data sets using the exposure and background from the real data and a source spectrum $P_{j}(E,g_{\rm null})\times F_{\rm fit}(E)$. We therefore have $2000$ fake data sets with $g=g_{\rm null}$.

11. Fit each such fake data set with an absorbed power law, allowing the parameters to vary freely again. Save the residuals from each fit $R_{j}(E,g_{\rm null})$.

12. Feed each $R_{j}(E,g_{\rm null})$ to each of the classifiers $\mathcal{C}_{g_{C}}$. For each $R_{j}(E,g_{\rm null})$, record the highest value of $g_{C}$ for which the corresponding classifier $\mathcal{C}_{g_{C}}$ returned a verdict of ALPs. We call this value $TS_{j}(g_{\rm null})$.

13. The bar chart of the $TS_{j}(g_{\rm null})$ forms the null distribution of the test statistic defined above under the null hypothesis ‘ALPs with coupling $g_{\rm null}$ exist’ (see Figure 6).

14. Find the residuals $R_{\rm real}(E)$ when the real data is fit with a power law.

15. Feed $R_{\rm real}(E)$ to each of the classifiers $\mathcal{C}_{g_{C}}$. Record the highest value of $g_{C}$ for which the corresponding classifier $\mathcal{C}_{g_{C}}$ returned a verdict of ALPs. We call this value $TS_{\rm real}$. This is the test statistic for the real data.

16. If $TS_{\rm real}$ lies in one of the tails of the null distribution, we can exclude the null hypothesis with some degree of certainty. Depending on the tail, this could either be because the real data is much more axiony or much less axiony than the fake data with $g=g_{\rm null}$. If 95 % of the $TS_{j}(g_{\rm null})$ are higher than $TS_{\rm real}$, $g\geq g_{\rm null}$ is excluded at the 95 % confidence level.
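The simulate-and-refit part of the procedure above (generating fake data from the best fit model times a survival probability, refitting, and saving residuals) can be sketched as follows. For brevity the sketch swaps in an unabsorbed power law fitted by least squares in log space in place of the Sherpa fit of an absorbed power law, and it ignores background and instrument response; function names and numbers are hypothetical.

```python
import numpy as np


def fit_power_law(energies, counts):
    """Least-squares fit of counts = norm * E**(-index) in log-log space."""
    mask = counts > 0                          # empty bins cannot be logged
    slope, intercept = np.polyfit(np.log(energies[mask]), np.log(counts[mask]), 1)
    return np.exp(intercept), -slope


def residuals(energies, counts):
    """Residuals of a fresh power law fit, in units of the expected Poisson error."""
    norm, index = fit_power_law(energies, counts)
    model = norm * energies ** (-index)
    return (counts - model) / np.sqrt(model)


def simulated_residuals(energies, fit_model, survival_probs, n_fakes, rng):
    """Residuals of n_fakes Poisson realisations for each survival probability."""
    rows = []
    for p in survival_probs:
        for _ in range(n_fakes):
            fake_counts = rng.poisson(fit_model * p)
            rows.append(residuals(energies, fake_counts))
    return np.array(rows)


rng = np.random.default_rng(4)
energies = np.linspace(1.0, 5.0, 40)
fit_model = 2000.0 * energies ** -1.7              # stand-in for F_fit(E)
no_alp = np.ones((5, energies.size))               # P = 1 in every bin: no ALPs
r_no_alp = simulated_residuals(energies, fit_model, no_alp, 10, rng)
```

Passing simulated survival probabilities for a given coupling instead of the all-ones array would give the ALP-class residuals used to train the corresponding classifier.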

8 Results and Discussion

Figure 7 shows the 5th and 95th percentile values of the test statistic as defined above for the Seyfert 1 galaxy in A1795. We query the classifiers with simulated data with $g_{a\gamma\gamma}$ as shown on the $x$ axis. The $y$ axis shows the 5th (red) and 95th (blue) percentile values of the test statistic. The blue line is the test statistic of the real data. The $95\%$ confidence limit on $g_{a\gamma\gamma}$ therefore corresponds to the $x$ axis value where the red points cross the blue line. We see that the 5th percentile of the test statistic plateaus at a low (in this case zero) value at $g_{a\gamma\gamma}\sim 4.0\times 10^{-13}\,{\rm GeV}^{-1}$ (RFC, right panel). This corresponds to the maximum constraining power of this source and observation time, in the case that the real spectrum perfectly fits a no ALP model. This is because the classifiers cannot distinguish $g_{a\gamma\gamma}\sim 4.0\times 10^{-13}\,{\rm GeV}^{-1}$ from $g_{a\gamma\gamma}<4.0\times 10^{-13}\,{\rm GeV}^{-1}$ at the $95\%$ confidence level.

In this example, the test statistic of the real data is above the 5th percentile plateau; this is the case for the majority of our sources and classifiers. This shows that the real data appears somewhat ‘more axiony’ than data simulated with very weakly interacting ALPs. This could be due to un-modelled astrophysical or instrumental effects, or simply a result of statistical fluctuations. Therefore, we do not saturate the maximum constraining power of this source with this classifier. In a couple of cases, for the DTC and RFC classifiers with the Sy1 source in A1795, the real data test statistic is lower than the 5th percentile plateau. This shows that the real data appears somewhat less ‘axiony’ than data simulated with very weakly interacting ALPs. This is similar to a situation in which the reduced $\chi^{2}$ value for a data set is less than one: the data is too good a fit to the standard model, for example due to lower than average Poisson fluctuations. Our bounds method cannot be used in this situation, so we do not report constraints on $g_{a\gamma\gamma}$ for these cases.

Table 1 shows the bounds on $g$ obtained using the method described above. The point source in A1367 and the quasar in A1795 do not consistently give bounds across classifiers. Where these sources fail to give bounds, it is because the test statistic for the real data is rather high, i.e. the real data ‘looks axiony’ to the classifier. Given the lack of consistency across classifiers, we do not consider that a reliable bound on $g_{a\gamma\gamma}$ is produced from these sources. On the other hand, applying machine learning classifiers to the Seyfert 1 galaxy within A1795 consistently gives bounds in the range $g_{a\gamma\gamma}\lesssim(0.7\text{--}1.2)\times 10^{-12}\,{\rm GeV}^{-1}$. This is in a similar range to the current leading bound [7] obtained using a significantly higher resolution data set with conventional statistical methods. Furthermore, it improves the previously reported bound in [6] of $g_{a\gamma\gamma}\lesssim 1.5\times 10^{-12}\,{\rm GeV}^{-1}$ for this source.

In this analysis, we have used one-dimensional, domain based magnetic field simulations to obtain the photon survival probabilities. This is standard practice for simulating point sources shining through galaxy clusters in the presence of ALPs [3, 4, 5, 6, 37, 7]. However, the true magnetic field structure of these clusters is better approximated by a three-dimensional turbulent field simulation, as used in [38]. We therefore have also tested the response of our classifiers to spectra simulated with a full turbulent field model (we thank our anonymous referee for pointing out this issue). As described in Appendix B, when using residuals, our classifiers are unable to reliably identify ALP oscillations from the turbulent field simulations. However, when using the upscaled residuals, our classifiers perform well on spectra simulated with the turbulent field model. This discrepancy clearly merits further investigation and will be the subject of future work. For now, we note that the bounds derived here from the fit residuals are very sensitive to the magnetic field structure. These should therefore be taken as a demonstration of the potential of machine learning, rather than as true bounds on $g$ . Furthermore, our results suggest that there are important qualitative differences between spectra generated with one-dimensional and three-dimensional magnetic field models. These differences may also prove important for ALP searches in point source spectra using more traditional statistical methods.

In this work we have demonstrated for the first time the use of machine learning techniques to search for ALP-induced oscillations in point source spectra, with application to real data. The bounds we obtain are competitive with those obtained using conventional statistical methods. They also represent an improvement on a point source basis, comparing the performance on the same datasets. Machine learning techniques have the potential to increase the reach of ALP searches in both current and future data sets. Future X-ray missions will feature substantially improved energy resolution and effective area, allowing us to probe even smaller values of $g_{a\gamma\gamma}$ . The improved energy resolution will reveal the characteristic features of spectral anomalies from ALP-photon interconversion in much greater detail. We therefore anticipate that the gains from machine learning techniques will be larger for future telescopes.

Acknowledgments

We would like to thank Joe Conlon, Andy Powell, Ben Hoyle and Edward Hughes for valuable discussions. Part of this research was supported by the Cambridge-LMU partnership programme. This work has been partially supported by STFC consolidated grant ST/P000681/1. FD is supported by a research fellowship from Peterhouse, University of Cambridge.

Appendix A NGC1275

In this appendix we present a short overview of the performance we find for the central AGN of NGC1275 in the Perseus cluster. As the available data features significantly more counts, we can train our classifiers with less noisy data samples. In turn this leads to a very good performance of the classifiers. Figures 8, 9, and 10 show examples of the performance we observe. However, we are unable to place bounds using this method as the real data is consistently classified as ‘axiony’ due to anomalies in the observed spectra. The performance of our classifiers suggests that the constraining or discovering power of NGC1275 is very high, potentially reaching $g_{a\gamma\gamma}\sim 4\times 10^{-13}\,{\rm GeV}^{-1}$ even for Chandra data taken without the High Energy Transmission Grating.

Appendix B 3D Magnetic field simulations

For the analysis presented in the main body of the paper, we used magnetic fields simulated in one dimension only, along the line of sight to the source. This simplification is required so that simulating the quantity of training and test data required is computationally tractable. It is natural to ask whether a different choice of magnetic fields leads to significant changes in our classification (we would like to thank the referee for highlighting this point). We therefore also test the performance of our classifiers on spectra simulated with ALP-photon interconversion in the presence of full three-dimensional turbulent field simulations, as described in [38, 39]. Unlike the one-dimensional simulations used for the results presented above, this more sophisticated simulation does not have a discrete domain structure, and the magnetic field direction changes smoothly everywhere.

We take as an example the quasar B1256+281 located behind the Coma galaxy cluster. We use the same parameters for Coma’s field as in our one-dimensional simulations, but simulate the turbulent field as follows:

1. Generate a random vector potential field in Fourier space according to $\langle|\tilde{\bf A}(k)|^{2}\rangle\sim|k|^{-n}$ . We allow $k$ to take values from $k_{\rm min}=\frac{2\pi}{\Lambda_{\rm max}}$ to $k_{\rm max}=\frac{2\pi}{\Lambda_{\rm min}}$ . The phase of each component of $\tilde{\bf A}(k)$ is uniformly distributed between $0$ and $2\pi$ . We use a Kolmogorov power spectrum corresponding to $n=17/3$ , $\Lambda_{\rm min}=2$ kpc and $\Lambda_{\rm max}=34$ kpc [23].

2. The Fourier space magnetic field is given by $\tilde{\bf B}(k)=i{\bf k}\times\tilde{\bf A}(k)$ .

3. We find the real space field ${\bf B}({\bf x})$ by taking the Fourier transform of $\tilde{\bf B}(k)$ and normalising the result to the correct amplitude based on distance from the cluster centre, as described in [6]. We simulate ${\bf B}({\bf x})$ on a $2000^{3}$ grid with cell size $0.5$ kpc.

4. We then calculate the photon survival probability for five different sight-lines at a projected 240 kpc from the cluster centre, corresponding to the quasar’s location.
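The field-generation steps can be illustrated with a small NumPy sketch. It is not the simulation code used for the paper: the tiny grid, the function name, the deterministic mode amplitudes (only the phases are random), and taking the real part of the inverse transform are assumptions chosen to keep the example short, and the radial normalisation of the amplitude is omitted. Wavenumbers are restricted to the band between $2\pi/\Lambda_{\rm max}$ and $2\pi/\Lambda_{\rm min}$.

```python
import numpy as np

def turbulent_field(n_grid=16, cell_kpc=0.5, n_index=17 / 3,
                    lam_min=2.0, lam_max=34.0, seed=0):
    """Divergence-free random field B = curl A on a periodic cubic grid."""
    rng = np.random.default_rng(seed)

    # Angular wavenumbers (kpc^-1) along each axis of the grid.
    k1d = 2 * np.pi * np.fft.fftfreq(n_grid, d=cell_kpc)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    kvec = np.stack([kx, ky, kz])
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)

    # Power law |A(k)|^2 ~ k^-n inside the allowed band, zero outside.
    band = (kmag >= 2 * np.pi / lam_max) & (kmag <= 2 * np.pi / lam_min)
    amp = np.zeros_like(kmag)
    amp[band] = kmag[band] ** (-n_index / 2)

    # Independent uniform phase in [0, 2*pi) for each component of A.
    phase = rng.uniform(0.0, 2 * np.pi, size=(3,) + kmag.shape)
    a_k = amp * np.exp(1j * phase)

    # B(k) = i k x A(k), which is perpendicular to k by construction.
    b_k = 1j * np.cross(kvec, a_k, axis=0)

    # Back to real space; keeping the real part is an assumption here.
    b_x = np.fft.ifftn(b_k, axes=(1, 2, 3)).real
    return kvec, b_k, b_x

kvec, b_k, b_x = turbulent_field()
# k . B(k) vanishes for every mode, so the field is divergence-free.
max_divergence = np.abs(np.sum(kvec * b_k, axis=0)).max()
```

Building ${\bf B}$ from a vector potential is what guarantees $\nabla\cdot{\bf B}=0$: the cross product with ${\bf k}$ removes any component of $\tilde{\bf B}$ along ${\bf k}$, which is what `max_divergence` checks numerically.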

For this magnetic field model we have generated five photon survival probabilities for each of five different couplings in the range $g_{a\gamma\gamma}=5\times 10^{-13}$ to $1\times 10^{-11}\,{\rm GeV}^{-1}$. For each of these 25 survival probabilities we have generated 100 fake spectra as described in Section 4. We have then classified these new spectra with our classifiers, which were trained with the previous datasets based on one-dimensional magnetic fields. The mean predictions found for the classifiers are compared with the previous predictions for spectra simulated with the one-dimensional fields; three examples, for up-scaled data, residuals, and up-scaled residuals with the AdaBoostClassifier, are shown in Figure 12. We find very similar results for all classifiers. For the up-scaled data and the up-scaled residuals, we see that in the regime where the classifiers are working (classifiers at reasonably large couplings, and data with ALPs of large enough couplings) the mean performance is very similar. We cannot identify a large change due to the change in the magnetic field for these two data products. As the number of magnetic field realisations is relatively low, we are not surprised by the more oscillatory behaviour in the mean performance.

However, for the residual data we find that the ALP data traces the no-ALP mean prediction and not the ALP mean predictions: the classifiers identify these spectra with ALPs as no-ALP. At this stage, we do not understand the underlying reason for this phenomenon. As with the unexpected behaviour of the up-scaled data at low couplings, a detailed investigation is beyond the scope of this article. For now, we simply consider these bounds as demonstrative of the potential of machine learning, rather than giving us true bounds on $g_{a\gamma\gamma}$ .

Bibliography


[1] G. Raffelt and L. Stodolsky, Mixing of the Photon with Low Mass Particles, Phys. Rev. D 37 (1988) 1237.
[2] Particle Data Group Collaboration, M. Tanabashi et al., Review of Particle Physics, Phys. Rev. D 98 (2018), no. 3 030001.
[3] D. Wouters and P. Brun, Constraints on Axion-like Particles from X-Ray Observations of the Hydra Galaxy Cluster, Astrophys. J. 772 (2013) 44, [arXiv:1304.0989].
[4] M. Berg, J. P. Conlon, F. Day, N. Jennings, S. Krippendorf, A. J. Powell, and M. Rummel, Constraints on Axion-Like Particles from X-ray Observations of NGC 1275, Astrophys. J. 847 (2017), no. 2 101, [arXiv:1605.0104].
[5] M. C. D. Marsh, H. R. Russell, A. C. Fabian, B. P. McNamara, P. Nulsen, and C. S. Reynolds, A New Bound on Axion-Like Particles, JCAP 1712 (2017), no. 12 036, [arXiv:1703.0735].
[6] J. P. Conlon, F. Day, N. Jennings, S. Krippendorf, and M. Rummel, Constraints on Axion-Like Particles from Non-Observation of Spectral Modulations for X-ray Point Sources, JCAP 1707 (2017), no. 07 005, [arXiv:1704.0525].
[7] C. S. Reynolds, M. C. D. Marsh, H. R. Russell, A. C. Fabian, R. N. Smith, and F. Tombesi, Astrophysical limits on very light axion-like particles from Chandra grating spectroscopy of NGC 1275, arXiv:1907.0547.
[8] J. P. Conlon, F. Day, N. Jennings, S. Krippendorf, and F. Muia, Projected bounds on ALPs from Athena, Mon. Not. Roy. Astron. Soc. 473 (2018), no. 4 4932–4936, [arXiv:1707.0017].