Application of Machine Learning to the Particle Identification of GAPS
Takuya Wada, Hideyuki Fuke, Yuki Shimizu, and Tetsuya Yoshida

TL;DR
This paper explores a machine learning approach using deep neural networks to improve particle identification in the GAPS experiment, aiming to enhance detection of rare cosmic-ray antiparticles like antideuterons.
Contribution
It introduces a novel machine learning method for particle identification in GAPS, complementing traditional likelihood-based techniques and demonstrating its potential through exploratory results.
Findings
Deep learning shows promise for particle identification in GAPS.
The approach can uncover unknown patterns in antiparticle event data.
Exploratory results on simulated events show rejection powers of about $10^{3}$ at identification efficiencies above 98%.
Table 1: Outline of the network structure.
| Layer | Number of nodes | Activation function |
|---|---|---|
| Input Layer | 11724 | - |
| Hidden 1 | 8000 | ReLU |
| Hidden 2 | 4000 | ReLU |
| Hidden 3 | 2000 | ReLU |
| Hidden 4 | 1000 | ReLU |
| Hidden 5 | 500 | ReLU |
| Hidden 6 | 50 | ReLU |
| Output Layer | 1 | sigmoid |

Table 2: Major hyperparameters.
| Hyperparameter | Value |
|---|---|
| Batch size | 320 |
| Epochs | 500 (early stopping: ON) |
| Optimizer | Adam |
| Learning rate | 0.00001 |
| Loss function | Binary cross-entropy |
Takuya WADA,1),2) Hideyuki FUKE,2) Yuki SHIMIZU,3) and Tetsuya YOSHIDA1),2)
1) Aoyama Gakuin University, Sagamihara, Japan
2) Institute of Space and Astronautical Science, JAXA, Sagamihara, Japan
3) Kanagawa University, Yokohama, Japan
Vol. 17, No. 20 (2020)
Abstract
GAPS is an international balloon-borne project that contributes to solving the dark-matter mystery through a highly sensitive survey of cosmic-ray antiparticles, especially undiscovered antideuterons. To achieve a sufficient sensitivity to rare antideuterons, a novel particle identification method based on exotic atom capture and decay has been developed. In parallel to utilizing this unique event signature in a conventional likelihood-based event identification scheme, we have begun investigating a complementary approach using a machine learning technique. In this new approach, a deep-learning package is trained on a large amount of input data from simulated antiparticle events through a multi-layered neural network. By applying this unbiased approach, we expect to mine unknown patterns and give feedback to the conventional method. In this paper, we report results from exploratory investigations that illustrate the promise of this new approach.
keywords:
Deep Learning, Artificial Neural Network, Particle Identification, Cosmic Ray, Balloon Experiment
1 Introduction
The origin of dark matter (DM) is a major subject for modern physics. Although the nature of DM has not yet been revealed directly, the existence of DM, which accounts for around a quarter of the total energy density of the universe, is strongly supported by many astronomical observations and theoretical calculations.[1] A leading class of DM candidate particles is the weakly interacting massive particle (WIMP). A number of experiments have been carried out to detect DM either directly, indirectly, or using a particle accelerator. To verify a wide variety of theoretical DM models, it is important to investigate DM from diverse complementary angles.
Cosmic-ray antideuterons are expected to provide a new approach to indirectly detect DM.[2] Antideuterons can be produced by self-annihilation or decay of WIMP DM particles, in common with the other indirect probes such as gamma rays, positrons, and antiprotons. In contrast to all these probes, the flux of DM-produced antideuterons can be orders of magnitude above the astrophysical backgrounds (originating from the secondary interactions of cosmic rays) whose abundance is kinematically suppressed in the sub-GeV low-energy region.[3, 4] Therefore, the detection of even a single sub-GeV antideuteron can provide evidence of a novel origin. Figure 1 shows representative antideuteron spectra predicted from DM models such as the lightest supersymmetric particle (LSP) neutralino,[5] right-handed Kaluza-Klein neutrino of warped 5-dimensional grand unified theories (LZP),[6] and decaying LSP gravitino.[7] Among these models, the DM with several tens of GeV mass has been recently discussed as a possible source to interpret observed excesses of cosmic-ray antiprotons,[8] and gamma rays.[9] Antideuterons are still almost unexplored and have never been detected in the cosmic radiation[12, 13]. Hence, low-energy cosmic-ray antideuterons have a wide discovery space to detect DM.
2 GAPS Project
The General AntiParticle Spectrometer (GAPS) is an international project to contribute to dark matter physics through a highly sensitive survey of cosmic-ray antiparticles.[14, 15, 16] The primary goal of GAPS is to search for undiscovered antideuterons in the low-energy range (below 0.25 GeV/nucleon) with an unprecedented sensitivity. To achieve a high sensitivity, GAPS plans to fly a large-grasp instrument over Antarctica multiple times by using NASA long-duration balloons (LDBs). The polar balloon flight is optimal for GAPS, not only because long observation times (about 1 month) can be realized at high altitudes, but also because of the low rigidity cutoff near the geomagnetic pole, which allows us to observe charged cosmic rays directly in the low-rigidity range below 0.5 GV. Cosmic rays at these low energies are highly suppressed on the orbit of the International Space Station (ISS). The first GAPS LDB flight is planned for late 2021. The GAPS antideuteron sensitivity expected in three LDB flights is shown in Fig. 1.[10]
GAPS will also provide a precise measurement of the antiproton flux around 100 MeV, a region that is particularly sensitive to low-mass DM models.[17] GAPS will detect more than an order of magnitude more antiprotons in the low-energy range compared to previous experiments such as BESS-Polar[18] and PAMELA.[19] Precise measurement in this lowest energy range offers new phase space for probing light DM models, such as light neutralinos, gravitinos, and LZPs. GAPS is also sensitive to antihelium,[20] which is another new probe into the DM physics. However, antihelium is outside the scope of this paper.
2.1 Detection concept
To observe rare antiparticles among high cosmic-ray backgrounds, it is essential to survey them with a large-grasp instrument that also has a high identification capability. For instance, typical fluxes of protons and antiprotons in the cosmic radiation are each orders of magnitude higher than the antideuteron flux predicted in Fig. 1.
To realize good identification capability against these backgrounds while keeping a large geometrical acceptance, GAPS introduces an original method that utilizes the deexcitation sequence of exotic atoms.[21, 22]
Figure 2 shows a conceptual diagram of the GAPS instrument configuration. A central tracker composed of over 1000 custom lithium-drifted silicon (Si(Li)) detectors is surrounded by a time-of-flight (TOF) system. Figure 3 shows a conceptual diagram of the GAPS antiparticle identification method. When an antiparticle arrives from space, it is slowed down by energy losses in the residual atmosphere, in the GAPS TOF counters, and in the Si(Li) tracker, which serves as the target material. Just after stopping in the target, the antiparticle forms an exotic atom in an excited state with near unity probability. Then, through radiative transitions in the cascade to the ground state, the exotic atom deexcites with the emission of characteristic X-rays. The energies of the ladder X-rays are strictly determined by the exotic atom physics and thus provide a key to identify the incoming antiparticle species. After the X-ray emission, the antiparticle annihilates in the nucleus, emitting a characteristic number of pions and protons, which provides additional particle identification information. Tracks of X-rays and pions or protons with a vertex, in combination with other measured values such as the time-of-flight (or the velocity), the energy deposits, and the stopping depth, enable us to distinguish rare antideuterons from backgrounds including antiprotons and protons. Without the technical limitations of heavy magnets in conventional magnetic spectrometers, this technique allows us to build an instrument with a large grasp and low-energy range. The principle of this particle identification technique was verified by accelerator tests with various target materials using the KEK antiproton beam-line.[23, 24]
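To illustrate why the ladder X-ray energies depend on the antiparticle species, the following is a minimal sketch that estimates transition energies with the simple Bohr (hydrogen-like, point-nucleus) formula, in which the energy scale is proportional to the reduced mass of the antiparticle-nucleus system. This is an approximation introduced here for illustration, not the calculation used by GAPS: it neglects strong-interaction shifts, finite nuclear size, and electron screening. The silicon nuclear mass, the rounded particle masses, and the chosen transitions are assumptions.

```python
# Bohr-model estimate of exotic-atom X-ray energies (illustrative approximation).
RYDBERG_EV = 13.605693   # hydrogen binding-energy scale, eV
M_E = 0.510999           # electron mass, MeV/c^2
M_PBAR = 938.272         # antiproton mass, MeV/c^2
M_DBAR = 1875.613        # antideuteron mass, MeV/c^2
M_SI28 = 26053.2         # approximate 28Si nuclear mass, MeV/c^2 (assumption)
Z_SI = 14                # nuclear charge of silicon


def transition_energy_kev(m_antiparticle, n_initial, n_final,
                          m_nucleus=M_SI28, z=Z_SI):
    """X-ray energy (keV) for the transition n_initial -> n_final."""
    reduced_mass = m_antiparticle * m_nucleus / (m_antiparticle + m_nucleus)
    # Energy scale grows with Z^2 and with the reduced mass relative to m_e.
    scale_ev = RYDBERG_EV * z ** 2 * reduced_mass / M_E
    return scale_ev * (1.0 / n_final ** 2 - 1.0 / n_initial ** 2) / 1000.0


# Hypothetical choice of transitions, for comparison between the two species.
for label, mass in (("antiproton", M_PBAR), ("antideuteron", M_DBAR)):
    for n_i, n_f in ((7, 6), (6, 5)):
        energy = transition_energy_kev(mass, n_i, n_f)
        print(f"{label}: n={n_i} -> n={n_f}: {energy:.1f} keV")
```

Because the antideuteron is about twice as heavy as the antiproton, the same transition yields roughly twice the X-ray energy in this approximation, which is the kind of separation the few-keV energy resolution is meant to exploit.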
2.2 Instrument design
The central tracker shown in Fig. 2 consists of 1000 Si(Li) detectors arrayed in 10 layers with 10 cm vertical spacing in a 1.6 m × 1.6 m × 1 m volume. Each Si(Li) wafer has a 4-inch diameter and a 2.5 mm thickness and is segmented into 8 strips.[25, 26, 27, 28] The Si(Li) detector serves as a degrader, a depth-sensing detector, a stopping target to form an exotic atom, an X-ray spectrometer, and a charged-particle tracker. In order to distinguish antideuteronic X-rays from antiprotonic X-rays, the energy resolution for X-rays should be better than 4 keV, which is achievable at operating temperatures of −40 °C.
The TOF system is composed of inner and outer scintillation counters (200 counters in total). Each counter is a thin (6 mm thick) and long (180 cm) plastic scintillator paddle, each end of which is coupled to six silicon photomultipliers (SiPMs). The TOF system generates the trigger signal, measures the time-of-flight, measures the energy deposit, roughly determines the arrival direction, and works as a pion/proton detector. The experimentally determined time resolution is 0.4 ns.
The basic GAPS payload design concept was successfully verified during a balloon flight in June 2012 at Taiki, Japan. Prototypes of all GAPS key components were mounted on the payload. Recording more than one million events during the flight, it was confirmed that the components including the Si(Li) detectors and TOF counters operated as expected.[29, 30, 31, 32]
3 Approaches to Particle Identification
Rigorous particle identification and background suppression are necessary, especially to distinguish antideuterons from backgrounds. The conventional particle identification method being developed is based on the reconstruction of each event including the incoming particle, multiple secondary particles (pions and protons), and characteristic X-rays.[10, 33] For the reconstruction of each event, a data set on the order of $10^{4}$ channels (8 strips for each of a thousand Si(Li) detectors, plus readout at both ends of each of 200 TOF counters) is used. Each channel provides energy-deposition information. The TOF counters also provide hit-timing information. During the event reconstruction, key physics parameters for the particle identification are obtained. Among the backgrounds, antiprotons are considered to be the major background for antideuterons rather than the more numerous protons, because only antiparticles can form an exotic atom and fake antideuteronic signals in the GAPS identification method.[21] From baseline studies, a sufficient background suppression capability is expected from a combination of the reconstructed physical parameters, even against antiprotons.[10] However, it is still challenging to establish the details of the particle identification method, because of the complexity of a many-channel analysis, the large variety of expected signal patterns, and the high identification capability required.
Therefore, in parallel to the development of conventional particle identification methods, we have begun investigating a complementary approach using a machine learning technique. Machine learning is a subset of artificial intelligence (AI) that has found application recently in a wide variety of fields, in particular thanks to its high potential for pattern recognition.[34] In our new approach, a deep-learning algorithm builds a mathematical model by “learning” on copious training data of simulated antiparticle events through a multi-layered neural network (NN) without any physical interpretations. By applying this fully unbiased approach, we aim to validate the conventional method and to mine unknown patterns. This will provide positive feedback to the conventional event identification method, further improving the GAPS antiparticle detection sensitivities. Hereafter, exploratory investigations of the deep-learning approach are discussed.
4 Particle Identification by Machine Learning
4.1 Deep learning
Deep learning is a class of machine-learning algorithms based on deep NNs. An NN, or artificial NN, is a mathematical framework built from a collection of interconnected artificial neurons, modeled on the biological neural networks in brains. As shown in Fig. 4, an NN is composed of an input layer, an output layer, and hidden layers in between. Artificial neurons, or nodes, in neighboring layers are interconnected by edges. Each connection is assigned a weight that sets the connection strength. Each node sums its weighted inputs and applies a nonlinear activation function to the sum to compute its output. The outputs produced by a layer are then fed to the nodes in the next layer. In this manner, the input values are propagated to the final layer, which computes the output. The consistency between the computed output and the desired output for the input (its label) is evaluated by a loss function. Through an iterative learning process, the weights are modified so that the consistency improves. As a result, a learning model is obtained that produces the desired output from a given input. By using a multi-layer (or deep) NN, the recognition accuracy can be drastically improved.
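The layer-by-layer propagation described above can be sketched in a few lines of plain Python. This is a toy illustration only: the layer sizes, the random weights, and the helper names (`dense`, `forward`, `random_layer`) are assumptions chosen for readability and are far smaller than the network used in this study. The ReLU and sigmoid activations are the ones adopted later in Sect. 4.3.

```python
import math
import random


def relu(x):
    """Zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def sigmoid(x):
    """Maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j][i] links input i to node j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Propagate the input values through every layer in turn."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return values


rng = random.Random(0)


def random_layer(n_in, n_out, activation):
    """Hypothetical initialization: small uniform weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out, activation


# Toy network: 4 inputs -> 3 hidden -> 2 hidden -> 1 output.
toy_network = [
    random_layer(4, 3, relu),
    random_layer(3, 2, relu),
    random_layer(2, 1, sigmoid),
]
output = forward([0.2, 0.0, 1.5, 0.7], toy_network)
print(output)  # a single value between 0 and 1
```

Training would then adjust the entries of `weights` and `biases` to reduce the loss; that step is omitted here.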
4.2 Input data
In this study, we used 204 energy depositions from the TOF counters (one from each counter) and 11,520 energy depositions from the Si(Li) detectors (8 strips from each of 1,440 detectors), for a total of 11,724 input values per event. The input data were generated using a Monte-Carlo simulation code[33] developed by the GAPS collaboration based on the GEANT4 framework[35] version 10.4. In this study, we used the energy deposition values calculated by the GEANT4 code, without taking into account resolution effects. Characteristic X-rays from exotic atoms, which will be included in future GEANT4 versions, are also not taken into account in this study. Furthermore, timing measurement information is not considered.
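As a small illustration of how the per-event input vector could be assembled, the sketch below flattens the TOF and Si(Li) energy depositions into one list whose length matches the input layer of Table 1. The ordering (TOF first, then Si(Li) strips detector by detector) and the function name `build_input_vector` are assumptions; the paper does not specify the layout.

```python
N_TOF = 204             # one energy deposition per TOF counter
N_SILI_DETECTORS = 1440
STRIPS_PER_DETECTOR = 8


def build_input_vector(tof_deposits, sili_deposits):
    """Flatten one event into a single list of energy depositions.

    tof_deposits:  list of N_TOF floats
    sili_deposits: list of N_SILI_DETECTORS lists, each with 8 strip values
    """
    if len(tof_deposits) != N_TOF:
        raise ValueError("unexpected number of TOF values")
    if len(sili_deposits) != N_SILI_DETECTORS:
        raise ValueError("unexpected number of Si(Li) detectors")
    flat = list(tof_deposits)
    for strips in sili_deposits:
        if len(strips) != STRIPS_PER_DETECTOR:
            raise ValueError("each Si(Li) detector needs 8 strip values")
        flat.extend(strips)
    return flat


# An empty event (no energy deposited anywhere) as a shape check.
event = build_input_vector(
    [0.0] * N_TOF,
    [[0.0] * STRIPS_PER_DETECTOR for _ in range(N_SILI_DETECTORS)],
)
print(len(event))  # 204 + 1440 * 8 = 11724
```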
In each data set, an antideuteron or an antiproton was injected from the same position, centered above the instrument, with a fixed vertically downward incident angle (Fig. 5). For simplicity, the velocity, $\beta$, of the incident antiparticle was limited to two narrow ranges: $\beta$ was drawn uniformly at random within either $0.335<\beta_{1}<0.340$ or $0.250<\beta_{2}<0.255$. In this study, we discuss (i) distinguishing between antideuterons and antiprotons with similar velocities, taken from the same velocity range, and (ii) distinguishing between antideuterons with different velocities of $\beta_{1}$ and $\beta_{2}$.
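To connect these velocity ranges to the energy scale quoted earlier for GAPS, the following sketch draws a velocity uniformly from one of the two ranges and converts it to kinetic energy per nucleon with the standard relativistic relation $T/n = (\gamma - 1)\,m_{N}c^{2}$. The rounded nucleon mass and the function names are assumptions made for this illustration; the actual event generation was done in the GEANT4-based simulation.

```python
import math
import random

NUCLEON_MASS_MEV = 938.9  # rounded average nucleon mass, MeV/c^2 (assumption)
BETA_RANGES = {"beta_1": (0.335, 0.340), "beta_2": (0.250, 0.255)}


def sample_beta(range_name, rng):
    """Draw a velocity uniformly within the named range."""
    low, high = BETA_RANGES[range_name]
    return rng.uniform(low, high)


def kinetic_energy_per_nucleon_mev(beta):
    """Relativistic kinetic energy per nucleon for velocity beta = v/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return (gamma - 1.0) * NUCLEON_MASS_MEV


rng = random.Random(1)
beta = sample_beta("beta_1", rng)
print(f"beta = {beta:.4f}, "
      f"T/n = {kinetic_energy_per_nucleon_mev(beta):.1f} MeV/nucleon")
```

Both ranges correspond to a few tens of MeV per nucleon, well inside the low-energy region targeted by GAPS.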
Data sets of 200,000 events were prepared with the simulation code for each combination of antiparticle species and velocity range. For the supervised learning, 160,000 labeled events were used as training data in each case. The remaining 40,000 events were used, without labels, as the validation data. The numbers of events were limited by the computing power available for this study but are sufficient for this first-step study to explore the feasibility of our deep-learning approach.
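The 160,000 / 40,000 partition can be sketched as below. Whether the events were shuffled before splitting is not stated in the paper; the shuffle, the fixed seed, and the function name `split_events` are assumptions of this example.

```python
import random


def split_events(events, n_train, seed=0):
    """Shuffle a copy of the events and split it into two parts."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Event indices stand in for the simulated events of one species/velocity case.
event_ids = range(200_000)
train_ids, validation_ids = split_events(event_ids, n_train=160_000)
print(len(train_ids), len(validation_ids))
```

Keeping the two parts disjoint is what allows the validation accuracy to serve as a check against overfitting.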
4.3 Learning model framework
We used the deep-learning framework Keras,[36] which is a Python-based open-source NN library, and a backend of TensorFlow[37] which is supported by Keras. Table 1 summarizes the outline of the network structure used. All hidden layers were fully connected.
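To give a sense of the size of the fully connected network in Table 1, the sketch below counts its trainable parameters from the layer widths alone. It assumes one bias term per node in every hidden and output layer, which is the usual default for fully connected layers but is not stated explicitly in the paper.

```python
# Layer widths from Table 1: input, six hidden layers, output.
LAYER_SIZES = [11724, 8000, 4000, 2000, 1000, 500, 50, 1]


def count_parameters(layer_sizes):
    """Weights plus biases for a chain of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per node (assumption)
    return total


n_parameters = count_parameters(LAYER_SIZES)
print(f"{n_parameters:,} trainable parameters")
```

Most of the parameters sit in the first hidden layer, because it connects all 11,724 inputs to 8,000 nodes.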
As is common, a Rectified Linear Unit (ReLU or ramp function) was used as the activation function in the hidden layers. ReLU, which is expressed by Eq. (1), outputs zero for negative inputs and equals input for non-negative inputs.
$$f(x) = \max(0, x) \qquad (1)$$
The sigmoid function used in the output layer is suitable for binary classifications like our case studies. The sigmoid function, defined by Eq. (2), computes the likelihood in a range from 0 to 1 to estimate to which class the input data should be classified.
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$
Table 2 summarizes major hyperparameters used in this study.
Hyperparameters are the parameters that must be set before starting the learning process and are essential to control the learning algorithm. The batch size defines the number of data sets used in one iteration. Here we used the mini-batch mode, in which the entire training data are divided into subsets of the size defined by the batch size. The number of epochs defines the number of times that the learning algorithm passes through the entire training data set. As the optimizer, which defines how the network weights are updated during the iterative learning, the common gradient-based optimizer Adam[38] was used. The learning rate defines how strongly the optimizer updates the weights, and thus controls the convergence behavior of the learning model. As the loss function we used binary cross-entropy, which is suitable for binary classification.
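The loss function and the optimizer update named above can be written out explicitly. The following is a minimal sketch for a single scalar weight, not the Keras implementation: the decay constants `b1`, `b2` and the small constant `eps` are the defaults suggested in the Adam paper and are assumptions here, since the values used in this study are not stated. Only the learning rate of 0.00001 comes from Table 2.

```python
import math


def binary_cross_entropy(labels, outputs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and sigmoid outputs."""
    total = 0.0
    for y, p in zip(labels, outputs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def adam_step(weight, grad, m, v, step, lr=1e-5, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single weight; returns (weight, m, v).

    m and v are running averages of the gradient and its square;
    step is the 1-based iteration counter used for bias correction.
    """
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad * grad
    m_hat = m / (1.0 - b1 ** step)
    v_hat = v / (1.0 - b2 ** step)
    weight -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return weight, m, v


loss = binary_cross_entropy([1, 0], [0.9, 0.1])
new_weight, m, v = adam_step(weight=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(loss, new_weight)
```

On the first step the bias-corrected update has a magnitude close to the learning rate itself, regardless of the gradient size, which is one reason a small learning rate leads to slow but stable convergence.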
The network structure and hyperparameters shown in Tables 1 and 2 should be optimized for each learning case. Inadequate hyperparameters can result in an inaccurate model such as an overfitted model, which contains more parameters than can be justified by the data. As one measure to avoid overfitting, we incorporated the early-stopping function, which terminates the learning process before the set number of epochs is reached once the improvement of the learning accuracy saturates. In addition, we implemented the dropout function, which is another technique to avoid overfitting. The dropout function reduces the effective degrees of freedom of the network by deactivating some nodes in a layer with a certain probability, and thereby improves the generalization performance. In this study, the dropout function was applied to the input layer and all hidden layers. The dropout ratio was set to 20% in the input layer and 50% in each hidden layer. The hyperparameters used in this exploratory study were tentatively chosen among various sets of values so as to achieve the highest learning accuracy. The NN with the above hyperparameters was used in common for both cases (i) and (ii), but the learning model was trained for each case independently.
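The two regularization techniques described above can be sketched in plain Python. The dropout function below uses the common "inverted" convention, in which surviving values are scaled up so that the expected sum is unchanged. The early-stopping rule uses a `patience` counter. Both the patience value and the class name `EarlyStopping` are assumptions of this example, since the paper does not give the stopping criterion in detail.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale the survivors."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


class EarlyStopping:
    """Signal a stop when the metric has not improved for `patience` epochs."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.wait = 0

    def update(self, value):
        if value > self.best + self.min_delta:
            self.best = value
            self.wait = 0
        else:
            self.wait += 1
        return self.wait >= self.patience


rng = random.Random(0)
dropped = dropout([1.0] * 10000, rate=0.5, rng=rng)  # 50% as in the hidden layers

# Hypothetical validation accuracies per epoch, for illustration only.
history = [0.90, 0.95, 0.95, 0.95, 0.95]
stopper = EarlyStopping(patience=2)
stopped_epoch = None
for epoch, accuracy in enumerate(history):
    if stopper.update(accuracy):
        stopped_epoch = epoch
        break
print(stopped_epoch)
```

Because dropout is active only during training, the accuracy measured on training batches is computed with part of the network switched off, while validation uses the full network.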
5 Results
5.1 Convergence in iterative learning
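Because of the sigmoid function (Eq. (2)), the output for each input event lies between 0 and 1. Given a binary-classification threshold $t$, each output can be assigned to one of the two classes. As an example, when classes “A” and “B” are tagged by the values 1 and 0, respectively, an output larger than $t$ from a class-A input and an output smaller than $t$ from a class-B input are recognized correctly. In this manner, the recognition efficiency, $\varepsilon$, and the misidentification probability, $P_{\mathrm{mis}}$, can be calculated for each class as follows:
$$\varepsilon_{A} = \frac{N_{A}(\mathrm{output} > t)}{N_{A}}$$
$$\varepsilon_{B} = \frac{N_{B}(\mathrm{output} < t)}{N_{B}}$$
$$P_{\mathrm{mis}}(A \to B) = \frac{N_{A}(\mathrm{output} < t)}{N_{A}}$$
$$P_{\mathrm{mis}}(B \to A) = \frac{N_{B}(\mathrm{output} > t)}{N_{B}}$$
where $N_{A}$ and $N_{B}$ are the numbers of class-A and class-B input events, and $N_{X}(\cdot)$ counts the class-X events whose output satisfies the stated condition.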
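A commonly used index, the accuracy, is defined as the total rate of correct outputs with the threshold set to 0.5:
$$\mathrm{Accuracy} = \frac{N_{A}(\mathrm{output} > 0.5) + N_{B}(\mathrm{output} < 0.5)}{N_{A} + N_{B}}$$
where $N_{A}$ and $N_{B}$ are the numbers of input events of the classes tagged 1 and 0, respectively, and $N_{X}(\cdot)$ counts the class-X events whose output satisfies the stated condition.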
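The recognition efficiency, misidentification probability, and accuracy described in this section can be computed directly from the network outputs and the true labels. The sketch below is one way to do so; the function name, the dictionary keys, and the toy data are assumptions, and outputs exactly equal to the threshold are counted as class B here, a detail the paper does not specify.

```python
def classification_rates(outputs, labels, threshold=0.5):
    """Rates for class A (label 1) and class B (label 0) at a given threshold."""
    n_a = sum(1 for y in labels if y == 1)
    n_b = len(labels) - n_a
    a_as_a = sum(1 for p, y in zip(outputs, labels) if y == 1 and p > threshold)
    b_as_a = sum(1 for p, y in zip(outputs, labels) if y == 0 and p > threshold)
    return {
        "efficiency_a": a_as_a / n_a,
        "misid_a_as_b": (n_a - a_as_a) / n_a,
        "efficiency_b": (n_b - b_as_a) / n_b,
        "misid_b_as_a": b_as_a / n_b,
        "accuracy": (a_as_a + (n_b - b_as_a)) / (n_a + n_b),
    }


# Toy outputs: three class-A events followed by three class-B events.
toy_outputs = [0.99, 0.80, 0.40, 0.70, 0.10, 0.02]
toy_labels = [1, 1, 1, 0, 0, 0]
rates = classification_rates(toy_outputs, toy_labels)
print(rates)
```

In this toy case one class-A event falls below the threshold and one class-B event rises above it, so the efficiency for each class is 2/3 and the accuracy is 4/6.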
As an example of the learning curve, Fig. 6 shows the accuracy profile in the case of distinguishing antiprotons and antideuterons with similar velocities. The high accuracies in the first few epochs indicate that the hyperparameters were set appropriately. The accuracies of both the training and validation data improve with the iterative learning and converge to 1. This confirms that overfitting is successfully avoided. In every case, we confirmed the convergence in this manner. The accuracy of the training data in this plot is lower than that of the validation data because the training-data accuracy is underestimated owing to the dropout function, which is active only during training.
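For the discussion in the following sections, we also define the rejection power, $R$, as the inverse of the misidentification probability:
$$R_{A} = \frac{1}{P_{\mathrm{mis}}(A \to B)}$$
$$R_{B} = \frac{1}{P_{\mathrm{mis}}(B \to A)}$$
where $P_{\mathrm{mis}}(X \to Y)$ is the probability that a class-X input is misidentified as class Y.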
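The trade-off between identification efficiency and rejection power can be traced by scanning the threshold. The sketch below assumes that the rejection power is the inverse of the misidentification probability, that is, the number of background events divided by the number misidentified as signal. The toy outputs and the function names are hypothetical.

```python
import math


def rejection_power(n_background, n_misidentified):
    """Inverse misidentification probability; infinite if nothing leaks through."""
    if n_misidentified == 0:
        return math.inf
    return n_background / n_misidentified


def scan_thresholds(signal_outputs, background_outputs, thresholds):
    """Return (threshold, signal efficiency, background rejection) per threshold."""
    rows = []
    for t in thresholds:
        efficiency = sum(p > t for p in signal_outputs) / len(signal_outputs)
        n_mis = sum(p > t for p in background_outputs)
        rows.append((t, efficiency, rejection_power(len(background_outputs), n_mis)))
    return rows


# Toy network outputs for four signal and four background events.
signal = [0.99, 0.97, 0.95, 0.60]
background = [0.01, 0.02, 0.30, 0.96]
rows = scan_thresholds(signal, background, [0.5, 0.9, 0.98])
for t, eff, rej in rows:
    print(f"threshold={t:.2f}  efficiency={eff:.2f}  rejection={rej}")
```

Raising the threshold lowers the signal efficiency while raising the rejection power. Once no background event passes, the measured rejection power is limited only by the size of the background sample.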
5.2 Distinguishing between antideuteron and antiproton with a similar velocity
Figure 7 shows the output likelihood distributions calculated from the validation data in the case of distinguishing antideuterons and antiprotons with similar velocities. Most of the antideuterons (red) and antiprotons (blue) are correctly recognized. By varying the threshold, the identification efficiency for the true input can be plotted against the rejection power for the other species.
Figure 8 shows the relation between the identification efficiency for antideuterons and the rejection power against antiprotons. The rejection power reaches the order of $10^{3}$ while keeping a high identification efficiency of 98%.
In the inverse case of recognizing antiprotons, as also shown in Fig. 8, a rejection power on the order of $10^{3}$ or more is achieved while keeping a high identification efficiency of 99%.
5.3 Distinguishing between antideuterons in two velocity ranges
Figure 9 shows the rejection powers as functions of the identification efficiencies in the case of distinguishing antideuterons with velocities of $0.335<\beta_{1}<0.340$ and $0.250<\beta_{2}<0.255$. High rejection powers, on the order of $10^{3}$ or more, are achieved while keeping a high identification efficiency of 99%.
5.4 Discussions
From the results, in each case, high rejection powers of $\sim 10^{3}$ are achieved while keeping high identification efficiencies above 98%. This indicates the potential of the deep-learning approach for studying the particle identification capability of the GAPS instrument. Indeed, in this study, the highest rejection power of $\sim 10^{4}$ seen in Figs. 8 and 9 is limited by the number of validation data of $\sim 10^{4}$. Increasing the number of both the training and validation data is expected to improve the learning accuracy.
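The statement that the measurable rejection power is limited by the validation statistics can be made quantitative with a standard binomial argument, added here as an illustration rather than taken from the paper. If none of $N$ background events is misidentified, the misidentification probability is bounded from above at a given confidence level by $p < 1 - (1 - \mathrm{CL})^{1/N}$, so the rejection power is bounded from below by the inverse of that value, roughly $N/3$ at 95% confidence. The sample size of 40,000 assumes the per-case validation set described in Sect. 4.2.

```python
def rejection_lower_limit(n_background, confidence=0.95):
    """Lower limit on rejection power when zero of n_background events leak."""
    # Largest p still compatible with observing zero misidentifications.
    p_upper = 1.0 - (1.0 - confidence) ** (1.0 / n_background)
    return 1.0 / p_upper


limit_40k = rejection_lower_limit(40_000)
limit_400k = rejection_lower_limit(400_000)
print(f"N = 40,000:  R > {limit_40k:,.0f}")
print(f"N = 400,000: R > {limit_400k:,.0f}")
```

The limit scales linearly with the number of validation events, which is why demonstrating a higher rejection power requires a correspondingly larger simulated sample.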
In this study, characteristic X-rays[10] and timing measurements are not included in the simulated data. Including this information is expected to further improve the discrimination accuracy.
6 Conclusion
We have begun a study using up-to-date machine learning techniques for the GAPS particle identification. These exploratory investigations indicate that this new approach can achieve a high recognition efficiency while keeping a sufficient rejection power, even without using timing and characteristic X-ray information. Based on these encouraging first-step results, we will proceed to full-scale studies under more realistic conditions; for instance, we will randomize the incident positions, angles, and velocities of the incoming particles, use input values with finite measurement resolutions, and increase the statistics of both the training and validation data. In doing so, we will expand and optimize the deep-learning neural network. The mining of unknown patterns and their feedback to the conventional identification method will also be studied.
Acknowledgments
This work is partly supported by Grants-in-aid KAKENHI (JP17H01136, JP19H05198) and the Sumitomo Foundation fiscal 2018 grant for basic science research projects.
References
- [1] Klasen, M., Pohl, M., and Sigl, G.: Indirect and Direct Search for Dark Matter, Prog. in Particle and Nucl. Phys., 85 (2015), pp. 1–32.
- [2] Aramaki, T., Boggs, S., Bufalino, S., Dal, L., von Doetinchem, P., Donato, F., et al.: Review of the Theoretical and Experimental Status of Dark Matter Identification with Cosmic-ray Antideuterons, Phys. Rep., 618 (2016), pp. 1–37.
- [3] Donato, F., Fornengo, N., and Salati, P.: Antideuterons as a Signature of Supersymmetric Dark Matter, Phys. Rev. D, 62 (2000), 043003.
- [4] Ibarra, A. and Wild, S.: Determination of the Cosmic Antideuteron Flux in a Monte Carlo Approach, Phys. Rev. D, 88 (2013), 023014.
- [5] Donato, F., Fornengo, N., and Maurin, D.: Antideuteron Fluxes from Dark Matter Annihilation in Diffusion Models, Phys. Rev. D, 78 (2008), 043506.
- [6] Baer, H. and Profumo, S.: Low Energy Antideuterons: Shedding Light on Dark Matter, J. of Cosmology and Astroparticle Phys., 0512 (2005), 008.
- [7] Dal, L. A. and Raklev, A. R.: Antideuteron Limits on Decaying Dark Matter with a Tuned Formation Model, Phys. Rev. D, 89 (2014), 103504.
- [8] Cui, M.Y., Yuan, Q., Tsai, S. Y.K., and Fan, Y.Z.: Possible Dark Matter Annihilation Signal in the AMS-02 Antiproton Data, Phys. Rev. Lett., 118 (2017), 191101.
