Classification and Recovery of Radio Signals from Cosmic Ray Induced Air   Showers with Deep Learning

M. Erdmann; F. Schlueter; R. Smida

arXiv:1901.04079·astro-ph.IM·May 22, 2019


TL;DR

This paper demonstrates deep learning methods to classify and recover cosmic ray air shower radio signals from noisy broadband data, achieving high accuracy and energy resolution in simulated environments.

Contribution

It introduces two deep learning approaches for classifying and cleaning radio signals from cosmic ray air showers, improving detection accuracy and signal reconstruction.

Findings

01

90% true positive rate for signals with SNR > 3

02

20% energy resolution without bias for 80% of signals

03

Effective removal of radio frequency interference from signals

Abstract

Radio emission from air showers enables measurements of cosmic particle kinematics and identity. The radio signals are detected in broadband Megahertz antennas among continuous background noise. We present two deep learning concepts and their performance when applied to simulated data. The first network classifies time traces as signal or background. We achieve a true positive rate of about 90% for signal-to-noise ratios larger than three with a false positive rate below 0.2%. The other network is used to clean the time trace from background and to recover the radio time trace originating from an air shower. Here we achieve a resolution in the energy contained in the trace of about 20% without a bias for 80% of the traces with a signal. The obtained frequency spectrum is cleaned from signals of radio frequency interference and shows the expected shape.

Equations

\mathrm{SNR} = \frac{\text{max. signal}}{\mathrm{RMS}_{\text{noise}}} = \frac{A_{\text{max}}^{S}}{\sqrt{\frac{1}{N} \sum_{i=1}^{N} A_{i}^{2}}},

E_{\text{signal}} = \frac{\Delta t}{R \cdot e} \left( \sum_{t_{1}}^{t_{2}} U_{i}^{2} - \frac{t_{2} - t_{1}}{t_{4} - t_{3}} \sum_{t_{3}}^{t_{4}} U_{i}^{2} \right)

\frac{\Delta E_{i}}{E_{\text{true}}} = \frac{E_{i} - E_{\text{true}}}{E_{\text{true}}},

\frac{\Delta I}{I_{\text{true}}} = \frac{\int_{30\,\text{MHz}}^{80\,\text{MHz}} |F - F_{\text{true}}|}{\int_{30\,\text{MHz}}^{80\,\text{MHz}} F_{\text{true}}},


Full text


M. Erdmann

F. Schlüter (now at Karlsruhe Institute of Technology, Germany)

and R. Šmída (now at Kavli Institute for Cosmological Physics and the Enrico Fermi Institute, The University of Chicago, 5640 S. Ellis Ave, Chicago, IL 60637, USA)


1 Introduction

In modern experimental setups, sensors continuously convert physical signals to electric charge. Continuous data analysis is required to determine whether data should be saved or discarded. Ideally, an analysis will recover a physical signal immediately after being recorded by the sensor, thereby minimizing the bandwidth required for data transfer. To distinguish the desired signal information from background noise, this live data analysis needs to be fast, have high signal selection efficiency, and a low rate of false positive decisions on background.

In this work, we investigate strategies for solving such challenges with deep learning techniques. These methods are based on neural networks with a substantial number of adjustable parameters to accommodate, for example, analyses of signal shapes. We take advantage of several network architectures and methods developed in the field of computer science which have been shown to be capable of handling millions of parameters [1, 2, 3, 4, 5]. For a review on deep learning techniques refer to [6]. Recent applications in various particle and astroparticle research projects have demonstrated advantages when using deep learning methods [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20].

Our example applications focus on radio signals emitted by ultra-high energy cosmic rays (UHECRs) which initiate extensive particle showers in the atmosphere. UHECRs are likely protons and nuclei with energies extending from $10^{18}$ eV to above $10^{20}$ eV. The UHECR flux decreases from $63$ (km$^{2}$ yr sr)$^{-1}$ at $10^{18}$ eV to only $10^{-3}$ (km$^{2}$ yr sr)$^{-1}$ at $10^{20}$ eV [21, 22]. Because UHECRs carry electric charge and propagate through interstellar and intergalactic magnetic fields, no point source has been identified so far. To overcome these challenges, one looks for precise, reliable and low-cost measurement techniques, capable of achieving exposures well above ten thousand km$^{2}$ sr yr. By accumulating sufficient statistics and having a precise reconstruction of a UHECR’s arrival direction, energy and mass composition, one can search for an excess of arrival directions around selected astronomical objects in the sky, with an angular offset related to the particle’s electric charge. Another approach is to search for ultra-high energy neutral particles, like photons and neutrinos, and use them for the identification of point sources. In both cases, one would need a solid triggering system followed by precise reconstruction of measured air showers.

UHECRs are measured through the cascade of secondary particles they produce upon interaction with the Earth’s atmosphere. This “extensive air shower” can cover a ground area of a few tens of km$^{2}$, and its electromagnetic component causes the emission of both fluorescence light along its trajectory and coherent radio signals. The coherent radio emission of shower particles leads to a transient radio signal with a duration of approximately $10^{-8}$ s. Such signals have been measured using broadband antennas, typically covering the frequency range $30-80$ MHz [23, 24, 25, 26, 27, 28, 29], for corresponding cosmic-ray energies above $10^{17}$ eV [30, 31, 32, 33, 34]. It has been shown that air shower signals can also be measured in the GHz frequency range [35]. For reviews on radio detection techniques refer to [36, 37].

The radio signal amplitude has been observed to increase quadratically with cosmic ray energy, and to decrease with the distance from the shower axis [30, 34, 38]. The shape of the frequency spectrum of a radio signal exhibits a dependence on the height of the maximum of the particle shower development in the atmosphere [39]. Therefore, both the signal magnitude and the frequency spectrum contain important information that needs to be recovered.

An irreducible background is present due to both Galactic radio signals and a range of terrestrial sources of radio emission. The background rate exceeds by several orders of magnitude the cosmic ray rate, which is of order $1$ Hz/km$^{2}$ above $\sim 10^{16}$ eV. In principle, the background rate can be reduced by applying a discriminator threshold to the output of a measuring sensor, but a trade-off then arises between background reduction and detection of low-amplitude radio signals from either low energy or distant cosmic ray showers. This challenging regime will be the primary interest in our investigations below.

Our objectives in this paper are twofold. First, we present a method to select cosmic ray induced radio signals while rejecting a large fraction of the radio background. For this classification task, a convolutional neural network is used. Second, we study a method to disentangle the cosmic ray radio signal from the simultaneously recorded noise. The aim is to conserve the radio signal with its original magnitude and frequency spectrum as accurately as possible. For this regression task, we use a bottleneck-like architecture similar to the de-noising autoencoder [6] and we train it in a supervised manner. Similar independent work is described in [40, 41].

The data sets used for training and testing network models are described in section 2. Details of our classification and regression architectures are then provided in sections 3 and 4, respectively. Finally, we present our conclusions in section 5.

2 Data sets of radio signals and noise

To train and test deep learning models we use large data sets of simulated radio signals and noise. Here we provide further details on the dataset and software.

2.1 High-quality dataset of radio signals

In order to have a high-quality dataset of radio signals we have adopted simulations made for the Auger Engineering Radio Array (AERA). AERA is a system of radio antennas installed at the Pierre Auger Observatory [42], measuring pulses of a few nanoseconds in length emitted by cosmic ray air showers with energies above $10^{17}$ eV. More than $150$ autonomous antenna stations are spread over $17$ km$^{2}$ at various distances from each other. Each antenna consists of two horizontally polarized antenna systems, and is sensitive to frequencies from $30$ to $80$ MHz, with signal processing algorithms and electronics specifically developed for this purpose. Radio pulses are sampled at $180$ MHz and stored in traces of $1,000$ time bins in length. For this study we have chosen to analyze signals from AERA’s logarithmic periodic dipole antennas (LPDAs).

Air showers were simulated using the CoREAS code [43]. CoREAS is a CORSIKA-based [44] program, suitable for simulating the radio emission from air showers. It relies on the shower’s particle content as simulated by CORSIKA, and calculates emitted electromagnetic radiation for each charged particle via the endpoint formalism [45]. The showers were generated with the following parameters: zenith angle $\theta\in[0^{\circ},62^{\circ}]$ randomly sampled from a $\sin(\theta)\cos(\theta)$ distribution, azimuth angle $\phi$ uniformly distributed from $0^{\circ}$ to $360^{\circ}$ , energy sampled uniformly in $\log_{10}(E)$ from $10^{17}$ to $10^{19}$ eV, and a randomly chosen shower core location.

The simulated air showers were further processed by the Pierre Auger Observatory’s reconstruction pipeline, $\overline{\rm Off}\underline{\rm line}$ [46]. The signal was folded with the antenna response and converted to a voltage time trace. We have analysed the two station polarizations, East-West and North-South, independently in this work.

2.2 Signal-to-noise ratio

We characterize the strength of a signal in a "noisy" trace using the signal-to-noise ratio (SNR). The SNR is defined as the maximum amplitude of a signal divided by the root mean square (RMS) of the noise in the time trace

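$$\text{SNR}=\frac{\text{max. signal}}{\text{RMS}_{\text{noise}}}=\frac{A_{\text{max}}^{S}}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}A_{i}^{2}}},\qquad(2.1)$$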

where $A_{i}$ represents the voltage in the $i$ -th time bin of a trace of length $N$ bins, and $A_{\text{max}}^{S}$ is the maximum voltage of the signal. In the signal recovery task, the trace is divided into a signal and noise region and $A_{\text{max}}^{S}$ is calculated for the signal region, while the RMS noise is obtained from the noise region only. In the classification case, $A_{\text{max}}^{S}$ is determined from the true signal trace.
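As a minimal sketch of the SNR definition above (peak signal amplitude divided by the RMS of the noise), the following Python functions compute it for plain lists of voltages. The function names are illustrative, and taking the absolute value of the amplitude when searching for the maximum is an assumption, since the text only speaks of the "maximum amplitude".

```python
import math


def rms(values):
    """Root mean square of a sequence of voltages."""
    return math.sqrt(sum(a * a for a in values) / len(values))


def snr(signal_region, noise_region):
    """Peak absolute amplitude of the signal region over the noise RMS.

    Assumption: the maximum is taken over absolute values.
    """
    peak = max(abs(a) for a in signal_region)
    return peak / rms(noise_region)
```

For the signal recovery task, `signal_region` and `noise_region` would be the two parts of the same trace; for classification, `signal_region` would be the true signal trace.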

Let us note that the Pierre Auger Collaboration uses a different definition of the SNR for the AERA data, in which Eq. 2.1 is squared and convolved with a Hilbert envelope to calculate the maximum amplitude. This procedure increases the maximum amplitude of a signal by $\sim 7\%$ . The AERA data are typically reconstructed above $\text{SNR}_{\text{AERA}}=10$ , which corresponds to $\text{SNR}\simeq\sqrt{10}\simeq 3.2$ in our case.

In addition, air showers measured with AERA are externally triggered by the surface detector of the Pierre Auger Observatory [42]. The SNR is determined offline for each station-level signal. The signal is the sum of the electric field traces from all three polarizations: East-West, North-South, and vertical. The electric field trace is determined by unfolding the known antenna response. In our case, we examine the signal of each polarization independently, and the measured signal considered in the voltage trace includes antenna effects such as dispersion. Therefore, the signal has a lower SNR in our study when compared to the standard AERA analysis.

2.3 Simulated data and noise

We have prepared a code capable of generating various noise traces. This code provides maximal flexibility, as any input parameter can be modified and any radio component can be included. In this way we can debug and train our model on the Monte Carlo generated noise data by mimicking realistic radio background conditions at any site and using any electronic readout system.

Noise is simulated in frequency space and consists of two uncorrelated components of the same amplitude. The first component is white noise sampled from the normal distribution, i.e. voltage $V(f)=\mathrm{const.}$ , where $f$ is frequency. The second component is colored noise, $V(f)\propto f^{-\alpha/2}$ , where the power density index $\alpha$ takes a random value between [math] and $1$ drawn from a uniform distribution. These two components mimic various sources of noise, and their sum agrees well with measurements (see e.g. [47]). In addition, any radio frequency interference (RFI) source emitting at a given frequency, or a transient source, can be included.

The last step is passing the sum of both noise components through a band-pass filter. We use the frequency band between $30$ and $80$ MHz to mimic the configuration of an AERA LPDA antenna. For the band-pass filter we use a finite impulse response (FIR) filter implemented in the SciPy package.
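The noise recipe above can be sketched in pure Python. This is not the authors' code: it replaces the SciPy FIR band-pass by an ideal frequency mask, normalizes the two components to equal in-band power as one reading of "same amplitude", and uses a slow direct inverse transform to stay dependency-free. The power density index `alpha` is a required argument because the lower bound of its range is not recoverable from the text.

```python
import cmath
import math
import random


def noise_spectrum(n_bins, fs_mhz, alpha, band=(30.0, 80.0), rng=random):
    """One-sided spectrum of white plus colored noise, zero outside `band`."""
    df = fs_mhz / n_bins                      # frequency resolution in MHz
    white, colored = [], []
    for k in range(n_bins // 2 + 1):
        f = k * df
        if band[0] <= f <= band[1]:
            # White component: flat expected amplitude, V(f) = const.
            white.append(complex(rng.gauss(0, 1), rng.gauss(0, 1)))
            # Colored component: V(f) proportional to f^(-alpha/2).
            shape = f ** (-alpha / 2.0)
            colored.append(shape * complex(rng.gauss(0, 1), rng.gauss(0, 1)))
        else:
            white.append(0j)                  # ideal band-pass (assumption)
            colored.append(0j)
    # Assumption: "same amplitude" is read as equal in-band power.
    p_white = sum(abs(v) ** 2 for v in white)
    p_colored = sum(abs(v) ** 2 for v in colored)
    scale = math.sqrt(p_white / p_colored) if p_colored > 0 else 0.0
    return [w + scale * c for w, c in zip(white, colored)]


def to_time_domain(spectrum, n_bins):
    """Naive inverse real DFT, O(N^2); adequate for a 1,000-bin sketch."""
    trace = []
    for n in range(n_bins):
        acc = spectrum[0].real
        for k in range(1, len(spectrum)):
            if spectrum[k] != 0:
                weight = 1.0 if 2 * k == n_bins else 2.0
                phase = cmath.exp(2j * math.pi * k * n / n_bins)
                acc += weight * (spectrum[k] * phase).real
        trace.append(acc / n_bins)
    return trace
```

A trace resembling the configuration in the text would use `n_bins=1000` and `fs_mhz=180.0`; a narrow-band RFI line could be added by increasing a single in-band entry of the spectrum before the inverse transform.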

Data for classification

We normalize each generated time trace by dividing each bin by the standard deviation of the whole trace, before using it in a deep neural network. Normalization is known to improve the learning efficiency of a neural network. To study the performance of deep neural networks for different values of the SNR, we scale simulated signals relative to the standard deviation of the noise amplitude. For the classification study, the SNR ranges from $0.5$ to $5.0$ to cover an interesting range of signals for triggering. Low SNR values will show what fraction of weak signals can be identified while avoiding an excessive number of false positives. The upper bound of SNR values will be used to verify that a neural network correctly identifies large air shower signals. $\text{SNR}>3$ is typically used for the reconstruction of air showers (see section 2.2).
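The two preparation steps for the classification data, scaling a simulated signal to a chosen SNR and normalizing the summed trace by its standard deviation, might look as follows. This is a sketch under assumptions: the function names are illustrative, and the peak is taken over absolute amplitudes.

```python
import statistics


def scale_to_snr(signal, noise, target_snr):
    """Rescale `signal` so that its peak equals target_snr times the noise std."""
    noise_std = statistics.pstdev(noise)
    peak = max(abs(a) for a in signal)        # assumption: absolute peak
    factor = target_snr * noise_std / peak
    return [factor * a for a in signal]


def make_input_trace(signal, noise, target_snr):
    """Superimpose the scaled signal on the noise and normalize by the trace std."""
    scaled = scale_to_snr(signal, noise, target_snr)
    trace = [s + n for s, n in zip(scaled, noise)]
    std = statistics.pstdev(trace)
    return [a / std for a in trace]
```

For zero-mean noise the standard deviation used here coincides with the RMS that appears in the SNR definition.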

We show an example of a simulated radio signal superimposed with simulated noise in Fig. 1. The air shower radio signal in this example has $\text{SNR}=1.5$. Both the time trace and the spectrum show the air shower signal alone and the sum of the signal and noise. The air shower signal exhibits a falling frequency spectrum, but the exact shape depends on the geometry of the shower.

Data for signal recovery

To present a realistic test scenario, we created a data set with a natural distribution of SNRs, with no scaling performed to achieve a specific SNR. The resulting SNR distribution peaks at 3.8 (median). On the other hand, SNR values for pure noise have a mean $\langle\text{SNR}\rangle=2.7$ and standard deviation of $\sigma_{\textrm{SNR}}=0.5$ , if we use the maximum noise amplitude in a random signal time window for $A_{\text{max}}^{S}$ in Eq. 2.1.

To ensure that the network learns the shape of air shower signals, we consider only traces with the true maximum signal amplitude exceeding $30\%$ of the noise level (represented by the Root-Mean-Square, RMS) in a time window outside the signal region.

All traces are normalized to a maximum absolute amplitude of $1$ prior to their input to the network. Large differences in the signal strength are compensated with this normalization, and distinct features are retained for low signals. The network is trained on $69,967$ traces, while $7,775$ traces are used for validation. The network is then evaluated on an independent test set of $5,376$ traces with an SNR above $\sqrt{10}\approx 3.2$ . We want to point out that no cut on the SNR was applied at either the training or validation stage.
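The selection cut and the normalization for the signal recovery data can be sketched as below. The function names are illustrative, and how the label trace is scaled alongside the input is not stated in the text, so it is left out here.

```python
import math


def passes_amplitude_cut(true_signal, noise_window, min_fraction=0.3):
    """Keep traces whose true peak amplitude exceeds 30% of the noise RMS."""
    peak = max(abs(a) for a in true_signal)
    noise_rms = math.sqrt(sum(a * a for a in noise_window) / len(noise_window))
    return peak > min_fraction * noise_rms


def normalize_to_unit_peak(trace):
    """Scale a trace so that its maximum absolute amplitude equals 1."""
    peak = max(abs(a) for a in trace)
    return [a / peak for a in trace]
```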

The position of a true signal is identified by the maximum signal amplitude, and appears in a $2.2\,\mu\textrm{s}$ window within a trace. With a typical signal width of several hundred nanoseconds, this window mimics a signal search window for triggered data. This window is much longer than a typical electric field pulse of an air shower, which lasts only a few tens of ns, as the measured voltage is extended by dispersion effects in an antenna.

3 Classification of signal and background events

The primary challenge of an array of self-triggering antennas measuring cosmic rays resides in discriminating a signal from background in a continuous time trace. Our goal is to develop a deep neural network with efficient signal identification capabilities and strong background rejection.

3.1 Basic considerations

In our classification task, we analyze traces of $5.6\,\mu$s with $1,000$ bins, as is the case for AERA antennas. We require a trigger rate below $200$ Hz. For contiguous traces of this length (about $1.8\times 10^{5}$ traces per second), this means that the false positive rate (FPR) must be below $0.1\%$ . Therefore, we search for neural network models having fewer than $20$ false triggers in the $20,000$ traces used for testing. Models are then evaluated by their true positive rate (TPR), i.e. the percentage of correctly identified air shower events in the test data.
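The link between the trace length, the FPR and the trigger rate can be checked with a few lines of arithmetic. The sketch assumes that traces are analyzed back to back without gaps or overlap, which the text does not state explicitly.

```python
TRACE_LENGTH_S = 5.6e-6   # duration of one 1,000-bin trace
N_TEST_TRACES = 20_000    # size of the test set


def false_trigger_rate_hz(fpr, trace_length_s=TRACE_LENGTH_S):
    """False triggers per second for contiguous traces (assumption)."""
    return fpr / trace_length_s


def max_false_triggers(fpr, n_traces=N_TEST_TRACES):
    """Number of false positives in the test set that corresponds to `fpr`."""
    return fpr * n_traces
```

With `fpr=0.001` the rate comes out slightly below 180 Hz, which is within the 200 Hz requirement, and the test set allowance is 20 traces.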

The SNR can be viewed as the primary obstacle to be overcome by a neural network. Therefore, our basic strategy is to train and test models for eight different values of the SNR, ranging from $0.5$ to $5$ . In every training and testing cycle, the SNR value was kept constant.

3.2 Network concept and training

A neural network takes $1,000$ bins of a time trace as its input and delivers a binary output to the question "Is there a signal in this trace?". TensorFlow [48] was used in conjunction with the Keras interface [49] to set up and train our networks.

Initially, a search for suitable network architectures was performed, including extensive training and testing of network models together with a random search for optimal hyper-parameters. Inspired by [50], where a convolutional neural network (CNN) was used to search for gravitational-wave signals in noisy data, we began by exploring the same architecture. Two other sequential neural network architectures were then compared to the CNN. The first of these, long short-term memory (LSTM), required significantly more computational resources than the CNN, and the results of the second architecture, using only fully-connected (Dense) layers, were not superior to those of the CNN. Therefore, we decided not to pursue either of these architectures.

Architecture

A CNN has two components: convolutional (Conv) layers responsible for feature extraction, and Dense layers serving as a classifier. We tested various configurations and combinations of these layers, and our findings are described below. The number of trainable parameters in the tested models ranged from ten thousand to several million, and a random initial weight was supplied for each parameter. In addition, the number of filters in Conv layers, and the dimension of Dense layers, were drawn as $2^{n}$ , where $n$ is a random number between one and eight.

During our testing, we determined that results obtained from models with two Conv layers were better than those obtained from models using only one Conv layer. The next step was separating Conv layers into either two or three blocks with non-linearity and pooling layers following each block. On average, models with two blocks of layers outperformed models with only one block, and no improvement was found for models with three blocks. In addition, models with three Dense layers led to more correct results than those with one, two, or four layers. If the performance of two models was similar, we chose the simpler of the two.

An overview of the best performing network model is shown in Fig. 2. It consists of four one-dimensional convolution layers (Conv1D) separated into two blocks, and three fully-connected (Dense) layers. The weights are initialized with the He_normal function [51], which draws samples from a truncated normal distribution centered on zero. The stride length of the convolution is unity. The Rectified Linear Unit (ReLU) activation function is used in all but the final layer, where the softmax function is adopted and works as a classifier. The model is trained with the Adam optimizer [52] and the binary cross-entropy loss function.

The first two Conv1D layers have $256$ filter layers (i.e. the number of output filters in the convolution) and a kernel size of $5$ , which specifies the length of the 1D convolution window. The second two Conv1D layers have their filter size reduced to $32$ , with the number of filter layers being the same as that of the previous two layers. Batch normalization is performed after each convolution layer. The maximum pooling layer is used after the first block of convolution layers and the flattening layer after the second block in order to get the data into a format suitable for the following fully-connected layer (see Fig. 2). This network has approximately $766,000$ trainable parameters.

Training

We trained models using $100,000$ simulated traces for each SNR value. The training data set contained $50\%$ traces with a signal, with the other $50\%$ being traces with background only. An independent data set with $20,000$ traces was used to test the trained models. In this verification dataset, only $1\%$ of traces contained a signal, with this fraction being closer to realistic conditions in air shower experiments. The maximum of a radio signal was located in a random bin and was always present in a trace, but the same did not hold for other parts of the signal.

We found that learning did not improve beyond the first few epochs, as was also found for the signal recovery task (see Fig. 5), which indicated that the models were sufficiently trained with the provided training datasets. We therefore decided to use only a single epoch in order to prevent over-training and reduce computation time. This approach was considered sufficient for the purpose of the classification task, as only Monte Carlo generated traces were used.

Training and testing were performed for SNR values between $0.5$ and $5$ . About ten thousand training and test runs were done for each SNR value, allowing a sufficient scan of the hyper-parameters. The objective of our task was to increase the TPR while keeping the FPR below $0.1\%$ . The best performing models were then saved for each SNR. We used the VISPA computing cluster (using NVIDIA 1080 GTX cards) at the RWTH Aachen University [54, 55, 56, 57, 58] for our calculations.

3.3 Classification results and working point

We performed several checks of the best network models, and our results are summarized in Fig. 3. The first step was a search for the best performing model for a given SNR. We made thousands of training runs for each SNR and saved the best models for all SNRs but SNR= $0.5$ . The best performing model was defined as the model with the highest true positive rate (TPR) also having a false positive rate (FPR) below $0.1\%$ .
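The selection rule described above, highest TPR among models with an FPR below $0.1\%$, can be written down directly. This is an illustrative sketch: the data layout (labels and predictions as lists of 0 and 1, results as a dictionary) and the function names are assumptions.

```python
def rates(labels, predictions):
    """Return (TPR, FPR) for binary labels and predictions (1 = signal)."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return true_pos / n_pos, false_pos / n_neg


def best_model(results, max_fpr=0.001):
    """Pick the model name with the highest TPR among those with FPR < max_fpr.

    `results` maps a model name to its (TPR, FPR) pair; returns None if no
    model satisfies the FPR requirement.
    """
    allowed = {name: r for name, r in results.items() if r[1] < max_fpr}
    if not allowed:
        return None
    return max(allowed, key=lambda name: allowed[name][0])
```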

In our second investigation we applied each of these six best models to time traces with SNRs different from the one used for the model training. Our goal was to get the TPR and also FPR at the other seven SNR values. The colored curves in Fig. 3 denote the obtained TPR for all six models as a function of SNR.

Generally, models trained for $1<\text{SNR}<4$ have good results, while the two remaining models have significantly lower TPRs. The best result for the classification was obtained with the model trained for $\text{SNR}=1.5$ . The TPR values obtained with this model surpassed those of all other models for $\text{SNR}<4$ . This model reached $45\%$, $68\%$ and $84\%$ for $\text{SNR}=1.5$ , $2$ and $2.5$ , respectively. An important conclusion is that the FPR stayed below $0.2\%$ for all six models even for SNRs on which these models were not trained.

4 Signal recovery

Once a radio trace with an air shower signal has been recorded, the task is to extract the signal from a noisy trace. The recovery of radio signals is challenging, as they appear not to be very different from the ambient noise. We exploit the capabilities of deep learning techniques to differentiate between fine details in the properties and characteristics of signal and background traces.

4.1 Strategy

Our signal recovery aims at the reconstruction of air shower signals, i.e., the complete time traces with $1,000$ time bins (cf. Fig. 1, top). The air shower signal is fully contained in the time trace in this case. For this purpose we developed a network for regression, and used supervised training to reconstruct the entire signal from a noise-contaminated measurement. To train the network, simulated signal traces serve as labels, and the mean square error metric is utilized to calculate the loss function.

For a successful reconstruction it is crucial that the original signals are reconstructed as precisely as possible while suppressing noise. To assess whether the traces were efficiently cleaned and the signals conserved, both the SNR (cf. Eq. 2.1) and the signal energy contained in the traces

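$$E_{\text{signal}}=\frac{\Delta t}{R\cdot e}\left(\sum_{t_{1}}^{t_{2}}U_{i}^{2}-\frac{t_{2}-t_{1}}{t_{4}-t_{3}}\sum_{t_{3}}^{t_{4}}U_{i}^{2}\right)\qquad(4.1)$$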

are determined. The signal energy is calculated from the voltage trace $U(t)$ in units of electron volts (eV) using the elementary charge $e$ , the antenna impedance $R=50\,\Omega$ , and the sampling interval $\Delta t=5.5$ ns. The noise contribution contained in the time interval $[t_{3},t_{4}]$ is subtracted from the total energy in the signal window $[t_{1},t_{2}]$ to recover only the signal energy.
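A minimal sketch of the signal energy calculation with noise subtraction is given below. The windows are expressed as half-open ranges of bin indices, which is an assumption about how $[t_{1},t_{2}]$ and $[t_{3},t_{4}]$ map onto the sampled trace; the function and constant names are illustrative.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19   # converts joules to electron volts


def signal_energy_ev(trace_v, signal_window, noise_window,
                     dt_s=5.5e-9, r_ohm=50.0):
    """Noise-subtracted energy in eV of a voltage trace given in volts.

    Windows are (start, stop) bin indices, stop exclusive (assumption).
    """
    t1, t2 = signal_window
    t3, t4 = noise_window
    signal_sum = sum(u * u for u in trace_v[t1:t2])
    noise_sum = sum(u * u for u in trace_v[t3:t4])
    # Rescale the noise sum to the length of the signal window.
    noise_in_signal_window = (t2 - t1) / (t4 - t3) * noise_sum
    return dt_s / (r_ohm * ELEMENTARY_CHARGE_C) * (signal_sum - noise_in_signal_window)
```

For a trace with the same mean power in both windows the two terms cancel and the function returns zero, which is the intended behaviour for pure noise on average.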

4.2 Network concept and training

For implementation of the networks we once again used TensorFlow [48] with the Keras interface [49].

The de-noising autoencoder network utilizes convolutional layers searching for translation-invariant patterns in the one-dimensional time series. The architecture features a bottleneck-like structure illustrated in Fig. 4 (left). This structure consists of two parts. In the first stage (encoder), the trace is encoded by decreasing the temporal dimension of the traces to multiple compressed representations. These compressed traces represent the input trace in terms of different patterns. In the second stage (decoder), the representations are decoded and combined in order to recover the original dimensionality. In the whole procedure, the network searches for relevant features within the traces and subsequently chooses and unfolds only features associated with the signal.

This bottleneck design is utilized in other networks such as the so-called U-Net [60] or unsupervised autoencoders. As an example, this concept has been studied for its applications in gravitational wave analysis [53]. An advantage of the bottleneck feature is that with comparable performance, the number of free parameters (weights, biases) is reduced relative to other network designs.

Architecture

Within the network, the temporal dimension is initially decreased from $1,000$ to $25$ bins and afterwards restored. A detailed description of the network’s layout is presented in Fig. 4 (right). To decrease the temporal dimensionality, striding of $2$ or $5$ time bins is used within the convolutional layer. All layers have a fixed kernel size of $5$ bins.

First, a single layer with four filters takes the input data without changing its dimensionality. The reduction is then performed in four blocks, each consisting of two layers according to the reduction sequence of $1000\rightarrow 500\rightarrow 250\rightarrow 125\rightarrow 25$ time bins. In each block, the first layer decreases the dimension while the second preserves the size. Both layers use the same number of filters, which doubles for each block from $16$ to $128$ . In the same way, the decoder unfolds the trace again. The encoding layers utilize 2D convolutions, while the decoding layers perform transposed convolutions [49].
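The reduction sequence and filter counts of the encoder follow from the strides and the doubling rule. The short sketch below reproduces them; the stride order (2, 2, 2, 5) is inferred from the sequence $1000\rightarrow 500\rightarrow 250\rightarrow 125\rightarrow 25$, and the function name is illustrative.

```python
def encoder_shapes(n_bins=1000, strides=(2, 2, 2, 5), first_filters=16):
    """Return (time bins, filters) at the output of each encoder block."""
    shapes = []
    length, filters = n_bins, first_filters
    for stride in strides:
        if length % stride != 0:
            raise ValueError("stride must divide the current trace length")
        length //= stride            # first layer of the block reduces the size
        shapes.append((length, filters))
        filters *= 2                 # number of filters doubles per block
    return shapes
```

Calling `encoder_shapes()` gives `[(500, 16), (250, 32), (125, 64), (25, 128)]`, matching the bottleneck of 25 bins quoted in the text.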

Additionally, three shortcuts are realized between several layers which superimpose the features represented by the layers (cf. Fig. 4 right). With these connections, gradient back-propagation is supported, resulting in better training stability as discussed in so-called Residual Networks [59]. In our network, the shortcuts were essential for successful training.

As we aim to reconstruct oscillating time series, the activation function needs to cope with positive and negative values. The parametric rectified linear unit PReLU [51] enables negative values, and has been used instead of the widely used ReLU activation. We used $128$ as the batch size.
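The difference between the two activation functions can be illustrated on single values. In a PReLU layer the negative slope is a trained parameter; the value passed here is purely hypothetical.

```python
def relu(x):
    """Rectified linear unit: negative inputs are set to zero."""
    return max(0.0, x)


def prelu(x, slope):
    """Parametric ReLU: negative inputs are scaled by a learned slope."""
    return x if x > 0 else slope * x
```

For an input of `-2.0`, `relu` returns `0.0` while `prelu` with a slope of `0.25` returns `-0.5`, so the negative half of an oscillating trace is not discarded.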

Tests of architectures with increased complexity, i.e., an increased number of filters or layers, revealed no improvement in performance. As an alternative architecture, we also investigated a convolutional network that keeps the temporal dimension of the traces within the network. No major difference in the performance of the signal recovery was observed. Owing to fewer parameters, the presented bottleneck network required about $30$ s per epoch on the NVIDIA 1080 GTX card, which turned out to be $30-40\%$ faster than the convolutional network. Thus we report the results of the more computationally efficient bottleneck network in this paper.

Training

Here we also use the Adam optimizer [52] for our supervised network training, with a learning rate of 0.001. We present to the network simulated traces of air shower signals superimposed on quasi-realistic noise traces as the input and give the pure signal traces as labels, which should be recovered by the network from the input traces.

The quality of the training is described by the loss functions (here the mean square error metric) for the training and validation data sets after each epoch. The corresponding curves, presented in Fig. 5, initially show a fast decrease, indicating effective learning. The training loss then further decreases (note the logarithmic scale), indicating sufficient complexity of the network for learning, while the validation loss stagnates without signatures of over-training. The training is stopped after 26 epochs using the Keras callback "EarlyStopping" [49], as no further improvement in terms of validation loss is achieved.
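The stopping criterion can be illustrated with a simplified pure-Python version of an early-stopping rule on the validation loss. It is not the Keras implementation, and the `patience` value is hypothetical, since the text does not quote the callback settings.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has not improved on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```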

4.3 Energy of the air shower signal

The trained network can be used to recover signals from traces contaminated with noise. Two examples of signal recovery with the de-noising autoencoder are presented in Fig. 6. The upper two figures show the pure radio signal (yellow traces) and its superposition with noise (blue traces).

The lower two figures show the corresponding reconstructed signal traces (red) at the output of the network. In both examples, the network correctly identifies the signal and reconstructs a signal trace with no significant noise contribution. As a result of the signal reconstruction, the SNR increases significantly for the reconstructed traces (the boxes in the upper left of the figures show the SNR values and the signal energy in units of eV). The reconstructed signals have the proper shape and amplitude. The signal energies of the reconstructed traces deviate less from the true values than those of the input traces. For example, for the event shown on the left in Fig. 6, the signal energy decreases from 1.45 keV to 0.39 keV, while the true (label) signal has 0.49 keV.

The network performance is also evaluated on the entire test data set. First, the signal recovery is examined. To assess whether the signal contributions were properly recovered, the deviation in energy is calculated as

$$\frac{\Delta E_{i}}{E_{\textrm{true}}}=\frac{E_{i}-E_{\textrm{true}}}{E_{\textrm{true}}},$$

where $E_{\textrm{true}}$ is the energy obtained from the label traces according to Eq. 4.1. $E_{i}$ denotes the signal energy obtained from a trace recovered with the network ($i$ = rec) or taken directly from the input traces ($i$ = input).

The distribution of the energy deviation of the recovered traces is presented as a dashed histogram in Fig. 7. It has a prominent peak at the median, close to zero, and a width $\sigma\simeq 20\%$. The additional peak around $-1$ indicates that $\sim 20\%$ of the signals were not recovered. For these traces, the network reconstructed either a noise pulse or no significant signal at all. All of these traces have a low SNR and hence a low signal. This is verified by the red histogram in Fig. 7, which shows the deviation in energy only for events with $\mbox{SNR}>5$.
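As a worked example, the sketch below evaluates the energy deviation for the event shown on the left in Fig. 6, assuming the deviation is the relative difference $(E_{i}-E_{\textrm{true}})/E_{\textrm{true}}$, which is consistent with the selection $|E_{\textrm{rec}}-E_{\textrm{true}}|/E_{\textrm{true}}<0.5$ used later. The function name is hypothetical.

```python
import numpy as np


def relative_energy_deviation(e_i, e_true):
    """Relative deviation (E_i - E_true) / E_true of a signal energy."""
    e_i = np.asarray(e_i, dtype=float)
    e_true = np.asarray(e_true, dtype=float)
    return (e_i - e_true) / e_true


# Energies in keV quoted for the left event of Fig. 6.
e_true, e_input, e_rec = 0.49, 1.45, 0.39

dev_input = float(relative_energy_deviation(e_input, e_true))  # about +1.96
dev_rec = float(relative_energy_deviation(e_rec, e_true))      # about -0.20

# A trace for which no signal is reconstructed (E_rec = 0) ends up at -1.
dev_lost = float(relative_energy_deviation(0.0, e_true))
```

The last line shows why unrecovered signals accumulate in the peak around $-1$: a vanishing reconstructed energy corresponds to a deviation of exactly $-1$.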

The latter distribution indicates that the ability of the network to properly recover the signal depends on the SNR. The fraction of events recovered by the network with $|\Delta E_{\textrm{rec}}|/E_{\textrm{true}}<0.5$ as a function of the minimum $\mbox{SNR}_{\textrm{input}}$ is shown as a yellow curve in Fig. 8 (left). We find that $97\,\%$ of the events are reconstructed for a minimum SNR of $5$, and almost all events for a minimum SNR of $8$. The blue curve shows the fraction of events with $|\Delta E_{\textrm{input}}|/E_{\textrm{true}}<0.5$ when calculating the signal energy from the input traces using Eq. 4.1. This equation compensates for a certain noise contribution by subtracting the content of a noise window from the energy deposit in a signal window. The same strategy is used for the data measured by the AERA array [34]. Comparing the two curves in Fig. 8 (left) reveals that more low-SNR events are reconstructed using the autoencoder network.
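The recovered fraction as a function of the minimum SNR can be computed as in the following sketch. The SNR values and deviations are toy numbers, and the use of an inclusive threshold (`snr >= snr_min`) is an assumption.

```python
import numpy as np


def recovered_fraction(snr, deviation, snr_min, max_abs_dev=0.5):
    """Fraction of events with |deviation| < max_abs_dev among events with snr >= snr_min."""
    snr = np.asarray(snr, dtype=float)
    deviation = np.asarray(deviation, dtype=float)
    selected = snr >= snr_min
    if not np.any(selected):
        return float("nan")  # no events above the threshold
    return float(np.mean(np.abs(deviation[selected]) < max_abs_dev))


# Toy events: input SNR and relative energy deviation of the recovered trace.
snr = np.array([2.0, 3.0, 4.0, 6.0, 8.0, 10.0])
dev = np.array([-1.0, -1.0, 0.4, -0.2, 0.1, 0.05])

# Scan the minimum SNR, as done for the curves in Fig. 8 (left).
thresholds = (2.0, 4.0, 6.0)
fractions = [recovered_fraction(snr, dev, s) for s in thresholds]
```

In this toy sample the two unrecovered events sit at the lowest SNR, so the fraction rises from 4/6 to 1 once the threshold exceeds their SNR.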

We determine the full width at half maximum (FWHM) from the histogram of $\Delta E/E_{\textrm{true}}$ of the reconstructed signals within a given interval of SNR. $\textrm{FWHM}/2.35$ serves as a measure of the width, and the FWHM center as a measure of the mean of the distributions; both are shown as a function of the SNR in Fig. 8 (right). To obtain a robust and reliable estimate of the energy resolution and bias at low SNR, we choose this metric over a fit of a normal distribution. For high SNR, the FWHM and the normal distribution give comparable results.
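A minimal sketch of such an FWHM estimate is given below. The binning, the range, and the procedure of walking outward from the peak bin are assumptions made for illustration; the factor $2.35\approx 2\sqrt{2\ln 2}$ converts the FWHM of a normal distribution into its standard deviation.

```python
import numpy as np


def fwhm_width_and_center(values, bins=100, value_range=(-1.0, 1.0)):
    """Estimate FWHM/2.35 and the FWHM center from a histogram.

    Starting at the highest bin, the range is extended to both sides as long
    as the neighbouring bin is still at or above half of the maximum.
    """
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    half_max = 0.5 * counts.max()
    peak = int(np.argmax(counts))

    lo = peak
    while lo > 0 and counts[lo - 1] >= half_max:
        lo -= 1
    hi = peak
    while hi < len(counts) - 1 and counts[hi + 1] >= half_max:
        hi += 1

    left, right = edges[lo], edges[hi + 1]
    return (right - left) / 2.35, 0.5 * (left + right)


# Toy check with normally distributed deviations of known width 0.2.
rng = np.random.default_rng(seed=1)
toy_deviation = rng.normal(0.0, 0.2, 100_000)
width, center = fwhm_width_and_center(toy_deviation)
```

Because only the bins around the main peak enter, a secondary population far from the peak, such as unrecovered signals near $-1$, does not broaden the estimate, provided it stays below half of the maximum. The result is limited by the bin width.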

As before, the yellow markers identify the resolution (circles) and bias (triangles) of signals recovered by the network while the blue markers are determined by using Eq. 4.1 on the input traces.

The network improves the resolution of the signal energy for low SNR. The resolution for events with an SNR between 5 and 6.5 improves from 26% to 19%. This improvement diminishes with increasing SNR until, at $\text{SNR}\sim 10$, the resolution determined directly from the input traces is better; there the resolution is below 15%. Note that the designed deep learning algorithm can be adjusted, e.g. by the choice of the normalization, to improve the resolution for low-SNR events while only slightly worsening the results for high-SNR events. This study is dedicated to improving the reconstruction for low-SNR events, as for high-SNR events the trace cleaning becomes unnecessary.

The FWHM center reveals a bias in the signal energy reconstruction for both approaches, see Fig. 8 (right). The reconstruction of the energy for signals recovered by the network shows a bias of $\sim 10\%$ for low SNR, and this bias vanishes for $\text{SNR}>10$. The bias at low SNR will contribute to the total systematic uncertainty, which will still be smaller than the total systematic uncertainty of 28% reported for the square of the reconstructed electric-field amplitudes for AERA [34].

We can conclude that the autoencoder network improves the recovery of cosmic ray radio pulses in a single-polarization measurement, particularly in the case of weak signals. A direct comparison of the network with standard techniques applied to real measured data, including electric-field traces of at least two polarization channels, would be beneficial and could lead to improvements of the network.

4.4 Frequency spectrum of the air shower signal

We also compared the frequency distribution of the reconstructed and label time traces. The frequency spectrum of an event is shown in Fig. 9 (left). The total signal and air shower signal, or the label, are shown as blue and yellow curves, respectively. The air shower spectrum exhibits a falling distribution in the filtered interval between $30$ and $80$ MHz. Noise contributions contained in the blue curve show large fluctuations.

The frequency spectrum of the signal reconstructed by the network (red) is shown in the lower left part of Fig. 9 together with the label spectrum (yellow). In order to quantify the accuracy of the reconstruction in the frequency domain, we calculate the integral over the difference between reconstructed (input) and true signal spectrum normalized by the true signal spectrum

[TABLE]

where $\mathcal{F}$ is the absolute value of the complex spectrum. We have $\Delta I/I_{\textrm{true}}=0.2$ for the presented example. The distribution of $\Delta I/I_{\textrm{true}}$ for all traces reconstructed with $|{E_{\textrm{rec}}-E_{\textrm{true}}}|/{E_{\textrm{true}}}<0.5$ (see Fig. 8 (right)) is shown in Fig. 9 (right). The distribution falls with a $68\%$ quantile of $\sigma_{\mathrm{{68}}}=0.15$ and a $95\%$ quantile of $\sigma_{\mathrm{{95}}}=0.31$ . This investigation in the frequency regime confirms that the signal recovery from noise contaminated time traces is feasible. Note that the frequency spectra of the signals were not used for the training of our network.
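The following sketch shows one way such a spectral comparison could be computed. It is an assumption, not the definition used in this work: the absolute difference of the two amplitude spectra is summed over the $30$ to $80$ MHz band and divided by the summed label spectrum. The sampling rate of 200 MHz, the trace length, and the toy pulse are hypothetical.

```python
import numpy as np


def spectral_deviation(trace, label, sampling_rate_hz, f_low_hz=30e6, f_high_hz=80e6):
    """Summed |F_trace - F_label| in the band, normalized to the summed F_label.

    F denotes the absolute value of the complex spectrum. The frequency bins
    are equally spaced, so the bin width cancels in the ratio.
    """
    freqs = np.fft.rfftfreq(len(label), d=1.0 / sampling_rate_hz)
    band = (freqs >= f_low_hz) & (freqs <= f_high_hz)
    spec_trace = np.abs(np.fft.rfft(trace))[band]
    spec_label = np.abs(np.fft.rfft(label))[band]
    return float(np.sum(np.abs(spec_trace - spec_label)) / np.sum(spec_label))


# Toy label: a 50 MHz oscillation under a short Gaussian envelope.
fs = 200e6  # hypothetical sampling rate in Hz
t = np.arange(1024) / fs
label = np.exp(-0.5 * ((t - t[512]) / 20e-9) ** 2) * np.sin(2 * np.pi * 50e6 * t)

dev_same = spectral_deviation(label, label, fs)          # identical traces
dev_scaled = spectral_deviation(1.1 * label, label, fs)  # amplitude off by 10 %
```

A perfect reconstruction gives zero, and a pure amplitude mismatch of 10 % gives a deviation of 0.1, which makes the quantity easy to interpret.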

5 Conclusion

In this work, we investigated deep learning methods for classification and reconstruction of radio signals emitted by cosmic ray induced air showers.

The identification of radio signals within a noise-contaminated environment had a true positive rate of about $90\,\%$ for signal-to-noise ratios SNR $\,>\,3$, while the false positive rate was below $0.2\,\%$. This level of SNR is typically used as the minimum value for the reconstruction of measured radio signals. When contributions from radio frequency interference (RFI) are included, this picture does not change. The best model found was trained for signal amplitudes close to the noise level, i.e. SNR $=1.5$, which may be targeted by future experiments.

A rather accurate reconstruction of the identified radio signal can be achieved by a de-noising autoencoder, a deep neural network model adopting a bottleneck-like structure. The algorithm combines an encoding and unfolding of time traces, providing efficient noise suppression. About $80\,\%$ of the signals were recovered, including heavily noise-contaminated traces with low signal-to-noise ratios. This number increases to above $97\,\%$ for events with a signal-to-noise ratio beyond 5. From the de-noised traces, the signal energy is calculated with a resolution of $\sim\,20\,\%$ for events with a signal-to-noise ratio between 5 and 6.5. This resolution improves with increasing signal-to-noise ratio. To benchmark our network, we compared the recovered fraction and the resolution determined from de-noised traces with the results obtained for the noise-contaminated traces. This comparison reveals better results using the network for events with low signal-to-noise ratios.

A benchmark comparison in frequency-space revealed that RFI signals were effectively suppressed and the spectral shape of the signal was recovered well.

We presented two neural network models which successfully identify air shower signals and reconstruct the signal energy. These networks can be used in any current or future observatory [42, 61]. The authors foresee additional improvements in both analyses by using the signal measured by two channels in an antenna, or by exploring the information in the frequency spectrum of measured traces.

To further optimize the models, data collected at the site of an experiment could be used as an alternative to Monte Carlo-generated data. Training with measured noise signals can improve both the classification and the signal reconstruction. The simulation studies presented in this work show the potential of applying deep learning methods to the radio detection of air showers.

Acknowledgments

It is our pleasure to acknowledge the interaction and collaboration with many colleagues from the RWTH Aachen University and the Pierre Auger Collaboration. We are grateful to C. Glaser for sharing his simulated data sets and to J. Glombitza for his valuable comments on deep learning techniques. We thank an anonymous reviewer for her/his useful comments and M. Malacari for reading the manuscript. We acknowledge the financial support of the Ministry of Innovation, Science and Research of the State of North Rhine-Westphalia, and the Federal Ministry of Education and Research (BMBF).

Bibliography

[1] G.E. Hinton, S. Osindero and Y.W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18 (7), 1527–1554 (2006).
[2] P. Vincent, H. Larochelle, Y. Bengio and P.A. Manzagol, Extracting and Composing Robust Features with Denoising Autoencoders, Proc. 25th Int. Conf. on Machine Learning, Helsinki, Finland, 1096 (2008).
[3] D.C. Ciresan, U. Meier and J. Schmidhuber, Multi-column deep neural networks for image classification, arXiv:1202.2745 (2012).
[4] O. Russakovsky et al., Imagenet large scale visual recognition challenge, International Journal of Computer Vision 115 (3), 211–252 (2015), arXiv:1409.0575.
[5] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, arXiv:1512.03385 (2015).
[6] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press, Cambridge, MA, US (2016), deeplearningbook.org.
[7] A. Aurisano et al., A Convolutional Neural Network Neutrino Event Classifier, JINST 11 (09), P09001 (2016).
[8] P. Baldi et al., Searching for exotic particles in high-energy physics with deep learning, Nature Communications 5, 4308 (2014).