Fingerprint matching of beyond-WIMP dark matter: neural network approach
Kyu Jung Bae, Ryusuke Jinno, Ayuki Kamada, Keisuke Yanagi

TL;DR
This paper introduces a neural network approach to analyze and compare beyond-WIMP dark matter models by capturing complex suppression patterns in galactic structure formation.
Contribution
It proposes using neural networks to effectively characterize and communicate the suppression features of various beyond-WIMP dark matter models.
Findings
Neural networks can model complex suppression shapes in matter power spectra.
The approach facilitates comparison across different beyond-WIMP models.
Demonstrated on a simplified light feebly interacting massive particles model.
CTPU-PTC-19-18
DESY 19-109
UT-19-13
Galactic-scale structure is of particular interest since it provides important clues to dark matter properties and its observation is improving. Weakly interacting massive particles (WIMPs) behave as cold dark matter on galactic scales, while beyond-WIMP candidates suppress galactic-scale structure formation. Suppression in the linear matter power spectrum has been conventionally characterized by a single parameter, the thermal warm dark matter mass. On the other hand, the shape of suppression depends on the underlying mechanism. It is necessary to introduce multiple parameters to cover a wide range of beyond-WIMP models. Once multiple parameters are introduced, it becomes harder to share results from one side to the other. In this work, we propose adopting a neural network technique to facilitate the communication between the two sides. To demonstrate how this works out in a concrete manner, we consider a simplified model of light feebly interacting massive particles.
1 Introduction
Dark matter (DM) is an essential component for the Universe to form the current shape. Its existence and abundance are probed by gravitational observations such as galaxy rotation curves, bullet cluster collision, and cosmic microwave background (CMB) anisotropy. On the other hand, we have not seen any DM signal by any non-gravitational interactions, and thus we still do not know the identity of DM: what it is and how it is produced. One intriguing possibility is that DM consists of a new particle, which provides a clue to physics beyond the standard model (SM) (see Ref. [1] for a review).
One of the early attempts is a weakly interacting massive particle (WIMP) (see Refs. [2, 3] for recent reviews). In this direction, much effort has been devoted at the Large Hadron Collider (LHC) (for example, mono-jet searches [4, 5]) and at direct/indirect detection searches [6, 7, 8, 9]. However, no firm signals have been reported yet. This may motivate us to consider beyond-WIMP scenarios that can be probed by cosmological/astrophysical observations. [Footnote 1: We refer readers to Ref. [10] for a recent review of gravitational probes of DM properties.]
WIMPs behave as cold dark matter (CDM) on galactic scales. They are in good agreement with many independent observations such as CMB anisotropy [11] and galaxy clustering [12]. On the other hand, their predictions of galactic-scale structure are under debate. On galactic scales, there have been issues that are difficult to explain in CDM (small-scale issues). [Footnote 2: Prominent examples are the missing satellite problem [13, 14, 15, 16, 17], core-cusp problem [18, 19, 20, 21], and too-big-to-fail problem [22, 23, 24, 25, 26, 27]. We refer readers to Ref. [28] for a recent review and further details. State-of-the-art hydrodynamical simulations have been demonstrating that astrophysical processes also play an important role [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46]. There have also been reports that small-scale issues persist even in state-of-the-art hydrodynamical simulations [47, 48, 49, 50, 51, 52, 53, 54, 55]. To the best of our knowledge, it is still controversial whether astrophysical processes fully resolve the small-scale issues.]
Alternatives to CDM may explain small-scale issues: warm dark matter (WDM) [56, 57, 58, 15, 59, 16, 60, 17, 52]; fuzzy DM [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]; and long-lasting DM interaction with primordial plasma or free-streaming light particles [72, 73, 74, 75, 76, 60, 77, 78, 79, 80, 81, 82, 83, 84, 85].
On the other hand, impacts on galactic-scale structure formation depend on beyond-WIMP scenarios. Free-streaming of light WDM particles smears out the primordial density contrast. Quantum pressure of fuzzy DM prevents DM from gravitational clustering. Pressure of radiation to which DM couples involves DM in acoustic oscillation rather than gravitational clustering. Such effects are reflected in the linear matter power spectrum, which one can obtain by following the evolution of the primordial density contrast. Generally, by performing a suite of simulations with the resulting linear matter power spectrum, one can obtain observable quantities, which can be directly compared with cosmological/astrophysical observations. In summary, we need to work out the following procedure on a model-by-model basis:
- Model → Linear matter power spectrum → Observables.
See the blue flow in Fig. 1. The whole procedure requires interdisciplinary expertise from particle phenomenology to (computational) astrophysics. Moreover, each step often requires a dedicated calculation. In particular, simulations in the last step are often too time-consuming to repeat.
One can work out each step independently by parametrizing the linear matter power spectrum. See the red flow in Fig. 1. A single parameter has been adopted conventionally: the thermal WDM mass $m_{\rm WDM}$. [Footnote 3: An underlying model may be light gravitino [86, 87]. WDM particles are thermalized in the early Universe and decouple from thermal plasma at some point.]
On the other hand, a single parameter is not enough to cover a wide range of beyond-WIMP scenarios. For this purpose, Ref. [88] introduces the 3-parameter ($\{\alpha, \beta, \gamma\}$) characterization of the linear matter power spectrum. On one side, one (likely a particle physicist) can construct a map of model parameters onto $\{\alpha, \beta, \gamma\}$. On the other side, one (likely an astrophysicist) can provide observational constraints on $\{\alpha, \beta, \gamma\}$, as indeed done for the Lyman-$\alpha$ forest data in Ref. [89]. By combining results from the two sides, one can obtain observational constraints on a given beyond-WIMP scenario. Nevertheless, once multiple parameters are introduced, it becomes hard to share results from one side to the other.
In this respect, we propose building ready-to-use networks: one maps model parameters onto $\{\alpha, \beta, \gamma\}$, and another maps $\{\alpha, \beta, \gamma\}$ onto observables. One can use these networks to examine models without repeating the aforementioned time-consuming procedure. Ideally, analytic maps would be the most efficient, but in reality it is hard to establish them. Thus, a numerical method is helpful to develop such effective maps. For this purpose, we adopt a neural network technique.
To be concrete, in this paper, we consider a feebly interacting massive particle (FIMP) [90] (see Ref. [91] for a recent review). Light (keV-scale) FIMPs, which are produced through the freeze-in mechanism, are a compelling example of WDM. Even in FIMP models, the shape of suppression in the linear matter power spectrum depends on production processes such as 2-body decay, 3-body decay, and 2-to-2 scattering [92, 93, 94, 95, 96, 97, 98] (see Ref. [99] for a comprehensive discussion). [Footnote 4: We refer readers to Refs. [100, 101, 102, 103] for sterile neutrino DM. Sterile neutrinos are produced through mixing with active neutrinos. We also refer readers to Ref. [104] for superWIMPs. SuperWIMPs are produced by the decay of WIMPs long after the WIMP freeze-out. If the WIMP decay occurs soon after the WIMP freeze-out, one may need to take into account the momentum distribution function of WIMPs [105, 106, 107, 108, 109]. In this paper, we do not consider these possibilities, although they may be FIMPs in a broad sense.]
Thus a 3-parameter characterization, rather than the conventional single-parameter characterization, is required to cover a wide range of FIMP models. By taking a simplified FIMP model, we demonstrate how one can work out the simplified procedure. We also provide the obtained neural networks through the arXiv website: one is a map of “model parameters → $\{\alpha, \beta, \gamma\}$” and the others are “$\{\alpha, \beta, \gamma\}$ → observables”.
This paper is organized as follows. In Sec. 2, we overview the conventional procedure to place constraints on FIMPs and describe the simplified procedure with the $\{\alpha, \beta, \gamma\}$ parametrization. In Sec. 3, we introduce a simplified FIMP model. Our FIMP model shares many common aspects with a broad class of FIMP models. The basic production process is 2-body decay. We take into account late-time entropy production after freeze-in (case A) and also freeze-in production through 2-to-2 scattering (case B). In Sec. 4, we introduce the neural network technique and work out the simplified procedure. We compare the constraints from the simplified procedure and those from the conventional procedure. Sec. 5 is devoted to the summary. In Appendix A, we compare our constraints to those obtained through an analytic map from the conventional thermal WDM mass. In Appendix B, we examine the precision of the neural networks in detail. In Appendix C, we explain how to use the neural networks we provide.
2 Procedure for FIMP DM as an example
As described in the Introduction, to study galactic-scale structure formation in beyond-WIMP scenarios, one generically has to take a 2-step procedure on a model-by-model basis:
- Model → Linear matter power spectrum → Observables
(corresponding to the blue flow in Fig. 1). In the case of FIMP, the first step of “Model → Linear matter power spectrum” actually consists of two steps:
- Model → DM phase space distribution → Linear matter power spectrum.
To follow the two steps, one first needs to construct the collision term of the Boltzmann equation and integrate it to obtain the phase space distribution of the DM species. Then one has to follow evolution of the primordial density contrast with the obtained phase space distribution, possibly by using public cosmological Boltzmann solvers such as CLASS [110, 111]. In the following we overview this conventional procedure more specifically.
2.1 Model → DM phase space distribution → Linear matter power spectrum
We define the DM phase space distribution as a function of the cosmic time and the physical momentum , such that the DM number density is given by , where is the spin degrees of freedom. We assume that the DM phase space distribution is much smaller than unity. We then obtain the phase space distribution at a late cosmic time by integrating the collision term as
[TABLE]
where is the reheating time and is the cosmic scale factor. Given a squared matrix element of a specific production process, one obtains a semi-analytic expression of the corresponding collision term (see Ref. [99] for expressions).
FIMP production is most efficient when the heaviest particle in the process becomes non-relativistic (freeze-in mechanism). After that, FIMP particles free-stream and the phase space distribution is invariant as a function of the comoving momentum , where is the effective DM temperature (see Sec. 3.1 for a specific expression of ). Thus we use to characterize the distribution. Practically, we fit the obtained phase space distribution of DM by
[TABLE]
where are fitting parameters and runs for different production processes. [Footnote 5: One may wonder if we can work out “Model → DM phase space distribution” and “DM phase space distribution → Linear matter power spectrum” separately by using . On one side, one can report constraints on . On the other side, one can calculate as a function of model parameters. It is worth investigating this possibility elsewhere.]
We plug the fitting function into the public cosmological Boltzmann solver CLASS [110, 111] to obtain the linear matter power spectrum $P(k)$ as a function of the wavenumber $k$. We use the cosmological parameters from “Planck 2015 TT, TE, EE+lowP” in Ref. [112]. Practically, we use the CLASS fluid approximation of non-cold DM.
2.2 Linear matter power spectrum → Observables
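Galactic-scale structure places constraints on the linear matter power spectrum $P(k)$, or, equivalently, on the transfer function $T(k)$ that is defined by
$$T^2(k) \equiv \frac{P(k)}{P_{\rm CDM}(k)},$$
where $P_{\rm CDM}(k)$ is the linear matter power spectrum in the CDM model.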
It generically requires a suite of time-consuming simulations to obtain constraints on FIMP DM. We may simplify this step by using semi-analytic models and/or by somehow converting constraints on the conventional thermal WDM mass $m_{\rm WDM}$.
In the conventional thermal WDM model, WDM particles follow the Fermi-Dirac distribution with two spin degrees of freedom with temperature . The relic abundance is expressed by and as
[TABLE]
For a given WDM mass, the temperature is determined such that the relic abundance reproduces the observed DM density. Note that for a keV-scale mass, somewhat large entropy production after decoupling is required for . On the other hand, FIMP DM has a different thermal history, and thus a different temperature, and does not follow the Fermi-Dirac distribution. Thus reported lower bounds on $m_{\rm WDM}$ are not directly applicable to FIMP DM.
In this paper we consider the number of satellite galaxies [113, 114, 115, 116, 117, 118] and the Lyman-$\alpha$ forest [119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132] as observables. [Footnote 6: Other probes in use include the delay of the reionization [133, 134, 135, 136, 137, 138, 139], the counts of high-$z$ gamma-ray bursts [140, 141], the faint end of the luminosity function of high-$z$ galaxies [142, 135, 136, 143, 144, 145, 139, 146], the flux anomaly of quadruple lens systems [147, 148, 149, 150, 151, 152, 153, 154], and the redshifted 21 cm signal [155, 156, 157, 158, 159, 160, 161, 162]. The counts of lensed distant supernovae [163] and direct collapse black holes [164] are suggested for future use. We also refer readers to Refs. [165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179] for hydrodynamical simulation results differentiating WDM and CDM in galaxy formation.]
Our analysis, which follows Refs. [180, 88], uses a semi-analytic model for the number of satellite galaxies and converts the reported lower bound on $m_{\rm WDM}$ for the Lyman-$\alpha$ forest.
Number of satellite galaxies
One compares the predicted number of satellite galaxies in simulated Milky Way-size or M31-size haloes with the observed one. If the predicted number is smaller than the observed one, such FIMPs are excluded. This constraint may be conservative when one counts all the subhalos above a certain mass, since some of the subhalos may not host galaxies bright enough to be detected.
We evaluate the number of satellite galaxies from the linear matter power spectrum in our FIMP model as follows. Ref. [118] develops a semi-analytic formula of the subhalo mass function in the conventional thermal WDM model. The formula uses the conditional mass function [181] based on the extended Press-Schechter approach [182] and the halo model (see Ref. [183] for a review). The formula adopts the top-hat filter function in Fourier space (sharp-$k$ filter) to reproduce results of $N$-body simulations in the conventional thermal WDM model:
[TABLE]
where quantities with and without the subscript “0” denote those of the host halo and subhalo, respectively. For example, () is the subhalo (host halo) mass. The variance is given by the linear matter power spectrum as
[TABLE]
The filter scale is related with the mass as
[TABLE]
with the matter mass density at present . Following Ref. [118], we adopt and . We use as the Milky-Way mass, where is the dimensionless Hubble constant. With these values, the number of satellite galaxies above is , which is consistent with the result of the Aquarius simulation [184]. roughly corresponds to the lower bound on the maximal circular velocity of km/s.
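The sharp-$k$ ingredients described above can be sketched numerically. The snippet is a minimal illustration rather than the formula of Ref. [118]: it assumes the standard sharp-$k$ variance $S(R) = \frac{1}{2\pi^2}\int_0^{1/R} dk\, k^2 P(k)$ and the filter-mass relation $M = \frac{4\pi}{3}\bar\rho\,(cR)^3$. The function names, the grid, and the flat toy spectrum are hypothetical, and the constant $c$ is left as an input rather than fixed.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def sharp_k_variance(k, pk, radius):
    """S(R) = 1/(2 pi^2) * int_0^{1/R} k^2 P(k) dk (assumed standard form).

    The integral starts at k[0], so the grid should reach down to small k.
    """
    inside = k <= 1.0 / radius  # sharp cut in Fourier space
    return trapezoid(k[inside] ** 2 * pk[inside], k[inside]) / (2.0 * np.pi ** 2)

def filter_radius(mass, rho_mean, c):
    """Invert the assumed relation M = (4 pi / 3) * rho_mean * (c R)^3 for R."""
    return (3.0 * mass / (4.0 * np.pi * rho_mean)) ** (1.0 / 3.0) / c

# Toy check with a flat spectrum P(k) = 1, for which S(R) = 1 / (6 pi^2 R^3).
k_grid = np.linspace(0.0, 10.0, 100001)
s_toy = sharp_k_variance(k_grid, np.ones_like(k_grid), radius=0.5)
```

In practice `pk` would be the tabulated output of a Boltzmann solver, and the variance would feed into the conditional mass function, which is not sketched here.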
We estimate the observed number of satellites above as (11 classical dwarf galaxies and ultra-faint dwarf galaxies). [Footnote 7: Classical dwarfs: Sagittarius, LMC, SMC, Ursa Minor, Sculptor, Draco, Sextans, Carina, Fornax, Leo II, and Leo I. Ultra-faint dwarfs: Segue I, Ursa Major II, Segue II, Willman I, Coma Berenices, Bootes II, Bootes I, Pisces I, Ursa Major I, Hercules, Canes Venatici II, Leo IV, Leo V, Pisces II, Canes Venatici I. We refer readers to Refs. [185, 186] for dynamical properties. Note that , where is the circular velocity at the half light radius and is the line-of-sight velocity dispersion [186].]
We multiply the number of ultra-faint satellites found in SDSS by 3.5 to take account of the limited sky coverage of SDSS, as in Refs. [114, 60, 115, 116, 118, 180, 88]. places a lower bound on the conventional thermal WDM mass as keV. [Footnote 8: We remark that we do not use the fitting function given by Eq. (2.12), but directly compute the linear matter power spectrum by using CLASS [110, 111].]
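The sky-coverage correction is simple arithmetic, sketched below. It assumes that every ultra-faint dwarf named in the footnote counts as an SDSS detection and therefore receives the factor 3.5; under that assumption the estimate is 11 + 3.5 × 15 = 63.5. The variable names are hypothetical.

```python
# Ultra-faint dwarfs listed in the footnote (assumed here to all be SDSS detections).
ULTRA_FAINT = [
    "Segue I", "Ursa Major II", "Segue II", "Willman I", "Coma Berenices",
    "Bootes II", "Bootes I", "Pisces I", "Ursa Major I", "Hercules",
    "Canes Venatici II", "Leo IV", "Leo V", "Pisces II", "Canes Venatici I",
]
N_CLASSICAL = 11      # classical dwarfs, taken as a full-sky count
SKY_CORRECTION = 3.5  # factor quoted in the text for the limited SDSS sky coverage

n_observed = N_CLASSICAL + SKY_CORRECTION * len(ULTRA_FAINT)
print(n_observed)
```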
As we see, implicitly depends on the lower bound on the satellite mass. For example, once a number of smaller-size satellite galaxies are discovered in the future, one has to repeat the above procedure by adjusting the lower bound on the satellite mass and scan model parameters on a model-by-model basis again. This drives us to use the $\{\alpha, \beta, \gamma\}$ parametrization. Once the observational constraint on $\{\alpha, \beta, \gamma\}$ is updated, one can easily update the constraint on model parameters by using a constructed map between model parameters and $\{\alpha, \beta, \gamma\}$.
Lyman-$\alpha$ forest
Another observable is the Lyman-$\alpha$ forest in high-resolution quasar spectra. The flux power spectrum is a powerful probe of the underlying galactic-scale structure, although the thermal history of the intergalactic medium has uncertainties. The most stringent constraint seems to exclude the WDM solution to small-scale issues [187].
The procedure for the Lyman-$\alpha$ forest constraint is an example of mapping the reported lower bound on the conventional thermal WDM mass $m_{\rm WDM}$ onto a given model. We evaluate the impact of the WDM model on the Lyman-$\alpha$ forest data as follows. This approach follows Ref. [88], which extends the approach of Ref. [180]. First, given a 3-dimensional linear matter power spectrum $P(k)$, we calculate the 1-dimensional power spectrum as
$$P_{\rm 1D}(k) = \frac{1}{2\pi} \int_k^{\infty} dk'\, k'\, P(k').$$
Second, we normalize the 1D power spectrum by that in the CDM model:
$$r(k) = \frac{P_{\rm 1D}(k)}{P_{\rm 1D}^{\rm CDM}(k)}.$$
Third, we integrate $r(k)$ over the typical range of $k$ that a given Lyman-$\alpha$ forest spectrum probes:
$$A = \int_{k_{\rm min}}^{k_{\rm max}} dk\, r(k).$$
The dimensionless deviation of $A$ from its CDM value $A_{\rm CDM}$ represents the net suppression in the Lyman-$\alpha$ forest spectrum:
$$\delta A = \frac{A_{\rm CDM} - A}{A_{\rm CDM}}.$$
Finally, we compare $\delta A$ between our FIMP model and the conventional thermal WDM model with the reported lower bound on $m_{\rm WDM}$. Note that one should use the typical range of $k$ for $\delta A$ and the lower bound on $m_{\rm WDM}$ consistently from the same dataset or analysis. If $\delta A$ of our FIMP model exceeds that of the thermal WDM model at the lower bound, we regard our FIMP model as excluded.
Ref. [88] suggests and for the MIKE/HIRES+XQ-100 combined dataset used in Ref. [126]. The dataset places the lower bound of keV in the conventional thermal WDM model. We find that for keV, so we use as an upper bound of $\delta A$ of a given model. [Footnote 9: We again remark that we do not use the fitting function given by Eq. (2.12), but directly compute the linear matter power spectrum by using CLASS [110, 111]. This may be partially why our is different from in Ref. [88].] As we see, $\delta A$ needs the data-dependent inputs $k_{\rm min}$ and $k_{\rm max}$, and thus one has to repeat the procedure for a different dataset. A more extendable procedure is desirable. Our proposal is the $\{\alpha, \beta, \gamma\}$ parametrization. For a given new dataset, while one has to update constraints in terms of $\{\alpha, \beta, \gamma\}$, one can use the constructed map between model parameters and $\{\alpha, \beta, \gamma\}$ as it is.
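The four steps of the Lyman-$\alpha$ estimate can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it takes $P_{\rm 1D}(k) = \frac{1}{2\pi}\int_k^\infty dk'\,k' P(k')$, $r = P_{\rm 1D}/P_{\rm 1D}^{\rm CDM}$, $A = \int_{k_{\rm min}}^{k_{\rm max}} r\,dk$, and $\delta A = 1 - A/A_{\rm CDM}$. The toy power-law spectrum, the cutoff shape, and the window `k_min=0.5`, `k_max=20.0` are hypothetical placeholders.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def p1d(k, p3d):
    """P_1D(k) = (1 / 2 pi) * int_k^{k[-1]} dk' k' P(k'), truncated at the grid edge."""
    integrand = k * p3d
    segments = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(k)
    # tail[i] is the integral from k[i] up to the last grid point
    tail = np.concatenate([np.cumsum(segments[::-1])[::-1], [0.0]])
    return tail / (2.0 * np.pi)

def delta_a(k, p3d, p3d_cdm, k_min, k_max):
    """Fractional deficit of A = int r(k) dk relative to CDM, where r = 1."""
    if k_max >= k[-1]:
        raise ValueError("grid must extend beyond k_max")
    window = (k >= k_min) & (k <= k_max)
    r = p1d(k, p3d)[window] / p1d(k, p3d_cdm)[window]
    area = trapezoid(r, k[window])
    area_cdm = k[window][-1] - k[window][0]  # integral of r = 1 on the same grid
    return 1.0 - area / area_cdm

# Toy spectra (hypothetical): CDM as a pure power law, suppressed by a smooth cutoff.
k_grid = np.logspace(-2, 3, 4000)
p_cdm = k_grid ** -3.0
p_wdm = p_cdm * (1.0 + (0.05 * k_grid) ** 2.2) ** (-9.0)
suppression = delta_a(k_grid, p_wdm, p_cdm, k_min=0.5, k_max=20.0)
```

A model would then be kept if `suppression` stays below the reference value obtained for thermal WDM at the reported mass bound with the same window.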
2.3 $\{\alpha, \beta, \gamma\}$ parametrization of the transfer function
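As we described above, the thermal WDM model has been conventionally used to report observational constraints on the transfer function $T(k)$. The single-parameter fitting function of $T(k)$ in the thermal WDM model is given by [57, 188, 119]
$$T(k) = \left[1 + (\alpha k)^{2\nu}\right]^{-5/\nu}.$$
Here $\nu = 1.12$ and thus only $\alpha$ is a parameter, which is related to the thermal WDM mass:
$$\alpha = 0.049 \left(\frac{m_{\rm WDM}}{\rm keV}\right)^{-1.11} \left(\frac{\Omega_{\rm WDM}}{0.25}\right)^{0.11} \left(\frac{h}{0.7}\right)^{1.22} h^{-1}\,{\rm Mpc}$$
from Ref. [119]. [Footnote 10: We refer readers to Ref. [102] for a fitting function of $T(k)$ in the resonantly produced sterile neutrino DM.]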
However, the single-parameter ($\alpha$) characterization does not cover a wide range of beyond-WIMP models. Ref. [88] proposes characterizing the transfer function as
$$T(k) = \left[1 + (\alpha k)^{\beta}\right]^{\gamma}.$$
This parametrization allows us to divide the procedure to place constraints on FIMPs into two, with $\{\alpha, \beta, \gamma\}$ being a “common language”. On one side, one calculates $\{\alpha, \beta, \gamma\}$ as a function of model parameters in a given model (corresponding to the left red flow in Fig. 1). On the other side, one reports a likelihood function from observations as a function of $\{\alpha, \beta, \gamma\}$ (corresponding to the right red flow in Fig. 1). By combining these two, one can obtain the constraints on model parameters more easily. This procedure is also very extendable. Once new observational data become available, all one has to do is update the latter, namely, the constraints on $\{\alpha, \beta, \gamma\}$. One does not need to repeat the former. One can use a constructed map between model parameters and $\{\alpha, \beta, \gamma\}$ as it is.
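The two fitting forms of this subsection can be compared numerically. The sketch below assumes the three-parameter form $T(k) = [1 + (\alpha k)^\beta]^\gamma$ and treats thermal WDM as the special case $\beta = 2\nu$, $\gamma = -5/\nu$ with $\nu = 1.12$. The coefficients in `alpha_thermal` are the commonly quoted thermal WDM fit and should be read as an assumption here; all function names are hypothetical.

```python
import numpy as np

def transfer_abg(k, alpha, beta, gamma):
    """Three-parameter transfer function T(k) = [1 + (alpha k)^beta]^gamma."""
    return (1.0 + (alpha * np.asarray(k, dtype=float)) ** beta) ** gamma

def alpha_thermal(m_wdm_kev, omega_wdm=0.25, h=0.7):
    """Breaking scale in h^-1 Mpc for thermal WDM (assumed fit coefficients)."""
    return 0.049 * m_wdm_kev ** -1.11 * (omega_wdm / 0.25) ** 0.11 * (h / 0.7) ** 1.22

def transfer_thermal(k, m_wdm_kev, nu=1.12):
    """Thermal WDM as the special case beta = 2 nu, gamma = -5 / nu."""
    return transfer_abg(k, alpha_thermal(m_wdm_kev), 2.0 * nu, -5.0 / nu)

k_grid = np.logspace(-1, 2, 200)         # wavenumbers in h / Mpc
t_light = transfer_thermal(k_grid, 2.0)  # 2 keV: stronger suppression
t_heavy = transfer_thermal(k_grid, 5.0)  # 5 keV: closer to CDM
```

Because the thermal case fixes $\beta$ and $\gamma$, it traces a one-dimensional curve in the $\{\alpha, \beta, \gamma\}$ space, which is why a bound on the thermal mass alone cannot be transferred directly to models with a different suppression shape.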
A remaining challenge is how to share results from the two sides. It is not apparent how to share 3-parameter results efficiently. In this paper, we propose using neural network technique. In the context of the paper, advantages of using a neural network are:
- It expresses nonlinear relations quite efficiently.
- It learns nonlinearity without being explicitly taught.
- It provides us with a unified format in presenting results.
We indeed see these advantages in Sec. 4.
3 Simplified FIMP model
In this work, we consider a simple setup. The model contains a seemingly renormalizable interaction of Majorana DM with a heavy Dirac fermion and a heavy scalar :
[TABLE]
with the Yukawa coupling . We assume the mass hierarchy of . [Footnote 11: The result will change only slightly for and for different quantum statistics of particles [99].]
This simplified model virtually corresponds to a light axino FIMP model considered in Refs. [189, 99]. The axino FIMP model is based on a supersymmetric version of the Dine-Fischler-Srednicki-Zhitnitsky axion model [190, 191]. The axino is the fermionic supersymmetric partner of the axion, which dynamically explains why the strong interaction preserves CP very precisely [192, 193, 194, 195]. One can identify , , and as light axino, Higgsino (supersymmetric partner of Higgs), and Higgs in the axino FIMP model.
3.1 Freeze-in production
We assume that is equilibrated in thermal plasma. Freeze-in production of DM proceeds mainly through 2-body decay of . The production process ceases (decouples) when the plasma temperature becomes comparable to the mother particle mass; i.e., the decoupling temperature is .
It is convenient to define a DM “temperature” as
[TABLE]
with the effective number of massless degrees of freedom and the decoupling temperature . This temperature scales as with the cosmic scale factor and thus the dimensionless momentum is conserved after the decoupling. In the following, we take (all the SM particles) as a baseline value.
Case A: Decay with entropy production
Meanwhile, we incorporate a different value of or entropy production after the decoupling, by introducing as
[TABLE]
takes account of entropy production after the decoupling, or larger degrees of freedom at the decoupling (e.g., minimal supersymmetric standard model, where ). is applied to the case of late decoupling, i.e., .
We take into account only relevant model parameters to “warmness” of FIMP DM. Note that warmness of FIMP DM depends on the phase space distribution (equivalently, and ) and the FIMP mass . The phase space distribution does not depend on an absolute scale of and , but is sensitive to the ratio since the ratio determines the kinematic phase space of decay product, i.e., in this case. If the two masses are degenerate, the energy of in is suppressed and thus the resultant ’s are colder [98, 189, 99].
In this class of models, therefore, the relevant parameters are
[TABLE]
Hereafter we use the notation of , , and for the sake of notational simplicity. The Yukawa coupling is fixed by the observed DM abundance . While the colder phase space distribution is realized for a more degenerate mass spectrum, the larger Yukawa coupling or lighter is necessary to obtain the observed DM abundance.
Case B: Decay with scattering
Generally a daughter particle has another interaction with a light Dirac fermion :
[TABLE]
with the Yukawa coupling . One can identify as top quark (again as Higgs) in the axino FIMP model [189, 99]. We assume the mass hierarchy of . We also assume that is equilibrated in thermal plasma. In this case, freeze-in production of occurs through -channel scattering of and -channel scattering of as well as through 2-body decay of . The decoupling temperature is again . determines the scattering contribution to the yield, . Freeze-in production through scattering becomes more important for more degenerate and , since the partial decay width becomes smaller.
In summary, in this case, the relevant parameters are
[TABLE]
Again hereafter we use the notation of , , and for the sake of notational simplicity. is fixed by the observed DM abundance: . In this case, we do not vary but take several values such as , and .
3.2 Constraints
We derive constraints from $N_{\rm sat}$ and from $\delta A$ through the conventional procedure described in Sec. 2 (corresponding to the blue flow in Fig. 1).
First we present constraints from $N_{\rm sat}$ in Fig. 2. The top-left panel is for Case A (Decay with entropy production), while the other panels are for Case B (Decay with scattering). For Case A, bluer regions satisfy the condition for each value of . For Case B, the three panels correspond to (top-right), (bottom-left), and (bottom-right), respectively. As in Case A, bluer regions satisfy for each value of . We also display two lines corresponding to (red-dashed) and (red-dotted), to depict a perturbative unitarity limit.
Next we show constraints from $\delta A$ in Fig. 3. The four panels are for Case A (top-left) and for Case B with (top-right), (bottom-left), and (bottom-right), respectively. For each parameter, bluer regions satisfy the condition . The red lines are the same as in Fig. 2. We see that $\delta A$ gives stronger constraints than $N_{\rm sat}$.
As repeatedly stated, constraints on the transfer function are often provided in terms of the conventional thermal WDM mass . In Appendix A we convert keV corresponding to and keV corresponding to into constraints on our FIMP parameters. We see that the constraints are qualitatively similar but quantitatively slightly different ( in ) from those derived in this section.
4 Neural network approach
As stressed in Sec. 1, one of the main purposes of this paper is to provide ready-to-use maps for “Model parameters → $\{\alpha, \beta, \gamma\}$” and also for “$\{\alpha, \beta, \gamma\}$ → Observables” (see the red flow in Fig. 1). Our proposal is to use a neural network for this purpose. In the following we first explain our neural network setup in Sec. 4.1, and then construct concrete neural networks for “Model parameters → $\{\alpha, \beta, \gamma\}$” and “$\{\alpha, \beta, \gamma\}$ → Observables” in Sec. 4.2 and Sec. 4.3, respectively. Finally, in Sec. 4.4, we combine the two neural networks to reproduce the constraints presented in Sec. 3 and thereby demonstrate the precision of the neural networks.
4.1 Neural network setup
The setup of our neural network is summarized in Fig. 4. We identify the input vector as the three model parameters for each of Case A and B. From one layer to the next, the vector of the previous layer undergoes a linear (affine) transformation and is then passed through a nonlinear function . More concretely, the connections among the layers are given by
[TABLE]
where is the number of hidden layers and ’s and ’s are called weight matrices and biases, respectively. The nonlinear function is understood as acting on each component:
[TABLE]
and we adopt a Rectified Linear Unit (ReLU) [197] for the function:
$$\mathrm{ReLU}(x) = \max(0, x).$$
We train the neural network with supervised learning. As we explain in the next subsections, we collect combinations of the input and the true value (from direct calculations) of the output . Note that, with such a large number of data points, it is much more efficient to recast the obtained data onto the neural network and share the neural network parameters than to provide the data itself. Training of the neural network is performed through the updates of the weight matrices and biases so that the output of the neural network gets closer to the true value . The closeness is measured by the loss function , which we take as
[TABLE]
where denotes the -th component of .
For the number of hidden layers, we use in this paper. Then the relation between the input and output reduces to
[TABLE]
We construct the neural network using the public code TensorFlow [198] and train it for epochs. [Footnote 12: We use the version r1.1.7.] The whole dataset is split into training () and test () subsets, and the former is used to train the neural network, while the latter is used to monitor the training process and avoid possible overfitting. We also apply a 10% dropout [199] to avoid overfitting. We use the Adam optimizer [200] with a learning rate of 0.001.
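The forward pass described in this subsection can be sketched in plain NumPy. The sketch assumes ReLU on every hidden layer, a linear output layer, and a summed squared-error loss; these choices, the layer widths, and the random weights are hypothetical, and the training loop (dropout, Adam updates, train/test split) is not reproduced.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, applied component by component."""
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Feed-forward pass through a fully connected network."""
    a = np.asarray(x, dtype=float)
    for w, b in zip(weights[:-1], biases[:-1]):
        a = relu(w @ a + b)              # hidden layer: affine map, then ReLU
    return weights[-1] @ a + biases[-1]  # output layer kept linear (assumption)

def squared_error(y_pred, y_true):
    """Summed squared difference between prediction and target (assumed loss)."""
    return float(np.sum((np.asarray(y_pred) - np.asarray(y_true)) ** 2))

# Hypothetical architecture: 3 inputs, two hidden layers of 16 units, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 16, 16, 2]
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]
y = forward([0.1, -0.4, 1.2], weights, biases)
```

Sharing a trained network then amounts to sharing the lists `weights` and `biases`, which is far more compact than the underlying table of data points.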
4.2 Model parameters → $\{\alpha, \beta, \gamma\}$
We first construct a neural network connecting model parameters and the transfer function parameters $\{\alpha, \beta, \gamma\}$. Before moving on, however, we remark that parameter degeneracy often appears when we fit $\{\alpha, \beta, \gamma\}$ to the resulting power spectrum in the simplified FIMP model in Sec. 3. Indeed Ref. [88] also notices this parameter degeneracy (see Appendix A of Ref. [88]). Meanwhile, Ref. [89] reports that the combination of is well constrained by observational data (see Fig. 4 of Ref. [89]), while the orthogonal direction is not very sensitive. Therefore, in this paper, we fix this orthogonal direction by the relation
[TABLE]
As a result, the output becomes a two-component vector.
Case A: Decay with entropy production
Let us first take Case A (see Sec. 3.1). We identify the output and input as
[TABLE]
Here and are the means of the input and output data, respectively, while and are the standard deviations. These are constant vectors introduced to normalize the data and make learning more efficient.
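The normalization by means and standard deviations can be sketched as below. The sample array is hypothetical, and whether the raw or log-transformed parameters are normalized is not assumed here; the point is only that the same constants must be stored with the network and applied in both directions.

```python
import numpy as np

def fit_scaler(data):
    """Column-wise mean and standard deviation of a (samples, features) array."""
    data = np.asarray(data, dtype=float)
    return data.mean(axis=0), data.std(axis=0)

def standardize(values, mean, std):
    """Map physical values to zero-mean, unit-variance network inputs or targets."""
    return (np.asarray(values, dtype=float) - mean) / std

def unstandardize(scaled, mean, std):
    """Map network outputs back to physical values."""
    return np.asarray(scaled, dtype=float) * std + mean

# Hypothetical dataset: three samples of three model parameters.
samples = np.array([[1.0, 10.0, 0.2], [2.0, 30.0, 0.4], [3.0, 20.0, 0.9]])
mu, sigma = fit_scaler(samples)
scaled = standardize(samples, mu, sigma)
```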
For the dataset, we sample about data points from , , and . We exclude data points in the gray-shaded regions of Figs. 5 and 6, and thus the resulting neural networks cannot be used for the input parameters in these regions. [Footnote 13: The reason for excluding the gray-shaded regions is as follows. For Case A, the top-right corner of the parameter space corresponds to the CDM limit. Since the transfer function approaches unity in this region, the parameter set is not uniquely determined by fitting even after is imposed. For Case B, the top-left corner corresponds to the large Yukawa coupling limit and thus the perturbative unitarity violation problem arises.]
Case B: Decay with scattering
Next let us take Case B (see Sec. 3.1). We identify the output and input as
[TABLE]
For the dataset, we sample about data points from , , and . We again exclude data points in the gray-shaded regions of Figs. 5 and 6.
4.3 Observables
We next construct a neural network that maps onto the observables, more specifically, and introduced in Sec. 2. We identify the input and output as
[TABLE]
[TABLE]
Note that we do not assume in contrast to the previous subsection, and thus is a three-component vector. This is to accommodate a broader class of models than the models we adopt in this paper. Also note that is a one-component vector, which means that we construct neural networks for “” and for “” separately.
For the dataset, we sample about points from , , and .
4.4 Combined results
Before combining the two neural networks constructed in the previous subsections, we note that the precision of the neural networks is discussed in detail in Appendix B. We provide the resulting neural network parameters through the arXiv website; see Appendix C for further explanation of the data files. We also provide a Mathematica file (freeze-in.nb) for illustration.
Now let us check the precision of the neural network by combining the two neural networks. The results should coincide with the constraints obtained in Sec. 3 as long as the neural networks work well. Figs. 5 and 6 are the constraints from and derived through the combination of the two neural networks and thus should be compared with Figs. 2 and 3, respectively. We see that the neural networks nicely reproduce the original constraints.
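Combining the two networks amounts to composing two maps, with one extra step: the first network returns a two-component output (Sec. 4.2), whereas the second takes a three-component input (Sec. 4.3), so the relation that fixes the orthogonal direction has to be applied in between. The sketch below shows only this structure. All three callables are toy placeholders with no physical meaning, and the names are hypothetical.

```python
def combine(model_to_transfer, restore_third, transfer_to_observable):
    """Compose the two maps, restoring the fixed third component in between."""
    def model_to_observable(model_params):
        two_components = model_to_transfer(model_params)
        three_components = restore_third(two_components)
        return transfer_to_observable(three_components)
    return model_to_observable

# Toy stand-ins for the trained networks and the fixing relation (not physical).
def toy_model_to_transfer(p):
    return (2.0 * p[0], p[1] + 1.0)

def toy_restore_third(t):
    return (t[0], t[1], 0.5 * t[1])  # placeholder for the relation of Sec. 4.2

def toy_transfer_to_observable(t):
    return t[0] + t[1] + t[2]

predict_observable = combine(
    toy_model_to_transfer, toy_restore_third, toy_transfer_to_observable
)
value = predict_observable((1.0, 2.0))
```

Because the two maps only communicate through the transfer function parameters, either one can be retrained or replaced without touching the other.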
We again stress that constructing nonlinear maps for “Model parameters → Linear matter power spectrum” and for “Linear matter power spectrum → Observables” separately is very useful and time-saving: given the common language of , those interested in particle physics models can provide as functions of model parameters, while those who report observational constraints can update the constraints in terms of . The neural network technique provides us with a ready-to-use format for this procedure.
5 Summary
Galactic-scale structure formation of the Universe is of particular interest in DM research. Beyond-WIMP scenarios alter galactic-scale structure formation, while conventional WIMP DM behaves as CDM. Precise measurement of galactic-scale structure in near-future observations may hint at beyond-WIMP scenarios. On the other hand, there is a practical bottleneck. The impact of beyond-WIMP scenarios on galactic-scale structure varies from model to model. In principle, one has to repeat the two-step procedure on a model-by-model basis:
- Model → Linear matter power spectrum → Observables,
which is sketched by the blue flow in Fig. 1. Each step requires different disciplines and dedicated computations. Following this procedure on a model-by-model basis is very time-consuming.
We may improve the situation by characterizing the transfer function (i.e., the linear matter power spectrum) with a small set of parameters. One party (likely a particle physicist) calculates the transfer function parameters as functions of model parameters. Another reports observational constraints in terms of the transfer function parameters. Constraints on the model parameters then follow easily by combining the two results. Although a single-parameter characterization (the thermal WDM mass ) has been conventionally used, a 3-parameter characterization has been proposed to cover a wide range of beyond-WIMP scenarios. Our main point is that the neural network technique facilitates sharing results from one side to the other by providing the results in a ready-to-use format.
We devoted this paper to demonstrating how this works in practice with and a neural network. To be specific, we considered a simplified model of light (keV-scale) FIMP DM. Freeze-in production from -body decay gives the main contribution to the relic abundance. We also took into account entropy production after the decoupling and freeze-in production from scattering. We first constructed a map between the FIMP model parameters and , and then a map between and the observables, i.e., the number of satellite galaxies and the Lyman-α forest, by adopting the neural network technique. We provided the constructed maps in a ready-to-use format through the arXiv website. Meanwhile, we performed the conventional procedure to derive the direct constraints on the FIMP model parameters. The constraints derived through and a neural network are in good agreement with those derived through the conventional procedure.
Although we focused on a simplified model of FIMP DM in this paper, it is worth performing a similar study in other FIMP models such as sterile neutrino DM and superWIMP DM, and also in other alternatives to CDM such as fuzzy DM and late kinetic decoupling of DM. Our suggestion will facilitate comparison between beyond-WIMP models and future updates of constraints on galactic-scale structure formation, e.g., from redshifted 21 cm surveys.
Acknowledgments
The work of KJB, RJ, and AK was supported by IBS under the project code IBS-R018-D1. The work of RJ was supported by Grants-in-Aid for JSPS Overseas Research Fellow (No. 201960698). The work of RJ was supported by the Deutsche Forschungsgemeinschaft under Germany’s Excellence Strategy – EXC 2121 “Quantum Universe” – 390833306. The work of KY was supported by JSPS KAKENHI Grant Number JP18J10202.
Appendix A Comparison with an analytic map
In this appendix we derive constraints on our FIMP model parameters by converting the thermal WDM mass . Proposed ways of converting onto a given model are as follows:
- One compares a characteristic quantity such as the free-streaming length [201, 202] or the Jeans length [203, 204, 205] between a given WDM model and the conventional thermal WDM model. If the free-streaming length in the given model is larger than that in the conventional thermal WDM model with an observational lower bound on , the given model is regarded as disfavored by the same observation. See Ref. [60] for a comparison of the transfer function in different WDM models with the Jeans length fixed.
- One compares the transfer function below some critical wavenumber between a given model and the conventional thermal WDM model. If in the given model is smaller in amplitude than that in the conventional thermal WDM model with an observational lower bound on , the given model is regarded as disfavored by the same observation. A suggested choice of the critical wavenumber is the half mode where [109].
In this appendix, we adopt a “warmness” quantity (equivalently, the Jeans length) calculated from a DM phase space distribution [60]:
[TABLE]
where is the 2nd moment of the DM phase space distribution and thus
[TABLE]
depends on the shape of the phase space distribution. For a given observational lower bound on , a WDM model is regarded as disfavored by the same observation, if . Using the definition of DM temperature given by Eq. (3.2), we obtain the constraint on a FIMP as
[TABLE]
Note that in the conventional thermal WDM model, WDM particles follow the Fermi-Dirac distribution, and thus .
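As a rough numerical illustration of how the second moment depends on the shape of the phase space distribution, the sketch below evaluates it for a Fermi-Dirac distribution. It assumes the normalized definition $\sqrt{\int dq\, q^4 f(q) / \int dq\, q^2 f(q)}$ with $q$ the comoving momentum in units of the DM temperature; this definition and the function names are assumptions made for the example, since the defining equation is not reproduced in the text above.

```python
import math

def second_moment(f, q_max=50.0, n=50000):
    """Return sqrt( int q^4 f dq / int q^2 f dq ) using the trapezoidal rule."""
    h = q_max / n
    numerator = 0.0
    denominator = 0.0
    for i in range(n + 1):
        q = i * h
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        fq = f(q)
        numerator += weight * q ** 4 * fq
        denominator += weight * q ** 2 * fq
    # The common factor h cancels in the ratio.
    return math.sqrt(numerator / denominator)

def fermi_dirac(q):
    """Fermi-Dirac occupation number with zero chemical potential."""
    return 1.0 / (math.exp(q) + 1.0)

sigma_fd = second_moment(fermi_dirac)
```

Under this assumed definition the Fermi-Dirac value is about 3.6, in agreement with the closed form $\sqrt{15\,\zeta(5)/\zeta(3)}$. Passing a different distribution function to `second_moment` shows directly how a non-thermal shape changes the result.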
In our simplified FIMP model, the phase space distribution can be expressed analytically [206], and thus is also analytically derivable. As a result, we can construct an analytic map between and the model parameters. The total is calculated from each production process as
[TABLE]
where each is calculated analytically as
[TABLE]
and each FIMP yield is also obtained as
[TABLE]
Here , prefactors count the number of particle species ( and ), and is a dimensionful constant whose expression is not relevant in this appendix.
In this way, we derive the constraints on our simplified FIMP model from through warmness. First, Fig. 7 shows constraints from the observed number of Milky Way satellites, . keV corresponds to . The left and right panels are for Case A (Decay with entropy production) and Case B (Decay with scattering) with and should be compared with the top-left and top-right panels of Fig. 2, respectively. We see that the results differ quantitatively from each other, while they are qualitatively equivalent.
Next, Fig. 8 shows constraints from the Lyman-α forest data, . keV corresponds to . The left and right panels are for Case A and Case B with and should be compared with the top-left and top-right panels of Fig. 3, respectively. We see that the derived results differ from those obtained by the direct modeling in Sec. 3.2, as for . (Footnote 14: We also derive the constraints through . The derived constraints again differ from those obtained by the direct modeling.)
Appendix B Further check: original data vs neural network
In this appendix, we take a closer look at the difference between the original data and the neural network output.
First we check the validity of the parametrization itself (which is unrelated to the precision of the neural network). Fig. 9 shows the transfer function for Case A with , keV, and (left panel) and Case B with , , keV, and (right panel). The red points are data points, while the blue lines are given by Eq. (2.14) with the fitted values of ( as explained in Sec. 4.2). We see that the parametrization nicely reproduces the original data.
Next we examine the precision of the neural network. Fig. 10 compares the original values of and (upper panels) and the fit from the neural network (lower panels) for Case A with . Similarly, Fig. 11 is for Case B with keV. We see that the neural network not only reproduces the original data quite well, but also somewhat smooths out artificial fluctuations in the original data.
Figs. 12 and 13 are color plots for the relative error between the original data and neural network for (left columns) and (right columns), respectively. Fig. 12 is for Case A with , , and from top to bottom, while Fig. 13 is for Case B with , , and keV for from top to bottom. The relative error for Case A is at most 1 in and , while for Case B the error is at most and in and , respectively.
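For concreteness, a relative error of this kind can be computed point by point and then summarized by its maximum. The sketch below assumes the error is normalized by the original value, which is a common convention but is not spelled out in the text; the numbers are toy values.

```python
def relative_error(original, predicted):
    """Absolute deviation of the prediction, normalized by the original value."""
    return abs(predicted - original) / abs(original)

def max_relative_error(originals, predictions):
    """Largest point-wise relative error over paired sequences."""
    return max(relative_error(o, p) for o, p in zip(originals, predictions))

# Toy values standing in for fitted parameters and network outputs.
original_values = [1.0, 2.0, 4.0]
network_values = [1.01, 1.98, 4.0]
worst = max_relative_error(original_values, network_values)
```

Evaluating `relative_error` on a grid of model parameters gives the quantity that a color plot of this type would display.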
We finally comment that the error of the neural network for “ Observables” is much smaller than that for “Model parameters ”.
Appendix C How to use the neural network data
In this appendix we explain how to use the data provided through the arXiv website. The data files we provide are
- mean.tsv, std.tsv,
- b1.tsv, b2.tsv, bout.tsv,
- w1.tsv, w2.tsv, wout.tsv.
The first items are the means, , and standard deviations, , which shift and normalize the neural network input and output. The second items are the biases, , , and , while the last items are the weight matrices, , , and .
The data files for “Model parameters ” are in the directory freeze-in/CaseA for Case A and in freeze-in/CaseB/Delta=... for Case B. The data files for “ Observables” are in freeze-in/NSat and freeze-in/deltaA for and , respectively.
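A possible way to evaluate a network from these files is sketched below. The file names suggest two hidden layers (w1/b1 and w2/b2) followed by an output layer (wout/bout), but several details here are assumptions rather than statements about the provided data: the activation function (passed in as an argument, with `math.tanh` as a placeholder default), the orientation of the weight matrices (rows indexed by input component), and how mean.tsv and std.tsv are divided between input and output (the caller supplies the four vectors separately). The helper names are hypothetical.

```python
import csv
import math

def load_tsv(path):
    """Read a tab-separated file of numbers into a list of rows of floats."""
    with open(path, newline="") as handle:
        return [[float(v) for v in row] for row in csv.reader(handle, delimiter="\t") if row]

def affine(x, w, b):
    """Compute x.w + b, assuming w has shape (len(x), len(b))."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]

def forward(x, p, activation=math.tanh):
    """Two hidden layers followed by a linear output layer."""
    h1 = [activation(v) for v in affine(x, p["w1"], p["b1"])]
    h2 = [activation(v) for v in affine(h1, p["w2"], p["b2"])]
    return affine(h2, p["wout"], p["bout"])

def predict(x_raw, p, x_mean, x_std, y_mean, y_std, activation=math.tanh):
    """Normalize the input, run the network, and undo the output normalization."""
    x = [(v - m) / s for v, m, s in zip(x_raw, x_mean, x_std)]
    y = forward(x, p, activation)
    return [v * s + m for v, m, s in zip(y, y_mean, y_std)]

# Toy parameters with an identity activation, used only to exercise the code path.
toy = {
    "w1": [[1.0, 0.0], [0.0, 1.0]], "b1": [0.0, 0.0],
    "w2": [[1.0, 0.0], [0.0, 1.0]], "b2": [0.0, 0.0],
    "wout": [[1.0], [1.0]], "bout": [0.0],
}
toy_output = predict([3.0, 5.0], toy, [1.0, 1.0], [2.0, 2.0], [10.0], [2.0],
                     activation=lambda v: v)
```

With real data, the dictionary entries would be filled from `load_tsv` applied to the corresponding files, flattening the bias files to one-dimensional lists if they are stored as a single row or column. Dropout is not applied at evaluation time.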
References
- [1] J. L. Feng, “Dark Matter Candidates from Particle Physics and Methods of Detection,” Ann. Rev. Astron. Astrophys. 48 (2010) 495–545, arXiv:1003.0904 [astro-ph.CO].
- [2] G. Arcadi, M. Dutra, P. Ghosh, M. Lindner, Y. Mambrini, M. Pierre, S. Profumo, and F. S. Queiroz, “The waning of the WIMP? A review of models, searches, and constraints,” Eur. Phys. J. C 78 no. 3, (2018) 203, arXiv:1703.07364 [hep-ph].
- [3] L. Roszkowski, E. M. Sessolo, and S. Trojanowski, “WIMP dark matter candidates and searches—current status and future prospects,” Rept. Prog. Phys. 81 no. 6, (2018) 066201, arXiv:1707.06277 [hep-ph].
- [4] ATLAS Collaboration, M. Aaboud et al., “Search for dark matter and other new phenomena in events with an energetic jet and large missing transverse momentum using the ATLAS detector,” JHEP 01 (2018) 126, arXiv:1711.03301 [hep-ex].
- [5] CMS Collaboration, A. M. Sirunyan et al., “Search for new physics in final states with an energetic jet or a hadronically decaying W or Z boson and transverse momentum imbalance at √s = 13 TeV,” Phys. Rev. D 97 no. 9, (2018) 092005, arXiv:1712.02345 [hep-ex].
- [6] XENON Collaboration, E. Aprile et al., “Dark Matter Search Results from a One Ton-Year Exposure of XENON1T,” Phys. Rev. Lett. 121 no. 11, (2018) 111302, arXiv:1805.12562 [astro-ph.CO].
- [7] PandaX-II Collaboration, X. Cui et al., “Dark Matter Results From 54-Ton-Day Exposure of PandaX-II Experiment,” Phys. Rev. Lett. 119 no. 18, (2017) 181302, arXiv:1708.06917 [astro-ph.CO].
- [8] PICO Collaboration, C. Amole et al., “Dark Matter Search Results from the Complete Exposure of the PICO-60 C3F8 Bubble Chamber,” arXiv:1902.04031 [astro-ph.CO].
