Road map for the tuning of hadronic interaction models with accelerator-based and astroparticle data

Johannes Albrecht; Julia Becker Tjus; Noah Behling; Ji\v{r}\'i Bla\v{z}ek; Marcus Bleicher; Julian Boelhauve; Lorenzo Cazon; Ruben Concei\c{c}\~ao; Hans Dembinski; Luca Dietrich; Jan Ebr; Jan Ellbracht; Ralph Engel; Anatoli Fedynitch; Max Fieg; Maria Garzelli; Chlo\'e Gaudu; Giacomo Graziani; Pascal Gutjahr; Andreas Haungs; Tim Huege; Karolin Hymon; Mirco H\"unnefeld; Karl-Heinz Kampert; Leonora Kardum; Lars Kolk; Natalia Korneeva; Kevin Kr\"oninger; Antonin Maire; Hiroaki Menjo; Leonel Morejon; Sergey Ostapchenko; Petja Paakkinen; Tanguy Pierog; Pavlo Plotko; Anton Prosekin; Lilly Pyras; Thomas P\"oschl; Julian Rautenberg; Maximilian Reininghaus; Wolfgang Rhode; Felix Riehn; Markus Roth; Alexander Sandrock; Ina Sarcevic; Michael Schmelling; G\"unter Sigl; Torbjorn Sj\"ostrand; Dennis Soldin; Michael Unger; Marius Utheim; Jakub V\'icha; Klaus Werner; Michael Windau; Valery Zhukov

arXiv:2508.21796·astro-ph.HE·December 18, 2025

Road map for the tuning of hadronic interaction models with accelerator-based and astroparticle data

Johannes Albrecht, Julia Becker Tjus, Noah Behling, Ji\v{r}\'i Bla\v{z}ek, Marcus Bleicher, Julian Boelhauve, Lorenzo Cazon, Ruben Concei\c{c}\~ao, Hans Dembinski, Luca Dietrich, Jan Ebr, Jan Ellbracht, Ralph Engel, Anatoli Fedynitch, Max Fieg, Maria Garzelli, Chlo\'e Gaudu

PDF

Open Access

TL;DR

This paper presents a comprehensive roadmap for tuning hadronic interaction models by integrating data from both accelerator-based experiments and astroparticle observations, enhancing the accuracy of event generators in high-energy physics.

Contribution

It introduces a novel unified tuning approach for hadronic interaction models using combined accelerator and astroparticle data sets.

Findings

01

Unified tuning framework for event generators.

02

Improved modeling of hadronic collisions at high energies.

03

Enhanced agreement with experimental data across datasets.

Abstract

In high-energy and astroparticle physics, event generators play an essential role, even in the simplest data analyses. As analysis techniques become more sophisticated, e.g. based on deep neural networks, their correct description of the observed event characteristics becomes even more important. Physical processes occurring in hadronic collisions are simulated within a Monte Carlo framework. A major challenge is the modeling of hadron dynamics at low momentum transfer, which includes the initial and final phases of every hadronic collision. QCD-inspired phenomenological models used for these phases cannot guarantee completeness or correctness over the full phase space. These models usually include parameters which must be tuned to suitable experimental data. Until now, event generators have been developed and tuned mainly on the basis of data from high-energy physics experiments at…

Tables5

Table 1. Table 1: Key characteristics of HEP accelerator-based experiments and astroparticle experiments. We use A A as a placeholder for a nucleus, s \sqrt{s} is the energy in the nucleon-nucleon center-of-mass system.

	Accelerator		Astroparticle
	Fixed-target	Collider	Cosmic rays	Neutrinos
Collision energy ( $\sqrt{s}$ )	up to 100 GeV	100 GeV to 14 TeV	up to 500 TeV	up to 10 TeV
Collision systems	e+e, e+p, p+p, p+ $A$ , $A$ + $A$		( $π$ ,K,p, $A$ ,e, $γ, μ, ν$ )+ $A$
Initial state	fixed		variable (energy and system)
Final state: rapidity	full	mid to forward	very forward
Final state: flavour	light	light and heavy	light	light and heavy
Resolution	single interaction		cascade	single interaction, cascade

Table 2. Table 2: Comparison of five event generators. Acronyms are defined in the text.

	EPOS4 [10]	EPOS LHC-R [11]	QGSJet III [12]	Sibyll 2.3d [13]	Pythia 8 [14, 15]
Primary domains	HIC, HEP	EAS, HIC	EAS	EAS	HEP
Theoretical basis	parton-based GRT, pQCD, energy sharing, saturation	parton-based GRT, pQCD, energy sharing	GRT, pQCD (DGLAP+HT)	GRT, pQCD (minijet)	MPI, pQCD, ISR, FSR
Nuclear collisions	idem	idem	idem	extended superposition	Glauber via Angantyr
Pomeron	semi-hard, dynamical saturation	semi-hard	semi-hard	soft+hard	soft+hard
Parton distributions	generated	custom (GRV for valence)	Pomeron PDFs + DGLAP + HT	GRV	various
Diffractive dissociation (low mass)	diffractive Pomeron	diffractive Pomeron	Good-Walker (3-channel eikonal)	Good-Walker (2-channel eikonal)	longitudinal strings
Diffractive dissociation (high mass)	Pomeron exchange	Pomeron exchange	cut-enhanced graphs	Pomeron exchange	MPI
String fragmentation	area law	area law	early Lund type	Lund	Lund
Forward-central correlation	strong	strong	strong	weak	strong
Charm production	pQCD	parameterised + intrinsic	—	parameterised + intrinsic	pQCD
Collective effects	core-corona, hydrodynamical flow, hadronic rescattering	core-corona, parameterised flow, hadronic rescattering	—	—	colour reconnection, rope fragm., string shoving, hadronic rescattering
${lim}_{s \to \infty} (σ_{π p} / σ_{p p})$	1	1	1	1	2/3
Programming language	C/C++, Fortran90, Fortran	Fortran	Fortran	Fortran	C++

Table 3. Table 3: Comparison of transport codes in the context of tuning. The CPU time estimates are order-of-magnitude and valid only for a fast hadronic interaction model like Sibyll , and without simulation of Cherenkov or radio emission.

	Monte-Carlo simulation	Cascade equation solver	3D Hybrid simulation
Examples	Corsika, CRPropa	Conex, MCEq	Corsika+Conex, Seneca
Observables	all	limited (see text)	all
Shower-to-shower fluctuations?	yes	no	yes
Cost of calculating observables for 10 EeV air shower	10k CPU hours for 1000 air showers with thinning	(1-100)k CPU hours for all energies (pre-computing tables)	2k CPU hours for 1000 air showers
Cost of calculating observables for 10 PeV air shower	1000 CPU hours for 1000 air showers	(1-100)k CPU hours for all energies (pre-computing tables)	200 CPU hours for 1000 air showers
Energy dependence of computational cost	$\propto E_{cosmic ray}$	$\approx$ constant	$\propto E_{cosmic ray}$

Table 4. Table 4: Observables in astroparticle experiments (described in the text) and their sensitivity to QCD features and properties of the cosmic rays [ 43 ] . Legend: σ inel \sigma_{\text{inel}} inelastic ( p , π ) (p,\pi) -air cross section, N mult N_{\text{mult}} average number of secondary hadrons, E max E \frac{E_{\text{max}}}{E} inelasticity (energy fraction carried by most energetic product), E π 0 E \frac{E_{\pi^{0}}}{E} energy fraction carried by neutral pions, σ D , B \sigma_{D,B} production cross section for D and B mesons, C 1 C_{1} mean free path to first collision, E E primary energy, ⟨ ln ⁡ A ⟩ \langle\ln\!A\rangle mean logarithmic mass, σ ( ln ⁡ A ) \sigma(\ln\!A) standard deviation of logarithmic mass.

Observable	Sensitivity to QCD feature					Sensitivity to cosmic ray feature
	$σ_{inel}$	$N_{mult}$	$\frac{E_{max}}{E}$	$\frac{E_{π^{0}}}{E}$	$σ_{D, B}$	$C_{1}$	$E$	$⟨ \ln A ⟩$	$σ (\ln A)$
$\int (d E / d x) d x$							✓
$⟨ X_{max} ⟩$	✓	✓	✓			✓	✓	✓
$σ (X_{max})$	✓		✓			✓	✓	✓	✓
$⟨ X_{μ, max} ⟩$	✓	✓	✓			✓	✓	✓
$σ (X_{μ, max})$	✓		✓			✓	✓	✓	✓
$⟨ N_{μ} ⟩$		✓		✓			✓	✓
$σ (N_{μ}) / N_{μ}$			✓	✓		✓	✓	✓	✓
$Φ_{(ν, μ), conv}$	✓	✓	✓	✓		✓	✓	✓
$Φ_{(ν, μ), prompt}$			✓		✓	✓	✓	✓

Table 5. Table 5: Rivet plugins and HEPData entries used in the forward tuning of Pythia, along with the ongoing Rivet -ization efforts toward a tune for air showers, as discussed in Subsection 5.2 . The third column specifies whether the analysis has an entry in the HEPData database, while the fourth column indicates whether the plugin is included in the public Rivet repository. The fifth and sixth columns define the collision system, where s \sqrt{s} represents the center-of-mass energy for proton-proton collisions, and s NN \sqrt{s_{\text{NN}}} denotes the energy in the nucleon-nucleon frame; the second half of the table focuses on fixed-target experiments, organized by beam momentum. The final column lists the extracted final states used for tuning.

Rivet plugin	Ref.	HEPData	Published	$\sqrt{s}$ , $\sqrt{s_{NN}}$ (TeV)	Collision system	Final state / Observable
LHCF_2015_I1351909	[80]	$✓$	$✓$	7	pp	neutral $d σ / d x_{F}$
LHCF_2016_I1385877	[81]	$✓$	$✓$	2.76, 7, 5.02	pp, pPb	neutral $d σ / d x_{F}$
LHCF_2018_I1692008	[86]	$✓$	$✓$	13	pp	neutral $d σ / d x_{F}$
LHCF_2018_I1518782	[79]	$✓$	$✓$	13	pp	neutral $d σ / d x_{F}$
LHCF_2023_I2658888	[87]	$✓$	$✓$	13	pp	neutral $d σ / d x_{F}$
LHCB_2013_I1251899	[88]	$✓$	$✓$	5	pPb	$d σ / d p_{T} d y$
LHCB_2016_I1504058	[89]	$✓$	$✓$	7, 13	pp	$d σ / d η$
LHCB_2019_I1720413	[90]	$✓$	$✓$	8.16	pPb	$d σ / d p_{T} d y$
LHCB_2021_I1889335	[91]	$✓$	$✓$	13	pp	charged $d σ / d p_{T} d η$
LHCB_2021_I1913240	[92]	$✓$	$✓$	5	pp, pPb	charged $d σ / d p_{T} d η$
LHCB_2022_I2694685	[93]	$✓$	$✓$	8.16	pPb	$d σ / d p_{T} d y$
				$p_{beam}$ (GeV/c)
E104_1976_I98502	[102]	$✓$		[23 - 280]	$π^{\pm}$ p, K^±p, pp, p $\bar{p}$	$σ^{tot}$
E104_1979_I132765	[103]	$✓$		[200 - 370]	$π^{\pm}$ p, K^±p, pp, p $\bar{p}$	$σ^{tot}$
E104_1979_I132133	[104]	$✓$		[60 - 280]	$π^{\pm}$ p, K^±p, pp, p $\bar{p}$	$σ^{absorption}$
NA8_1983_I182455	[105]	$✓$		[30 - 345]	$π^{-}$ p, pp	elastic $d σ / d t$
NA22_1986_I18431	[240]	$✓$	$✓$	250	$π^{+}$ p, K⁺p, pp	$P_{n}$
NA22_1987_I246909	[106]	$✓$		250	$π^{+}$ p, K⁺p	elastic $d σ / d t$
NA22_1988_I265504	[107]	$✓$	$✓$	250	$π^{+}$ p, K⁺p, pp	$d σ / d x_{F} d p_{T} d y$
NA22_1990_I301243	[94]	$✓$		250	$π^{+}$ p	$d σ / d x_{F}, d σ / d p_{T}^{2}$
NA22_1992_I322980	[95]	$✓$		250	$π^{+}$ p, K⁺p	$d σ / d x_{F}, d σ / d p_{T}^{2}$
NA49_2006_I694016	[96]	$✓$	$✓$	158	pp	$E d σ / d x_{F} d p_{T}^{2}$
NA49_2009_I818217	[97]		$✓$	158	pp	$E d σ / d x_{F} d p_{T}^{2}$
NA61_2017_I1598505	[98]	$✓$		[20 - 158]	pp	$d^{2} n / d y d p_{T}$
NA61_2017_I1600971	[99]			158, 350	$π^{-}$ C	$x_{F} d n / d x_{F}$
NA61_2019_I1753094	[100]	$✓$		60, 120	pC, pBe, pAl	$σ^{prod}, σ^{inel}$
NA61_2022_I1868367	[241]			158	pp	$d^{2} n / d y d p_{T}$ , $d n / d y$
NA61_2023_I2155140	[101]			158, 350	$π^{-}$ C	$σ^{prod}$ , $d^{2} n / d p d p_{T}$

Equations2

z = \frac{ln ⟨ N _{μ}^{det} ⟩ ( E ) - ln ⟨ N _{μ, p}^{det} ⟩ ( E )}{ln ⟨ N _{μ, Fe}^{det} ⟩ ( E ) - ln ⟨ N _{μ, p}^{det} ⟩ ( E )}

z = \frac{ln ⟨ N _{μ}^{det} ⟩ ( E ) - ln ⟨ N _{μ, p}^{det} ⟩ ( E )}{ln ⟨ N _{μ, Fe}^{det} ⟩ ( E ) - ln ⟨ N _{μ, p}^{det} ⟩ ( E )}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsParticle physics theoretical and experimental studies · High-Energy Particle Collisions Research · Particle Detector Development and Performance

Full text

Road map for the tuning of hadronic interaction models with accelerator-based and astroparticle data

J. Albrecht

Ruhr Astroparticle and Plasma Physics Center (RAPP Center), Bochum, Germany

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

Lamarr Institute for Machine Learning and Artificial Intelligence, Dortmund, Germany

J. Becker Tjus

Ruhr Astroparticle and Plasma Physics Center (RAPP Center), Bochum, Germany

Theoretical Physics IV: Plasma Astroparticle Physics, Ruhr University Bochum, 44780 Bochum, Germany

Department of Space, Earth and Environment, Chalmers University of Technology, Gothenburg, Sweden

N. Behling

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

J. Blazek

FZU – Institute of Physics of the Czech Academy of Sciences, Prague, Czech Republic

M. Bleicher

Institute for Theoretical Physics, Goethe University Frankfurt, Frankfurt am Main, Germany

J. Boelhauve

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

L. Cazon

Instituto Galego de Física de Altas Enerxías (IGFAE), Universidade de Santiago de Compostela, Santiago de Compostela, Spain

R. Conceição

Physics Department, Instituto Superior Técnico (IST), University of Lisbon, Lisbon, Portugal

Laboratório de Instrumentação e Física Experimental de Partículas (LIP), Lisbon, Portugal

H. Dembinski

Ruhr Astroparticle and Plasma Physics Center (RAPP Center), Bochum, Germany

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

L. Dietrich

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

J. Ebr

FZU – Institute of Physics of the Czech Academy of Sciences, Prague, Czech Republic

J. Ellbracht

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

R. Engel

Institute for Astroparticle Physics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

A. Fedynitch

Institute of Physics, Academia Sinica, Taipei, Taiwan

M. Fieg

Department of Physics and Astronomy, University of California, Irvine, CA, USA

M.V. Garzelli

II Institute for Theoretical Physics, Hamburg University, Hamburg, Germany

C. Gaudu

Faculty of Mathematics and Natural Sciences, University of Wuppertal, D-42119 Wuppertal, Germany

G. Graziani

INFN, Sezione di Firenze, Florence, Italy

P. Gutjahr

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

A. Haungs

Institute for Astroparticle Physics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

T. Huege

Institute for Astroparticle Physics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Astrophysical Institute, Vrije Universiteit Brussel, Brussels, Belgium

K. Hymon

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

M. Hünnefeld

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

K.-H. Kampert

Faculty of Mathematics and Natural Sciences, University of Wuppertal, D-42119 Wuppertal, Germany

Ruhr Astroparticle and Plasma Physics Center (RAPP Center), Bochum, Germany

L. Kardum

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

L. Kolk

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

N. Korneeva

School of Physics and Astronomy, Monash University, Clayton, VIC 3800, Australia

K. Kröninger

Ruhr Astroparticle and Plasma Physics Center (RAPP Center), Bochum, Germany

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

A. Maire

IPHC – Institut Pluridisciplinaire Hubert Curien, CNRS-IN2P3 / Université de Strasbourg, UMR 7178, Strasbourg, France

H. Menjo

Institute for Space-Earth Environmental Research (ISEE), Nagoya University, Nagoya, Japan

L. Morejon

Faculty of Mathematics and Natural Sciences, University of Wuppertal, D-42119 Wuppertal, Germany

S. Ostapchenko

II Institute for Theoretical Physics, Hamburg University, Hamburg, Germany

P. Paakkinen

Department of Physics, University of Jyväskylä, Jyväskylä, Finland

Helsinki Institute of Physics, University of Helsinki, Helsinki, Finland

T. Pierog

Institute for Astroparticle Physics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

P. Plotko

Deutsches Elektronen-Synchrotron (DESY), Platanenallee 6, 15738 Zeuthen, Germany

A. Prosekin

Institute of Physics, Academia Sinica, Taipei, Taiwan

L. Pyras

Deutsches Elektronen-Synchrotron (DESY), Platanenallee 6, 15738 Zeuthen, Germany

Erlangen Center for Astroparticle Physics (ECAP), Friedrich-Alexander-Universität Erlangen-Nürnberg, Nikolaus-Fiebiger-Straße 2, 91058 Erlangen, Germany

Department of Physics and Astronomy, University of Utah, Salt Lake City, UT 84112, USA

T. Pöschl

European Organization for Nuclear Research (CERN), Geneva, Switzerland

J. Rautenberg

Faculty of Mathematics and Natural Sciences, University of Wuppertal, D-42119 Wuppertal, Germany

M. Reininghaus

Institute for Astroparticle Physics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

W. Rhode

Ruhr Astroparticle and Plasma Physics Center (RAPP Center), Bochum, Germany

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

Lamarr Institute for Machine Learning and Artificial Intelligence, Dortmund, Germany

F. Riehn

Ruhr Astroparticle and Plasma Physics Center (RAPP Center), Bochum, Germany

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

M. Roth

Institute for Astroparticle Physics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

A. Sandrock

Faculty of Mathematics and Natural Sciences, University of Wuppertal, D-42119 Wuppertal, Germany

I. Sarcevic

Department of Physics, University of Arizona, Tucson, AZ, USA

M. Schmelling

Max-Planck-Institut für Kernphysik, Heidelberg, Germany

G. Sigl

II Institute for Theoretical Physics, Hamburg University, Hamburg, Germany

T. Sjöstrand

Department of Physics, Lund University, Lund, Sweden

D. Soldin

Department of Physics and Astronomy, University of Utah, Salt Lake City, UT 84112, USA

M. Unger

Institute for Astroparticle Physics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

M. Utheim

Department of Physics, University of Jyväskylä, Jyväskylä, Finland

J. Vícha

FZU – Institute of Physics of the Czech Academy of Sciences, Prague, Czech Republic

K. Werner

SUBATECH – Laboratory of Subatomic Physics and Associated Technologies, University of Nantes, IMT Atlantique, CNRS/IN2P3, Nantes, France

M.E. Windau

Department of Physics, TU Dortmund University, D-44221 Dortmund, Germany

V. Zhukov

Institute for Experimental Physics 1b, RWTH Aachen University, Aachen, Germany

(September 25, 2025)

Abstract

In high-energy and astroparticle physics, event generators play an essential role, even in the simplest data analyses. As analysis techniques become more sophisticated, e.g. based on deep neural networks, their correct description of the observed event characteristics becomes even more important. Physical processes occurring in hadronic collisions are simulated within a Monte Carlo framework. A major challenge is the modeling of hadron dynamics at low momentum transfer, which includes the initial and final phases of every hadronic collision. QCD-inspired phenomenological models used for these phases cannot guarantee completeness or correctness over the full phase space. These models usually include parameters which must be tuned to suitable experimental data. Until now, event generators have been developed and tuned mainly on the basis of data from high-energy physics experiments at accelerators. The wealth of data available from the latest generation of astroparticle experiments has not yet been fully exploited, and in many cases is not satisfactorily described. Both kinds of data sets are complementary as astroparticle experiments provide sensitivity especially to hadrons produced nearly parallel to the collision axis and cover center-of-mass energies up to several hundred TeV, well beyond those reached at colliders so far. In this report, we provide an overview of state-of-the-art event generators and their tuning, including the most relevant inputs from high-energy accelerator and astroparticle experiments. We present a road map that shows, for the first time, how the unified tuning of event generators with accelerator-based and astroparticle data can be performed.

1 Introduction

The simulation of high-energy particle collisions is an essential task in many fields of science, such as high-energy nuclear and particle physics or high-energy astroparticle physics. The simulation involves several steps, namely the event generation, hadronization and particle decay, stable particle propagation, and detector response simulation. The predictions of the event generators are usually based on the Standard Model (SM) of particle physics with new phenomena being tested against the SM predictions. So far, event generators are essentially developed and tuned solely based on data from accelerator-based experiments. Tuning is the process of adjusting free parameters of phenomenological models in event generators based on comparisons with data. It differs from ”fitting” in that the function being fitted is not arbitrary, but is given by the model. In this article, we explore the opportunities and challenges of incorporating data from astroparticle experiments into the development and tuning of event generators, and outline how such a global tuning can be achieved. Given that specific tunes of event generators based on accelerator experiments have been found to be inconsistent with data from astroparticle experiments, there is a clear need for such an effort with potential benefits for a wide range of applications.

Event generators are crucial for many purposes, including the design of new experiments, the development of data analysis methods, or the prediction of particle interactions with detector material. In addition, when searching for rare events or processes, such as Higgs production at the Large Hadron collider, data are typically contaminated by background, and event generators help to find experimental designs and analysis methods that allow one to reduce this contamination. For this purpose, they need to predict the frequency and distribution of signal and background events with a high degree of accuracy, since the efficiency and purity of an event selection can often only be calculated on the basis of a full end-to-end simulation of the entire experiment. The need for accurate event generators becomes more important as better experimental data become available, and is particularly important also for applying machine learning methods, which tend to outperform classical analysis, e.g. in classification performance, at the cost of being more sensitive to mismatches between simulated and experimental data.

In astroparticle physics experiments, event generators are used to simulate interactions of cosmic particles with the Earth and its atmosphere and in specific applications also to simulate particle interactions within cosmic ray sources and during their journey to Earth. High-energy cosmic rays, cosmic neutrinos, and gamma-rays are characterized by low fluxes, so that large aperture ground-based experiments are required to detect them. These experiments observe cosmic particles indirectly through showers of secondary particles. The showers themselves originate from collisions of high-energy primary particles in matter, most importantly air, water, or ice, and are typically detected by their emission of light (fluorescence or Cherenkov), radio emission, or classically by sparse arrays of charged particle detectors distributed over an extended area or volume. More recently, experiments have begun to combine two or more of the above methods to observe the same shower in different ways. This is known as hybrid detection. Event generators simulate the development of the particle cascade to determine the relationship between the detector response and the initial cosmic particle. If the simulation is not accurate, the interpretation of data is biased. While the energy scale of TeV gamma-ray experiments can in principle be calibrated against the GeV gamma rays detected by satellite experiments using standard candles such as the Crab Nebula [1, 2], there is no equivalent calibration source for high-energy cosmic rays and neutrinos, making the theoretical uncertainties in the event generators a major source of uncertainty in these experiments. A prime example of this is the muon puzzle [3] in extensive air showers (EAS), which causes an ambiguity in the inferred mass composition of ultra-high energy cosmic rays [4].

A major source of uncertainty in event generators is the treatment of hadronic interactions at low momentum transfer, where perturbative quantum chromodynamics (pQCD) is not applicable. The most important example here is the copious production of light-flavor particles which dominate the development of secondary particle cascades in various media. But also in high-momentum transfer interactions, such as heavy-flavor production, the initial (parton momentum distribution) and final stages (hadronisation) are non-perturbative.

QCD-inspired phenomenological models are used to describe non-perturbative processes, which are neither guaranteed to be correct nor complete over the entire phase-space. The discovery of collective flow and enhanced strangeness production not only in nucleus-nucleus but also proton-proton and proton-lead collisions at the TeV scale [5, 6] are recent examples of surprises in QCD.

To arrive at a highly complete and accurate description of QCD phenomena, and to reduce uncertainties and ambiguities in the interpretation of data from astroparticle experiments, it is important to develop event generators using data from all sources. So far, data from astroparticle experiments have not been widely used in this context, apart from a few pioneering studies, see for example [7]. While previous generations of astroparticle experiments were not precise enough for this purpose, the latest generation provides a great wealth of data.

A comparison of specific aspects of accelerator and astroparticle physics data is given in Table 1. Several challenges must be overcome to use astroparticle data: the initial state of a collision is variable and not well known, and air shower detector arrays observe the final state of a cascade of interactions in a medium such as air rather than a single interaction, while optical and radio observations observe the evolution of the entire shower particle ensemble along the shower axis. However, astroparticle data are complementary to accelerator data: they are sensitive to light and heavy flavor production at forward rapidities and probe collisions at center-of-mass (CM) energies of up to $500$ TeV, as well as to collisions that are not easily accessible at colliders [8, 9], such as those initiated by pions and kaons. By exploiting complementary data from both fields, theoretical uncertainties in event generators can be further reduced. This also provides a powerful test of the effective phenomenological models employed; ultimately, event generators including sufficiently general models should be able to describe all the data without inconsistencies.

Automatic tuning allows a rapid cycle of testing new models, and allows a quick retuning as new models or data become available. Due to the fundamental differences between the measurements in the two fields, this is a complex effort, mostly because of the unknown initial state of the first and subsequent collisions and the computational resources needed for full EAS simulations. The development of standardized tools is a crucial prerequisite for this effort.

In this report, we present for the first time a road map showing how to tune event generators simultaneously using data from both accelerators and astroparticle physics experiments. In Section 2, we summarize the theoretical approaches implemented in current event generators and recent developments towards event generators that are applicable to both high-energy and astroparticle physics. In Section 3, we summarize recent developments in the simulation of particle transport in matter, which are essential for the interpretation of high-energy astroparticle experiments and for global tuning. In Section 4, we give an overview of the most important measurements from accelerator and astroparticle experiments that provide input for global tuning. In Section 5, we summarize the current state of tuning of event generators and discuss how the tools involved need to be extended or replaced to enable global tuning, followed by a discussion of current challenges and possible solutions to achieve global tuning in Section 6. We conclude with a summary in Section 7.

2 Theoretical foundations and event generators

Despite significant progress both in the predictions of perturbative Quantum Chromodynamics (QCD) and in measurements at the Large Hadron Collider (LHC), it is still not possible to calculate from first principles the bulk of particle production processes at high energies. Only processes with large momentum transfer, also known as hard interactions, are accessible to the perturbative methods. To obtain a complete description of hadron collisions in accelerator experiments, it is necessary to combine results from perturbative QCD and general theoretical constraints with phenomenological modeling. To make predictions for particle production in hadron collisions in the astroparticle context, one must also extrapolate the distributions measured at accelerators into unmeasured regions of phase-space and to much higher energies.

Processes in which heavy quarks are produced in the final state are necessarily hard, and pQCD calculations can be quite accurate, but even in these calculations non-perturbative parts enter to model long-distance physics effects governing the parton-to-hadron transition and vice versa (fragmentation functions/hadronization, parton distribution functions).

Event generators use Monte Carlo simulations to describe QCD interactions. Table 2 presents the latest generation of event generators that are used in both high-energy accelerator physics and astroparticle physics. A brief description of each of these generators can be found in Appendix A.

Pythia 8 [14] is an event generator often used in high-energy physics (HEP) and more recently also applied for cosmic ray physics [15]. Its core is built on the pQCD model. A (semi-)hard scattering is the starting point of each inelastic interaction, accompanied by emissions of partons before, or after the hard scattering (so-called initial/final state radiation implemented in parton shower algorithms). Multiple (semi-)hard scatterings are also allowed. The soft physics effects are described by phenomenological models. The cross sections for (semi-)hard scattering are determined from pQCD using a threshold in the transverse momentum (suppressed by a smooth damping factor). The modeling of the total elastic, diffractive, and inelastic cross sections in Pythia 8 is decoupled from the particle production mechanism.

This is different for the event generators used in astroparticle physics. Most of these describe, with varying degrees of rigor, the interaction of hadrons as the exchange of specifically structured ”networks” of interacting quarks and gluons (so-called Pomeron and Reggeon exchanges). The mathematical framework for these exchanges, an effective quantum field theory, is the so-called Gribov-Regge theory (GRT) [16, 17], see Refs. [18, 19] for a pedagogical introduction. It allows one to connect the wide range of processes that occur in hadron collisions, such as elastic scattering, diffractive scattering and soft multi-particle production up to multiple hard parton scattering. The energy evolution of the hadron multiplicity and the total cross section are thus linked in GRT-based models.

Sibyll [20, 21, 13] is a GRT model that describes inelastic collisions with a soft and a hard part, where the hard part is based on the pQCD cross section calculated with an energy-dependent $p_{\perp 0}$ cutoff (similar to Pythia), while the soft part is purely phenomenological. The EPOS [10, 11] and QGSJet [12] families of models use a semi-hard pomeron that consistently mixes aspects of the GRT and pQCD descriptions. The semi-hard pomeron provides the analog of initial and final state radiation (ISR and FSR), which in particular leads to broader hadron spectra in rapidity. The EPOS family of generators is based on so-called parton-based GRT, where the pomerons are exchanged between partons instead of hadrons and energy conservation is ensured at the amplitude level. The QGSJet family of generators implements pomeron-pomeron interactions. Both are mechanisms that slow down the growth of the total cross section at high energies, which is necessary to describe the measurements.

Single diffraction dissociation, which occurs in events where only one of the incoming hadrons is dissociated, is an important contribution to hadron production in the forward region. In GRT models, this is described by the exchange of a specific colorless (quantum number-less) configuration of quarks and gluons (the so-called diffractive pomeron). This is either explicitly included in the amplitude or added using the Good-Walker model [22].

In the context of air showers, the models need to be reliably extrapolated from hadron-hadron to hadron-nucleus and nucleus-nucleus interactions. In the EPOS and QGSJet models the nuclear amplitudes are constructed from the basic pomeron amplitude using the Glauber formalism [23, 24]. In Sibyll, a mixture of Glauber (for hadron-nucleus) and an extended superposition model (for nucleus-nucleus) [25] is used. Pythia can use the Angantyr module [26], which superimposes individual nucleon-nucleon interactions following a Monte-Carlo Glauber calculation. Pythia can optionally use nuclear parton distribution functions (PDFs) for collisions with nuclei, which differ from the PDFs of free protons.

The description of heavy ion collisions (HIC) requires the inclusion of effects such as collective flow, jet quenching, and possible modifications of the final-state hadron composition (which may affect e.g. the strangeness enhancement). In EPOS, this is described by distinguishing between hadronization of the high-(energy) density part of the collision (core), modeled by the formation of a quark-gluon plasma that evolves hydrodynamically and then decays statistically, and hadronization of the low-(energy) density part (corona), based on string fragmentation (similar to the standard Pythia hadronization mechanism). This is followed by a phase of hadronic rescattering. Similarly, in Pythia 8, color reconnection, rope fragmentation, string shoving, and hadron rescattering are means to modify standard hadronization in vacuum to describe the effects observed in HIC. Sibyll and QGSJet do not include these types of HIC effects.

Heavy quark production has traditionally not been implemented in event generators used for EAS simulation, since the effect on most air shower observables is negligible [27], but the need to model the prompt atmospheric lepton flux in the latest generation of neutrino observatories has led to the inclusion of charm production in Sibyll-2.3d, EPOS4, and EPOS LHC-R. Pythia can produce all heavy quarks.

Event generators are stand-alone codes often with non-standard interfaces. However, software packages such as CRMC [28] and Chromo [29] simplify the use and comparison of the previously listed event generators by providing a common interface and unified output.

3 Particle transport in matter

Transport codes simulate the propagation, decay, and interaction of high-energy particles with a medium such as air, water, ice, or interstellar gas. They employ event generators to handle interactions and decays. This section provides an overview of the most relevant particle transport codes and some of their applications. A recent review of transport codes is also given in Ref. [3]. In Table 3, transport codes are compared from the point of view of their use in event generator tuning. They link the physics of hadronic interactions in a cascade with astrophysical “macroscopic” experimental observables, such as the depth of the shower maximum $X_{\text{max}}$ , the number of muons produced, or the atmospheric high-energy lepton flux. Theoretical uncertainties in particle transport arise mainly from the theory and the phenomenological assumptions implemented in the event generators, but are also related to the propagation of secondary particles. The propagation of high-energy muons through dense media has recently been significantly improved with the Proposal propagation code [30, 31]. Such effects are important in neutrino observatories like IceCube and in underground laboratories.

From the point of view of an accelerator-based experiment, transport codes (such as GEANT4 [32] or Fluka [33]) simulate particle transport through “detector material”. For example, the Earth’s atmosphere and magnetic field can be considered as part of the detector of an air shower experiment, and must to be carefully monitored, as these properties change on a daily and seasonal basis. Tracking the vertical density profile of the atmosphere and its optical properties is important, which is done using laser and electron beams [34, 35], local weather balloon flights and satellite-based models [36]. In accelerator-based experiments, theoretical uncertainties are reduced by minimizing the amount of material the particles must pass through and by sophisticated calibration schemes. In astroparticle experiments, the amount of material is necessarily large, and calibration against a known source is not possible, except at gamma-ray observatories. In this case the Crab nebula is often used as a reference source. Therefore, great care is taken to make not only the interaction but also the transport codes as accurate as possible. The impact of theoretical uncertainties in event generators is further amplified for some air-shower observables, such as the total number of produced muons, which depends on the evolution of the whole hadronic cascade, while other observables depend mainly on the first few interactions of the primary cosmic ray, such as the depth of the shower maximum.

Computational speed and efficiency of transport codes is a key aspect to be considered in tuning procedures, since calculating a change in an air shower observable as response to a change in an event generator may require simulating hundreds of air showers in order to average out shower-to-shower fluctuations. Full Monte Carlo simulation, as employed by Corsika [37] and CRPropa [38], is the gold standard, but very demanding in terms of computational resources. The computational effort is proportional to the number of particles that need to be tracked. For air showers, this number is proportional to the cosmic-ray energy, which in turn spans over 11 orders of magnitude. Full simulation becomes impractical above $10^{17}$ eV. At higher energies, the thinning technique [39] in which only a representative subset of particles is simulated can be used. The computational load further increases if detailed Cherenkov or radio emission has to be simulated, which is needed for some observables.

The numerical solution of cascade equations is an alternative technique for simulating air showers and calculating particle fluxes. The latter is used by Conex and MCEq. In this approach, hadronic cascades are described by a large system of differential (cascade) equations, one for each particle species, containing source and sink terms describing energy loss, particle interaction and decay [40]. The input particles are binned in energy and atmospheric depth, and the differential equations are then solved numerically. This allows one to compute atmospheric lepton fluxes, for example, using as input the cosmic ray flux at the top of the atmosphere, or an “average” air shower from monoenergetic cosmic rays. Examples of outputs are longitudinal density profiles and energy distributions of secondary particle fluxes. This approach works for calculating only some of the air shower observables. Observables that require simulation of shower-to-shower fluctuations or lateral particle distributions at the detector sites (typically required by ground-based air shower arrays), require full Monte Carlo or 3D hybrid simulations, in which the initial and final stages of an air shower are Monte Carlo simulated, and cascade equations are solved numerically for the middle part [41, 42]. On the other hand, observables that are already defined as an average over many similar showers (such as the depth of the shower maximum or the number of muons produced) are obtained directly from these cascade equations solvers.

Numerical solvers of cascade equations use precomputed tables of the energy spectra of secondary particles for colliding particles at different energies. With these tables, it takes only seconds to compute an air shower, but the computational cost of generating these tables from event generator calls is substantial, requiring on the order of (10–1000)k events, depending on the desired smoothness of the tables and whether charmed hadrons, which have a lower production cross section, are to be simulated. However, cascade equation solvers have a computational advantage in tuning procedures where air shower observables of different primary cosmic ray energies and masses are to be computed multiple times during the tuning process.

The transport codes shown in Table 3 are briefly discussed in Appendix B in the context of event generator tuning.

Factorization or reweighting techniques can be used to reduce the computational cost of CPU-intensive transport codes. In the factorization approach [43, 44], intermediate “high-level” key variables are identified, such as the inelastic cross section or the hadron multiplicity, which have a strong influence on the air shower observables. Assuming that the high-level quantities scale logarithmically with the cosmic-ray energy, one can pre-calculate the impact on the air-shower observables. This drastically reduces the effort to the calculation of a high-level variable by an event generator at a fixed cosmic-ray energy. An example of such an analysis is discussed below (cf. Fig. 1). On the other hand, in the reweighting approach, weights are applied to the precomputed tables used by a cascade equation solver, or to a set of previously Monte Carlo simulated air showers, which must be stored as complete graphs, including all particle interaction and decay history. The weights are chosen to reflect the change in the event generator. Reweighting can be effective when the change in the event generator has only an isolated effect, for example, only on selected particle types. Both strategies have the potential to speed up tuning considerably, but at the cost of making strong assumptions that need to be carefully validated. They also run the risk of missing unexpected side effects of changes to event generators.

4 Input from experiments

Traditionally, event generators are only tuned to input from accelerator-based experiments (classic tuning), and only a few exceptional studies already explored more global tuning [7]. Classic tuning uses measurements of, e.g., production cross sections, hadron multiplicity spectra, energy flow, and rapidity gaps. For a global tuning a variety of air shower observables are available: the mean and fluctuations of the depth of the shower maximum, the mean and fluctuations of the produced number of muons, and other observables related to these. The mean number of muons is sensitive to small changes in secondary hadron yields, which become exponentiated by the hadronic cascade [43, 3], while the other air shower observables are dominated by the first interaction. Further input can be derived from the atmospheric muon flux at PeV energies, whose conventional component (i.e. the component arising from light-flavour production, hadronization, and decay) is sensitive to hadron production over a wide range of energies, while the prompt component (i.e. that arising from heavy-flavour production) is sensitive to charm production. Tuning to air shower observables also requires a model of the cosmic ray flux and its composition, which has its own uncertainties. These uncertainties in turn largely derive from uncertainties in the event generators, so a truly global tune should also adjust the cosmic ray flux model.

Inputs from air showers and accelerators complement each other, and both have their respective limitations. The highest CM energy at an accelerator on Earth achieved so far amounts to 13.6 TeV. In air showers, the CM energy of the first collision can be significantly higher, up to several hundred TeV. In many cases, the precision of air shower measurements is significant and puts constraints on quantities only loosely constrained by accelerator measurements. For example, the fluctuations in the muon number in air showers constrain the elasticity, the fraction of energy taken by secondary particles, which is difficult to measure at the LHC.

Another complementarity is in the accessible rapidity range. Measurements from accelerator-based experiments are often expressed as a function of the transverse momentum $p_{\perp}$ , and rapidity $y$ or pseudorapidity $\eta$ of the products in the CM frame. Rapidity $y=\frac{1}{2}\ln[(E+p_{\parallel})/(E-p_{\parallel})]$ , where $p_{\parallel}$ is the momentum component parallel to the beam line, is a measure of how forward going the particle is.

At the TeV scale, the production cross sections for hadrons near projectile rapidities are most important for air showers [3], but the most detailed measurements with identified hadrons from LHCb [45] end at $y<5$ . Further forward, we have measurements from TOTEM on the charged particle multiplicity [46], and from CMS-CASTOR on the energy flow [47], and at $y>8$ from the LHCf experiment [48] on the $\pi^{0}$ and neutron production cross sections. There are no collider data on strangeness or charm production beyond $y>5$ . Muon production in air showers is sensitive to these production cross sections, and therefore can constrain the cross sections in the forward region.

Finally, there is complementarity in the collision systems. At the LHC, so far only p+p, p+Pb, and Pb+Pb collision systems were investigated, while the most common system in air showers is $\pi$ +(N,O). The properties of $\pi+A$ collisions beyond the CM energies of fixed target experiments have to be inferred from data on p+ $A$ collisions, since pion beams are not available at colliders. Until measurements with the oxygen beams become available at the LHC, the extrapolation from p+p and p+Pb to p+O and p+N remains uncertain at the TeV scale.

Global tuning will inevitably reveal discrepancies between models and data. On the experimental side, there can be hidden systematic effects in the measurements, not covered by the quoted uncertainties. On the theory side, models may lack the necessary physics content, robustness and flexibility to reproduce all available measurements. In either case, a good fit in the statistical sense of the models to the available data will not be possible, but global tuning is necessary to reveal these discrepancies, so they can be addressed by the community.

4.1 Accelerator-based experiments

Measurements at particle accelerators tackle properties of single interactions in different combinations of particles and nuclei and thus provide valuable constraints on the modeling of fundamental interactions in the evolution of air showers. Accelerator-based experiments allow us to measure billions of collisions with a well-defined identical initial state and a fixed collision energy. This allows one to compute production cross sections to high accuracy and to measure relatively rare processes. On the flip-side, there are only a few beam-beam combinations at discrete energies available, so it is not possible to directly measure all relevant interactions which happen in air showers. Theory embedded in event generators will always be needed to inter- and extrapolate from these reference measurements.

In the following, we will mostly focus our discussion on experiments using hadron beams such as the ones provided at the LHC and its pre-accelerator the Super Proton Synchrotron (SPS). Nevertheless measurements at the Brookhaven Relativistic Heavy Ion Collider (RHIC) as well as older data from the Tevatron and other accelerators also provide important input by constraining event generators on a broad range of collision energies. Note that experiments conducted at lepton-lepton and lepton-hadron colliders such as LEP and HERA are also included in the tuning because they allow fixing string fragmentation parameters [49] and the parton distribution functions of hadrons [50], respectively. Future machines such as the Electron Ion Collider [51] and the FCC-ee [52] at CERN will probe hadrons at new scales.

Accelerator data are available from either fixed-target or colliding-beam experiments. The major differences between these groups of experiments are in the CM energy scale that is being probed, the kinematic coverage, and the flexibility in studying different collision systems. Current fixed-target experiments probe nucleon-nucleon CM energies up to about 100 GeV, the LHC experiments in colliding-beam configuration currently reach up to 13.6 TeV.

An advantage of fixed-target experiments is the possibility to study a greater variety of beam and target combinations. The target is usually a thin foil or small block of material that can be easily changed. Liquid and gaseous targets are also used. Experiments can use primary beams, like the proton and lead beams from the LHC or SPS, or secondary beams produced by colliding the SPS beam with a production target. This is currently the only way to study $\pi+A$ interactions.

Fixed-target experiments measure the produced particles in the rest frame of the target, i.e. in the laboratory system. Some fixed-target experiments, like NA61/SHINE [53] ( $|y|\lesssim 3$ ), cover most of the rapidity range in the CM frame, since the rapidity range is fairly narrow at these energies. Collider experiments typically measure particles in a frame that is close or identical to the CM frame of the colliding nucleons. Such experiments are typically most densely instrumented in the mid-rapidity region, where most particles are produced. On the other hand, the forward region ( $y\gtrsim 2$ ), where most of the beam energy flows into, is typically sparsely instrumented.

The four large LHC experiments are ATLAS [54], CMS [55], ALICE [56], and LHCb [57]. They are all designed as general-purpose detectors, but each with a different focus. ATLAS and CMS have been designed for discovering new heavy particles, such as the Higgs boson and heavy supersymmetric candidates. This is best done in the mid-rapidity region ( $|y|<2.5$ ), where these experiments have their main instrumentation. They have lepton-hadron separation capabilities, but only limited hadron-identification capabilities. The main focus of ALICE is studying QCD in the mid-rapidity region, especially in proton-nucleus and nucleus-nucleus collisions. ALICE is well-equipped for this task with a high-resolution tracker and excellent hadron identification capabilities over a large momentum range. The main focus of LHCb is the study of the production of hadrons containing heavy flavor, i.e. charm and bottom quarks. It is instrumented mainly in the forward region and has very good particle identification capabilities for hadrons with momenta up to 100 GeV $/c$ . LHCb is equipped with a system that enables the injection of gases into the beam pipe, allowing LHCb to operate as a fixed-target experiment [58]. This provides a unique opportunity to study proton-nucleus and nucleus-nucleus collisions with different gaseous targets using the LHC beams. The system has recently been upgraded to extend the injected gases beyond noble gases, most notably also including nitrogen and oxygen [59].

Important observables in these experiments are total and differential cross sections, multiplicity distributions, and energy flow. Results from proton-proton, proton-nucleus and nucleus-nucleus collisions allow one to understand nuclear and collective effects. The relevance of the different quark flavors can be probed by studying the production of particles containing strange, charm or bottom quarks. A comprehensive overview of data from accelerator experiments that is useful for tuning hadronic interaction models is given in Ref. [3]. The most important experiments for the tuning of QCD-inspired models and their most relevant results are summarized in Appendix C.

For air shower simulations, it is important to understand nuclear effects, which modify production cross sections relative to collisions of free nucleons. So far, LHC beams have probed proton-proton, proton-lead, and lead-lead collisions, which represent extreme ends in the space of collision systems. In addition to that, a short pilot run with xenon beams was conducted in 2017. These reference systems are not ideal for air showers, in which collisions with nitrogen and oxygen are dominant. These nuclei have masses near the geometric mean between proton and lead, and it is not clear how features of hadron production shall be interpolated to this point. Present event generators used to simulate air showers show a considerable spread in their predictions for the hadron multiplicity in proton-oxygen collisions, at the level of $25\%$ . In contrast, they agree to $5\%$ in proton-proton collisions in the mid-rapidity region, because they are tuned to these data [60].

Proton-oxygen and oxygen-oxygen collisions will allow us to determine how hadron production evolves from light to heavy systems at high energies, and are direct references for common air shower collisions. Comparing them will also allow us to test the superposition model assumption at the TeV scale. First runs with the LHC providing p+O and O+O collisions are planned for 2025 [61]. There is a widespread interest in accelerating other nuclei at the LHC, and work towards this goal is coordinated by the Working Group on Future Ions at CERN [62].

4.2 Astroparticle experiments

Astroparticle experiments measure high-energy gamma rays, neutrinos, or cosmic rays, whose flux as a function of energy follows a steeply falling power law. Therefore, large apertures are needed for measurements at very high energies. This is achieved with ground-based experiments that measure the characteristics of particle showers initiated by the primary projectile in the Earth’s atmosphere, water bodies or ice shields.

The properties of the primary projectile are inferred from the observed features of the particle shower. The direction of the projectile is inferred from the arrival times of shower particles at ground level detectors. The energy is derived from integration of the energy deposition profile in the atmosphere, which is a nearly model-independent technique, or from the number of particles at the ground level. The latter requires a detailed simulation of the particle cascade to obtain a calibration curve from the simulation.

The mass $A$ of a cosmic-ray projectile is inferred from the depth of the electromagnetic shower maximum $X_{\text{max}}$ , the depth of the maximum muon production $X_{\mu,\text{max}}$ [63] or the number of muons $N_{\mu}$ produced in the particle shower. Other mass sensitive observables can be reduced to these three fundamental observables. They probe different aspects of the hadronic interactions in the cascade, whose evolution is dominated by soft QCD interactions. Complementary information is obtained from the mean values of $X_{\text{max}}$ , $N_{\mu}$ , and $X_{\mu,\text{max}}$ , and from their stochastic shower-to-shower fluctuations in a narrow energy interval. The fluctuations are very sensitive to the first interaction of the primary projectile, which is also true for $\langle X_{\text{max}}\rangle$ , while $\langle N_{\mu}\rangle$ and $\langle X_{\mu,\text{max}}\rangle$ are sensitive to the whole evolution of the hadronic cascade. One can therefore probe QCD features of the first interaction at hundreds of TeV in the CM frame, and features of the whole chain down to GeV. Figure 1, adapted from [43, 3], illustrates in which way air shower observables, such as the muon number, $N_{\mu}$ , and the shower maximum position in the atmosphere, $X_{\text{max}}$ , react to changing basic parameters of hadronic interactions. For example, increasing the interaction cross section shifts $X_{\text{max}}$ to higher altitudes and reduces its fluctuations, but has only little effect on the muon numbers and their fluctuations. Table 4 summarizes how the observables relate to QCD features and cosmic rays properties. In the case of the muon number, $N_{\mu}$ , further information can be obtained from the lateral distribution function, whose characteristic width is inversely proportional to the muon detection threshold.

While the direction and energy of a cosmic ray can be determined in a model-independent way, other observables are sensitive to the a priori unknown mean and variance of the logarithmic mass number $\ln\!A$ . This additional uncertainty poses a challenge to the tuning of event generators, but can be addressed in two ways. First, there is a unique energy window in the cosmic-ray flux of (1–3) EeV, where the primary mass is dominated by protons [64]. The proton fraction can be further enhanced by selecting showers with a deep shower maximum, thus avoiding the uncertainty in the composition. Second, one can study several observables simultaneously, all of which must be consistent with the same $(\langle\ln\!A\rangle,\sigma(\ln\!A))$ hypothesis. The muon deficit in air shower simulations was discovered in this way [65, 66].

Neutrino telescopes, measuring the flux of atmospheric leptons and astrophysical neutrinos, provide additional observables for tuning. The atmospheric lepton fluxes are the result of the cascade generated by cosmic-ray interactions with the atmosphere, as discussed in Section 2. At low energy, the conventional flux due to the decay of pions and kaons in the atmosphere dominates. At high energies, the prompt flux then takes over. It originates from the decay of short-lived hadrons containing charm and bottom quarks, generally produced in the first few interactions of the primary cosmic ray. In the case of the flux of muons other short-lived resonances also contribute. Although most charm mesons are produced at mid-rapidity, due to the steeply falling cosmic-ray flux, charm mesons produced at rapidities $>4.5$ contribute most to the prompt neutrino flux. Thus, the atmospheric lepton flux is sensitive to QCD features over a wide range of energies. At the level of a single neutrino interaction event, one cannot distinguish between the atmospheric and the extraterrestrial neutrino fluxes, but the combined neutrino flux can still be used for tuning since both sources contribute differently at different energies. On the other hand, the atmospheric and extraterrestrial neutrino fluxes are expected to present different features and shape, and the correlations with sources could become a powerful tool to distinguish them. This way, the perspective is open that each of these two fluxes could be separately used for tuning purposes.

In summary, there is a wealth of data from astroparticle experiments available for tuning. The experimental precision achieved in modern air-shower experiments is competitive with accelerator-based experiments and challenges event generators tuned to LHC measurements. To disentangle the complex dependencies between the microscopic properties of hadronic interactions and the macroscopic observables in these experiments, several observables should be used together in the tuning, while carefully considering the systematic uncertainties and correlations in the measurements. Hybrid experiments such as the Pierre Auger Observatory and IceCube, which can measure several observables simultaneously, offer the greatest value for tuning. A detailed discussion of the most relevant astroparticle experiments and their recent measurements in the context of event generator tuning is given in Appendix C.

4.3 Clash of high-energy accelerator and astroparticle physics: the muon puzzle

A prominent example that illustrates the need for a coordinated effort by both the particle physics and the astroparticle physics communities is the so-called muon puzzle in air shower data [3]. In the last decade an increasing number of datasets have revealed a consistent systematic discrepancy between the number of muons observed in EAS with respect to those predicted by standard interaction models. This gap persists despite data from the LHC having been included in the tuning of the hadronic interaction models.

Apart from the cosmic ray energy $E$ , the observed muon density at ground level depends on the atmospheric depth of the ground array, the lateral distance at which the muon density is measured, the zenith angle of the showers considered, and the effective energy cutoff introduced by the shielding of the detectors. Due to the diversity of the measurements the Working Group for Hadronic Interactions and Shower Physics (WHISP) consisting of representatives from most of the experiments was formed [69, 70, 71, 68].

To facilitate the comparison of observed vs. expected muon numbers for the very diverse experimental conditions, the group defined an abstract $\Delta z$ -scale. The $\Delta z$ -values are proportional to the difference between the observed number of muons in an experiment and the expectation calculated with an event generator and a model for the primary flux. An overview of all the muon measurements is presented in Appendix C.8.

At first glance, a globally coherent picture from 1 PeV to 10 EeV does not emerge. However, two groups of experiments can be identified. Experiments that determine the shower energy largely or almost independently of the muon number, such as IceCube and Auger, show a muon deficit in the simulations that grows at a constant rate with increasing energy (see Figure 2). On the other hand, for experiments without independent energy measurements, no clear picture emerges (see Appendix C.8). presumably because the shower energy is reconstructed (at least in part) from the number of muons, thereby masking the deficit.

In addition to the scientific interest in the nature of hadronic multiparticle production at high energies, the muon puzzle unfortunately introduces large systematic uncertainties in the analysis and interpretation of EAS data. This is particularly critical when training machine learning algorithms with simulations that are not fully consistent with the data.

5 Tuning of event generators

Event generators use effective models with tens to hundreds of parameters that need to be adjusted to experimental data, in a process called tuning. A tune refers to one given set of these parameters. Many event generators have switches to enable or disable physics processes or select alternative physics models, which are also stored in the tune. Depending on the enabled processes, a tune can be more conservative or more speculative.

A prerequisite for tuning is the availability of experimental measurements available in a machine-readable format. In the particle physics community, this is provided by the High-Energy Physics Database (HEPData) [72], an open-access repository for sharing data from experimental particle physics. A project with a similar goal in the astroparticle community is the Cosmic Ray Database (CRDB) [73]. Then, one has to compare the output of an event generator to the measurement, in order to compute a residual. This step is not trivial due to the different internal implementations chosen by each event generator and the diversity of measurements.

Event generators suitable for tuning provide a documented interface that allows one to change parameters without recompiling the code. Pythia 8 is exemplary in this regard. Other event generators, including most of those used in the astroparticle physics community, are not designed to allow for tuning from a generic user. The parameters are hard-coded or the tuning interface is not documented. Tuning may be done manually by changing parameters until the generator predictions match the experimental results in the control plots. Alternatively, automatic tuning by performing a multi-dimensional fit makes results more reproducible and eases the incorporation of new data.

5.1 Automatic tuning

In automatic tuning, a suitable distance measure, such as a least-squares-type cost function, is used to quantify the agreement of the event generator output with a set of measurements, taking into account experimental uncertainties. The best set of parameters is found either by minimizing the least-squares-type cost function via gradient descent, or in a Bayesian framework by computing the posterior probability density of the parameters from the likelihood function and priors. Further details on these methods are presented in Appendix D.

Before tuning, the raw output of an event generator must be converted into a prediction that can be compared to the measurement. In particle physics this step is performed by the Rivet [74] software which has been specifically designed to support the development, validation and tuning of event generators. The experimental data are entered via so-called Rivet plugins which are programs that emulate the published experimental analysis but starting from the generator output, instead of the original experimental data. Rivet stores measurements in a human readable format, usually imported from HEPData. Some event generators, such as Pythia 8, can feed the raw events directly into Rivet, while others use interface programs [75, 28, 29].

The classic tuning loop used in particle physics is illustrated in Figure 3, where a tuning algorithm iteratively optimizes the event generator to describe the HEP data. Note that the figure is conceptual only; in practice, tuning adds a few complications to this basic picture. First, one must assign weights to each observable in the calculation of the least-squares-type cost function, because event generators remain incomplete models of nature and usually do not perfectly match measurements to the extent of their implemented physics processes. Some observables are known much more precisely than others, and possibly more precisely than the corresponding theory. This can drag a fit without weights into an implausible parameter space. Introducing weights or imposing additional common model uncertainties is a ad hoc way of balancing experimental-versus-theoretical error to avoid this problem. Second, computing the prediction of the event generator is expensive, so it is more economical to construct a surrogate model (in the simplest case, a multidimensional polynomial) from the output of the actual event generator, which is used to compute the chi-square instead of the real event generator. Depending on the tuning algorithm, an analytical surrogate model may be needed so compute gradients.

Examples of automatically tuned models are Sibyll and Pythia. While Pythia has many tunes, often several per LHC experiment, for example the Monash tune [76], for Sibyll automatic tuning was used to determine the best parameters [13].

Using an interface like Rivet simplifies tuning, but expert knowledge is still needed to select the right measurements that are sensitive to a given range of tuning parameters, to avoid measurements that the event generator was not designed to describe, and to choose between measurements that contradict each other. To facilitate this process, the MCPlots [77, 78] website has been created. It is based on Rivet and provides comparisons between the established particle physics event generators in many code bases and tunes whose results can be compared to a large set of measurements (Rivet plugins).The website assembles plots of pre-generated results produced on the computers of citizen scientists who contribute to the project via the LHC@home initiative. In addition to the visual comparison of generator predictions with measurements, the website also computes a chi-square test statistic for the goodness-of-fit between the experimental data and an event generator version or tune, allowing non-generator experts to easily select the appropriate tune for their task.

5.2 Early tuning efforts for astroparticle physics

One of the major shortcomings of all current HEP tunes of Pythia 8 is the difficult description of particle collisions at large rapidities, most notably the failure to describe the spectra of neutrons and neutral pions measured at the LHCf experiment [48]. Specifically, the $\pi^{0}$ spectra predicted by Pythia 8 are too hard while the neutron spectra are too soft [79, 80, 81, 82]. It is worth noting that the event generators used to simulate air showers, such as EPOS LHC, QGSJet, and Sibyll also do not perform well. This is very important for simulating air showers, as well as for physics in the forward direction such as the FASER [83] or SND@LHC [84] experiments.

Starting from different configurations and tunes of Pythia 8 [85, 76], a good description of the LHCf data was obtained by adjusting the hadronization model and the parameters of the so-called beam remnants using Rivet and the Apprentice toolkit [82]. Building on the successful “forward tune” of Pythia 8, efforts are underway to produce a first “air shower” tune. The aim is to better describe the processes important for air shower simulations, namely the forward hadron production measured e.g. by LHCb and NA61 [80, 81, 86, 79, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101], as well as the total and inelastic cross sections in different collision systems [102, 103, 104, 105, 106, 107], spanning several orders of magnitude in energy (for the full set of measurements, see Table 5 in Appendix D). To achieve this goal, the tuning parameters have to be determined step by step for each collision system, starting with p+p, then $\pi^{\pm}$ +p/K±+p, followed by p+ $A$ and $\pi+A$ , and ending with $A+A$ . The nucleus $A$ should ideally be of the CNO group, since nitrogen is the most abundant nucleus in the atmosphere, and intermediate-mass nuclei are abundant in the UHECR flux near the spectral cutoff. Therefore, the upcoming proton-oxygen runs at the LHC will provide an important validation data set and future reference point at the TeV scale for this and subsequent tunings. Parameters related to beam remnants and hadronization are highly relevant for the tuning, as well as the dependence of the parameters on the CM energy of the collision system, since this has to be extrapolated to higher projectile energies. It will also be important to arrive at a tune that describes the mid- and forward-rapidity data without major inconsistencies.

An essential first step towards global tuning to particle and astroparticle data is to demonstrate the feasibility of tuning to air shower data. To this end, a study based on the event generator Pythia 8 and the air shower simulation code Corsika 8 [108] was performed by generating mock air shower data using the default settings in Pythia 8. Using Bayesian tuning [109, 110] these default values were successfully recovered by tuning Pythia to $X_{\text{max}}$ and $N_{\mu}$ in the mock data. This study shows that tuning to air shower observables is in principle possible, and that the Bayesian tuning approach is a promising method for global tuning to particle and astroparticle data. The experiment also highlights the need for fast air shower simulations. The next step is to perform a first tune to data.

6 Towards global tuning

In this section, we outline a vision for simultaneous, automatic tuning of MC generators with data from particle and astroparticle physics. This requires a framework to be built in collaboration with the developers of the event generators and both experimental communities. Many of the necessary ingredients are already available or under development. First studies in this direction have been made with encouraging results [82, 111].

The basic training loop as envisioned for a global tuning is illustrated in Figure 4. The particle physics tuning loop as discussed in Figure 3 is complemented by a second iterative loop where the input from the particle physics event generator and the cosmic ray flux model are combined in the air shower simulation code to derive the prediction for the observables that is to be compared with the air shower data. The task of the Rivet-like translator is to choose the appropriate configuration for the air shower simulation for each observable. This includes selecting the method of calculation, which can be a full MC simulation (Corsika) or solving cascade equations (Conex, MCEq).

The data sets from astroparticle and accelerator based measurements are largely complementary, many EAS observables are known to better than 10%, which together with the lever arm in phase space provides significant constraints on event generators. This is also demonstrated by the fact that some EAS observables are difficult to describe by current event generators.

6.1 Current challenges in describing EAS data

The consistent description of EAS data is a challenge for current event generators. They show a muon deficit compared to measurements [65, 66]. The muon production depth $X_{\mu,\text{max}}$ is also not reproduced by all event generators: EPOS LHC predicts an excessive depth in simulations, while QGSJet-II.04 is consistent [112].

Strangeness enhancement

The influence of enhanced strangeness production on muon production in air showers has been studied in high multiplicity events with a core-corona model, where the core is responsible for the strangeness enhancement [44]. The studies were based on a modified version of EPOS LHC. Using the $\Omega/\pi$ ratio measured by ALICE as a function of hadron multiplicity as a reference and assuming that strangeness production for all collision systems depends only on the hadron multiplicity, the LHC results were extrapolated with the core-corona model to ultra-high energy air showers. It was found that this leads to an increase in the muon number of about 10% , which is not sufficient to resolve the observed muon deficit in air shower simulations.

Standard Model uncertainties

The impact of Standard Model uncertainties on the predictions of the muon number $N_{\mu}$ and the muon production depth $X_{\mu,\text{max}}$ has been studied in [113]. The EAS observables were computed with Conex and the event generators QGSJet-II-04, EPOS LHC, Sibyll- 2.3, and a modified variant of QGSJet-III. Modifications of the latter were made within experimental constraints from collider data. The influence of the following modifications was studied: a change in the pion energy distribution, an enhanced gluon content of the pion, a modified energy dependence of pion exchange, and a possible enhancement of (anti)nucleon, kaon and meson production. It was found that all these variations cannot increase $N_{\mu}$ by more than 10%, which is not enough to agree with the data from the Pierre Auger Observatory. Moreover, at the same time they further increase $X_{\mu,\text{max}}$ which aggravates the tension with the Pierre Auger Observatory observations.

Sibyll★

A similar study was performed with a modified version of Sibyll- 2.3d, called Sibyll*★* [114], with ad hoc modifications of the final state ignoring internal model consistency. Simulations were performed to study different scenarios for increasing muon production in air showers: increased baryon (and hyperon) production, increased $\rho^{0}$ production, increased K± production, and a mixture of baryon and $\rho^{0}$ production. It was found that all scenarios can match the Auger data, but the scenarios with increased $\rho^{0}$ do so without introducing sudden jumps in the production cross section as a function of the collision energy. However, the forward/high $x_{F}$ $\rho^{0}$ production cross section is expected to decrease with increasing energy, while it remains constant or even increases in the modified scenarios that match Auger data.

Strangeball model

The muon production in air showers can be increased with a so-called strangeball model [115], an evolution of the fireball model of [116]. The latter assumes that a quark-gluon plasma droplet (the fireball) is formed in a fraction of the collisions. The fireball and core-corona models are similar, but in the latter both the core (fireball) and the corona (string fragmentation) contribute simultaneously, whereas a fireball is formed only with some probability but then is the sole product of the collision. The authors found that the fireball model cannot consistently describe the mean and variance of $X_{\text{max}}$ in air showers due to the inelasticity enhancement associated with the formation of a plasma state. This problem does not arise in core-corona models, where the core component does not extend to very forward rapidities, so that its effect on inelasticity is reduced. The strangeball model solves this problem by increasing only the strangeness produced in Standard Model hadronic interactions relative to other flavors, without forming a quark gluon plasma state. Strangeball parameters have been found that are consistent with the muon production seen in Auger data. Constraints from measured shower-to-shower fluctuations of the muon number require strangeness enhancements already at the TeV scale. A comparison with relevant measurements from the LHCf and LHCb detectors does not directly exclude this scenario. Further LHCb measurements in Run 3 and Run 4 at the LHC could further constrain this model.

Global tuning may reveal that current event generators are not able to consistently describe all the data, and that our current modeling of non-perturbative QCD is incomplete. This would be a success of global tuning, as it would help to guide us towards the appropriate extensions of the standard theory.

6.2 A RIVET-like translator for astroparticle data

Global tuning is necessarily an automatic tuning that adjusts many parameters in the event generator to many data sets simultaneously. When tuning only to HEP data, it is possible to select data sets that correspond to only a small subset of parameters in the event generator and perform a manual tuning. This approach is not feasible when including EAS data which are sensitive to many correlated parameters. For EAS measurements, a translator software analogous to Rivet discussed in Subsection 5.1, or an extension of the Rivet software is needed to allow global tuning. It seems obvious that this effort should build on the existing infrastructure developed within the HEP community.

A translator for EAS data can work in the same way as a Rivet plugin, but with simulated air showers as input. For each analysis, one defines the relevant energy range of air showers to be simulated, the atmospheric profile, the zenith angle range, and the observation level. Air showers are then simulated according to these specifications. The translator applies any selections that bias the air-shower sample, such as cuts on the depth of shower maximum $X_{\text{max}}$ that correspond to the field of view of the fluorescence telescopes.

The existing Rivet software can be extended to become this new translator, but this is not a trivial task. Several arguments speak in favor of this approach: The internal Rivet file format is flexible enough to describe EAS data as well. Tuning software can already interface with Rivet plugins, integrating EAS plugins would be relatively easy, and the MCPlots website could be used to show comparisons with EAS data. The main problem is that Rivet is designed for input in the form of a particle graph, which cannot represent air showers. An air shower consists of longitudinal density profiles for electrons, photons, and muons, and a list of millions of particles that reach the observation level. The HepMC format is not designed to represent density profiles, nor to store millions of particles. Handling this input would likely require deep changes to Rivet, which would need to be discussed with the Rivet developers to evaluate the feasibility of such changes.

The alternative is to construct a new Rivet-like translator from scratch. For this approach, tuning and plotting software would have to be adapted to interface with this new translator. An advantage would be the decoupling from the Rivet release cycle, and the possibility to write the translator in Python instead of C++ to benefit from the fast development cycle and easy prototyping.

Tuning without a surrogate model

One limitation of established automatic tuning methods is the need to construct a surrogate model. This step adds complexity because the parameter space must be efficiently sampled. Methods based on stochastic gradient descent (SGD), such as Adam [117], could potentially be used to tune the event generator directly, avoiding the construction of the surrogate model, and dramatically reducing the computational cost of tuning many parameters at once.

6.3 Tuning of the cosmic ray flux model

An optional but natural extension of global tuning is the inclusion of cosmic ray flux model parameters, which provides additional benefits. Air shower observables depend on the cosmic ray flux and its composition, which are a priori unknown and must be inferred from these observables. The latest generation of hybrid observatories measure the all-particle cosmic ray flux in a nearly model-independent way, which is now fairly well understood [67]. The remaining uncertainty in the all-particle flux is dominated by the energy scale uncertainty, which has little effect on most air shower observables of interest. Exceptions are the atmospheric muon and neutrino fluxes. The composition, however, carries large uncertainties, and affects several air shower observables.

This seems to introduce a circular dependency: we want to use air shower observables to tune event generators, but the prediction of these observables depends on the composition that we infer by comparing event generator predictions with air shower observables. In reality, this is not a circular dependency, because we have many observables that are sensitive to different aspects of the event generators. The composition is usually inferred from the depth of the shower maximum, $X_{\text{max}}$ , and predictions based on this are then compared with other observables. This is successful, because the theoretical uncertainty in predicting $X_{\text{max}}$ is smaller than for other variables.

An alternative is to include a cosmic ray composition model in the tuning process. The parameters of the flux model are then also adjusted by the tuning algorithm. This universal tuning would infer the composition from all air shower observables, not just from $X_{\text{max}}$ . This approach can only work if the tuned event generator is able to consistently describe all air shower observables. This is not currently the case, and naive universal tuning would lead to incorrect results. However, the approach can still be used if $X_{\text{max}}$ measurements are given a higher weight in the least-squares-type cost function used in the tuning.

In addition to conceptual consistency, the benefit of universal tuning is an additional result: the model of the cosmic ray composition obtained in this way would have an uncertainty band propagated from all air shower and HEP observables used in the tuning. Currently, the uncertainty is estimated from the scatter of results obtained with different event generators. This uncertainty estimate may over- or underestimate the true uncertainty.

7 Summary and next steps

In this article, we demonstrated that a global tuning of event generators with data from both high-energy accelerators (HEP) and extensive air showers (EAS) has the potential to resolve some of the puzzles we currently face in interpreting astrophysical measurements. As an example, we discussed effects that could increase the muon production in air showers to better describe observations. Global tuning will either resolve such tensions or lead us to an extension of the standard theory.

Global tuning needs to be done with automatic tuning software for which the classic tuning loop is extended by including a model of the cosmic ray flux and air shower simulations. To facilitate the use of EAS data, a translator similar to Rivet in HEP data needs to be developed. This report can serve as a starting point for such a development.

Another requirement for global tuning is an event generator with a documented tuning interface that includes all the physical processes needed to simulate hadron-nucleus and nucleus-nucleus collisions up to hundreds of TeV. Pythia 8/Angantyr currently best meets these requirements and is therefore the first target for tuning, but other event generators should follow.

Established methods for tuning are ready to be used for global tuning, as demonstrated in toy studies. One concern is the significant additional computational cost of running air shower simulations. This cost can be significantly reduced by using cascade equation solvers such as Conex and MCEq instead of full Corsika simulations. For tuning of many parameters at once, future methods based on stochastic gradient descent may perform better, but such methods have yet to be developed. In the near future, air shower simulations will also benefit from ongoing efforts to improve the description of forward hadron production in event generators and from data using oxygen beams in the LHC.

In summary, global tuning is within reach. Prototypes and smaller studies have shown that the tuning process works and is useful to address the main challenges in simulating EAS and HEP data. With the combined efforts of the community, we can make global tuning into a standard tool that everyone can use. Global tuning can lead to improvements in tuning methods themselves, benefiting the (astro)particle physics community as a whole. We can expect many interesting results in the near and midterm future.

Acknowledgments

This paper is a comprehensive overview of work that has been advanced with a collaboration of experts during the workshop Tuning of hadronic interaction models111https://indico.uni-wuppertal.de/event/284/ at the Bergische Universität Wuppertal in January 2024. The international workshop was organized as part of the Collaborative Research Center SFB1491, Cosmic Interacting Matters — From Source to Signal. We acknowledge the support of the workshop and the related research by many of the workshop participants and authors of this paper by SFB1491, funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under project number 445052434.

J. Albrecht acknowledges additional support from the Heisenberg Programme of the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under project number AL 1639/5-1, and from the Bundesministerium für Bildung und Forschung (BMBF, Federal Ministry of Education and Research) under grant number 05H21PECL1 within ErUM-FSP T04. H. Dembinski acknowledges funding from the DFG under project number 449728698. Karl-Heinz Kampert acknowledges additional support from the BMBF under grant numbers 05A20PX1 and 05A23PX1 and from DFG under project no. 445990517. G. Sigl acknowledges support from the DFG under Germany’s Excellence Strategy — EXC 2121 Quantum Universe — 390833306, and from the BMBF under grants 05A20GU2 and 05A23GU3. N. Korneeva acknowledges support from the Monash Warwick Alliance as part of the Monash Warwick Alliance in Particle Physics, and from the LHC Physics Centre at CERN (LPCC). S. Ostapchenko acknowledges support from the DFG under project numbers 465275045 and 550225003. F. Riehn has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement number 101065027. T. Sjöstrand has been supported by the Swedish Research Council under contract number 2016-05996. L. Cazon thanks Ministerio de Ciencia e Innovacion / Agencia Estatal de Investigacon (PID2022-140510NB-I00 and RYC2019-027017-I), Xunta de Galicia (CIGUS Network of Research Centers, Consolidacion 2021GRCGI-2033, ED431C-2021/22 and ED431F-2022/15), and the European Union (ERDF). J. Blazek, J. Ebr and J. Vícha have received funding from the following grants: CAS LQ100102401, GACR 21-02226M and MEYS CZ.02.01.01/00/22_008/0004632. P. Paakkinen acknowledges the support from the the Research Council of Finland (projects 330448 and 331545) and as a part of the Center of Excellence in Quark Matter of the Research Council of Finland (project 364194).

Appendix A Details on event generators

A.1 EPOS4

The theoretical basis of EPOS4222Available at https://klaus.pages.in2p3.fr/epos4/ [10, 118, 119, 120] is the so-called parton-based Gribov-Regge field theory (GRFT) [121] with energy conservation at amplitude level and a bare amplitude based on the parton model. The main new development in EPOS4 is a way to accommodate simultaneously: rigorous parallel scattering, energy-momentum sharing, perturbative QCD, and validity of the Abramovsky-Gribov-Kancheli (AGK) theorem [122]. This ensures binary scaling (in $A+A$ scattering) and factorisation (in p+p) for hard processes, by introducing saturation (in a particular way), compatible with recent “low- $x$ -physics” considerations. Energy-momentum sharing is mandatory for a consistent picture and offers a connection between factorisation and saturation.

The validity of AGK means that one can do the same as models based on the QCD factorisation theorem to study inclusive cross sections, and much more. As mentioned in the introduction, many important observables go beyond inclusive cross sections, where one needs access to complete events. The unique feature of EPOS4 is to start from a parallel multiple scattering scenario, while ensuring that it breaks down to the description using PDFs for inclusive cross sections. Other models start with inclusive cross sections and then introduce multiple scattering via the eikonal formula. This is relevant also for nuclear collisions, where the same strategy is used of starting with parallel scatterings without ordering (a unique feature). Doing nuclear scattering in EPOS4 is a natural extension of the p+p approach.

From the initial stage of parallel instantaneous partonic scatterings, a number of pre-hadrons are obtained. In the core-corona approach, the pre-hadrons are separated into “core” and “corona” pre-hadrons, depending on the energy loss of each pre-hadron when traversing the “matter” composed of all the others. Corona pre-hadrons (per definition) can escape and become final hadrons, whereas core pre-hadrons lose all their energy and constitute what we call “core”, which acts as an initial condition for a hydrodynamical evolution of a quark gluon plasma.

The evolution of the core ends when the energy density falls below a critical value and hadrons are formed. EPOS4 uses a new procedure of energy-momentum flow through the “freeze-out hypersurface” defined by the critical energy density value, which allows for defining an effective invariant mass. The latter decays according to microcanonical phase space into hadrons, which are then Lorentz boosted according to the flow velocities computed at the hypersurface. New efficient methods for the microcanonical procedure were developed to make this feasible. Energy-momentum and flavours are conserved in the full scheme for all hadrons from core and corona.

A.2 EPOS LHC-R

EPOS LHC-R333Not yet publicly available. [11] is an updated and re-tuned version of EPOS LHC[49]. As EPOS4, it is based on the parton-based GRFT with energy conservation at amplitude level, but without the new saturation treatment. A custom model for the parton distributions is used, the valence quarks are based on the Glück-Reya-Vogt (GRV) parameterisation [123]. It also uses the core-corona approach. The calculation of the hadronisation in high energy/density environments, such as high-multiplicity events or heavy ion collisions, is implemented in a simplified manner compared to EPOS4. Instead of running a full hydrodynamical evolution of the core, collective flow is calculated from a parameterisation, which is matched to the full simulation. These simplifications speed up the simulation of events, which is an important feature for the computationally expensive EAS simulation, which is dominated by the computation time consumed by the hadronic event generator.

EPOS LHC-R is tuned to newer and more detailed data sets than EPOS LHC, which was released in 2012, when only early LHC data was available. This lead to a lower extrapolated cross section and better pseudorapidity distributions. Some missing physics phenomena have been introduced, such as colour transparency, for a better description of multiplicity fluctuations in nuclear interactions for instance, or a real pion exchange process for very forward neutron production as measured by the LHCf experiment. Special attention is given to the details of the hadronisation chain, in particular on the $\rho$ meson production in string fragmentation and how the energy is shared between the core and corona contributions. EPOS LHC-R is meant to be a transition between EPOS LHC and EPOS4, which is much further evolved and includes new features like saturation effects.

A.3 QGSJET-III

QGSJET-III444Code is available on request. [124, 12] is a recent update of QGSJET-II-04 [125, 126], from which it inherits its principal feature: a microscopic treatment of nonlinear interaction effects, based on all-order resummation of the underlying Pomeron-Pomeron interaction diagrams. The evolution of parton densities is calculated according to the DGLAP equations [127, 128, 129]. The new theoretical mechanism implemented in QGSJET-III is the treatment of higher twist (HT) corrections to hard parton scattering processes, based on a phenomenological extrapolation of the corresponding approach of Qiu and Vitev [130, 131], regarding deep inelastic and proton scattering on heavy nuclei, towards collisions of hadrons with protons and light nuclei. While the magnitude of such HT corrections is strongly constrained by HERA data on deep inelastic electron–proton scattering, such that their impact on the predicted cross sections and particle production yields in p+p and hadron-air collisions is rather moderate, the mechanism does its principal job: taming the rise of inclusive cross sections for (mini)jet production, in the limit of small jet transverse momenta [124].

Besides that, an important technical improvement implemented in QGSJET-III concerns the treatment of the pion exchange process in hadron–proton and hadron–nucleus (nucleus–nucleus) collisions [12]. This mechanism strongly impacts predictions for the muon content of extensive air showers (EAS) initiated by primary cosmic rays [126]. Here the process is described in a theoretically more consistent way, compared with the simplified treatment of QGSJET-II-04, restricted to pion–proton and pion–nucleus collisions only. Further, data on forward neutron production from the NA49 [97] and LHCf [132] experiments have been used to verify the validity of the approach, regarding the energy dependence of the process, over six decades in energy.

Applying the QGSJET-III model to EAS simulations, one obtains rather small differences with respect to QGSJET-II-04 for basic EAS characteristics, like the shower maximum depth $X_{\text{max}}$ or the muon number $N_{\mu}$ at ground level [12]. This may indicate that the treatment of relevant aspects of hadronic interaction physics is already sufficiently constrained by available accelerator data. Such a conclusion is further corroborated by a recent study, where it has been demonstrated that the predicted EAS muon content can not be increased by more than 10% while staying within the standard physics picture (which here excludes core-corona and similar approaches), without entering into a serious contradiction with relevant accelerator data [113].

A.4 Sibyll

Sibyll555Available in Chromo [29] and CRMC [28], Corsika [37, 108, 133], and all air shower codes. Code and source documentation are available upon request.[20, 13, 21] is designed to describe the general features of hadronic multi-particle production, like the leading-particle effect, the formation of high- $p_{\perp}$ jets predicted in QCD, the production of diffractively excited states of the projectile and target, and approximate scaling of leading-particle distributions with interaction energy. Focus is put on those physics aspects that are most relevant for the development of extensive air showers, like energy flow and particle production in the forward phase space region. While the model is kept as simple as possible, the important microscopic physics concepts and the general principles of scattering theory and unitarity are implemented to allow for extrapolation to energies and phase space regions beyond the reach of colliders.

The interaction model in Sibyll is based on the two-component dual parton model with soft and hard minijets [134]. Nuclear collisions are treated with an extended superposition model [25]. It also includes low- and high-mass diffraction and a model for the excitation of beam remnants [135]. Hard scattering is distinguished from soft scattering by an energy-dependent cut-off $p_{\perp 0}$ in transverse momentum. The cross section for hard scattering is calculated to leading order (LO) in QCD at the scale $p_{\perp 0}$ , for soft scattering a parameterisation based on the Regge field theory [136] is used. The energy evolution of parton densities and saturation is effectively included by the increase of $p_{\perp 0}$ with energy. Contributions from quarks of all flavours and gluons are included in the QCD cross section, in the subsequent fragmentation (based on the Lund model) only hadrons containing $(u,d,s)$ and $c$ quarks are produced.

A.5 PYTHIA 8

Pythia 8666Available at https://pythia.org/, a detailed (HTML) manual is distributed with the code. [14] is a general-purpose event generator. At the LHC it is primarily used to simulate soft and hard processes in the central rapidity range of p+p collisions. An MPI (multiparton interaction) is the basic building block of an event. It involves a perturbative $2\to 2$ QCD process, where the $p_{\perp 0}$ parameter is introduced to dampen the low- $p_{\perp}$ divergence. A varying impact parameter, with a non-zero Poissonian number of MPIs at each, builds up the MPI multiplicity spectrum. This is similar to but not equivalent with traditional Gribov–Regge. The $p_{\perp}$ -ordered initial- and final-state showers are added to each MPI.

The Lund string fragmentation model is used to hadronise events. Naively, each MPI would have strings stretching out to the beam remnants, but colour reconnection is applied to reduce the total string length, both to the beam remnant and in the central region itself. Diffractive events are handled as if a Pomeron acts like a glueball [137].

In recent years, Pythia 8 has been extended [15] to enable its use in EAS simulation. PDFs were introduced for more than 20 different hadron species. Total and partial cross sections were introduced for corresponding hadron–nucleon collisions. These were smoothly matched onto the low-energy hadronic rescattering framework [138]. Thus, Pythia can be used almost from threshold energies up to and beyond centre-of-mass energies of 100 TeV, aside from increasing problems with numerical precision. A simple kludge was introduced to handle nuclear targets, tuned to approximately reproduce the Angantyr (see below) results at collider energies. As a non-standard feature, not fully available in the public version, rope hadronisation [139] and shoving [140] can be used to enhance the fraction of strange baryons and to generate asymmetric flow, respectively.

Angantyr [26] is a module of Pythia 8, that extends its functionality from p+p to p+ $A$ and $A+A$ . It gives a special attention to fluctuations in the nucleon wave function. The Good–Walker formalism [22] is used to obtain the relevant cross sections. Glauber modelling is used to find which nucleons collide. Then, all non-diffractive sub-collisions are considered in order of increasing impact parameter. Altogether, a reasonable description is obtained for multiplicity and pseudorapidity distributions. Angantyr should be able to offer a better alternative for cosmic ray evolution than the current kludge, and is being tried out.

A.6 UrQMD

The UrQMD777Available at www.urqmd.org. [141, 142] (Ultra-relativistic Quantum Molecular Dynamics) model is based on the propagation of hadrons according to Hamilton’s equation of motion as derived from the Ritz variational principle. The model allows to obtain the full time evolution of the collision, from the initial state to the final particles. The decoupling of the particles from the system is governed by their individual scatterings. The propagation uses relativistic kinematics and includes, as usual for QMD-type simulations, all $n$ -particle correlations. This is different from test-particle based approaches that solve certain kinds of Boltzmann-type equations and therefore only allow 1-particle distribution functions to be obtained (meaning that they do not propagate particle correlations). UrQMD includes potential interactions of Skyrme type (but other potentials can also be included), and allows using soft and hard potential interactions. The current data suggest that hard potentials, meaning a stiff equation-of-state, is appropriate at low energies ( $\sqrt{s_{NN}}<7$ GeV), while a soft equation-of-state is favoured at higher collision energies [143]. The hadrons do interact according to a 2-particle collision term that may involve elastic interactions and inelastic reactions, where the inelastic reactions include resonance creation and decay, and also string formation and fragmentation. At very high energies ( $\sqrt{s_{NN}}>50$ GeV), the model employs Pythia to include hard scatterings. The table of included hadrons reaches typically up to 3 GeV in mass, depending on particle type. In the latest version also charm degrees of freedom have been included, although charm production has only been benchmarked in the low energy range ( $\sqrt{s_{NN}}<7$ GeV). Generally, the application of such a model set-up leads to a rather natural transition from the central collision area to the periphery of the interaction zone. That is often termed as the core–corona transition, as it leads to substantial thermalization in the central region of the collision due to high interaction rates, and then naturally transforms into single nucleon+nucleon interactions towards the edges of the collision zone. For a recent review of transport models the reader is referred to Ref. [144].

Appendix B Transport codes

B.1 CORSIKA

Corsika is a Monte Carlo code for the simulation of extensive air showers, that was originally developed for the KASCADE experiment but has found wide application in many astroparticle physics experiments [37]. Corsika is a monolithic Fortran code, which exhibits excellent performance but incurs limitations such as limited parallelization possibilities and increasingly difficult maintenance. To address these problems and leverage the possibilities of modern software engineering, since 2018 a rewrite of Corsika in modern C++17 is carried out, with a focus on modularity and the possibilities and needs of modern supercomputing [108, 145]. This new version, called Corsika 8, is now physics-complete and offers unprecedented flexibility in air shower simulations [133], including radio and Cherenkov emission. In comparison to the highly-optimized but more specialised preceding Fortran versions, Corsika 8 is still slower by a factor 3–5, but offers new possibilities such as complete genealogy of air shower particles; showers in different media, e.g. crossing from the atmosphere into ice or rock; multithreaded radio emission calculation; and GPU-parallelised calculation of the Cherenkov emission of air showers.

One of the main applications of Corsika concerns the simulation of EAS and their comparison with observations. In particular, measurements of the muon content of UHECR-induced EAS, notably by the Pierre Auger Observatory, show that hadronic multi-particle production in EAS is not yet fully understood [146, 147]. Inspecting genealogical information (such as generation number, particle species of its preceding generations, etc.) of the particles in EAS simulations can yield valuable insight e.g. into the production mechanism of muons and the gradual decoupling and evolution of the electromagnetic cascade. Moreover, it is possible to make quantitative statements that can be compared with Heitler-Matthews-like toy models [148] that describe EAS observables qualitatively. In particular, the number of hadron generations that precede the decay into a muon has been studied as a function of primary energy, zenith angle and lateral distance, and the muon production depth profiles as a function of hadron generation [149]. Furthermore, the EM profiles as a function of hadron generation have been investigated, and it has been worked out to what extent $X_{\rm max}$ is determined already by the primary interaction alone and which influence later hadron generations exert [43].

B.2 CONEX

Conex [150] is a Fortran code which was designed to realistically simulate the longitudinal air shower development for experiments that observe air showers with fluorescence or air-Cherenkov telescopes. Conex can simulate the longitudinal profile much faster than a full Monte-Carlo approach, and can do so without employing the thinning technique that is mandatory for the simulation of ultra-high energy air showers. It is also commonly used in studies that explore the impact of modified hadronic interactions on air shower observables.

Conex uses a hybrid approach. The first steps of the hadronic cascade, which dominate its shower-to-shower fluctuations, are computed with the Monte-Carlo technique. Afterwards, particles are binned, and the cascade equations solved numerically. That way, Conex can also be used to simulate observables related to shower-to-shower fluctuations of the depth of shower maximum $X_{\text{max}}$ or the muon number $N_{\mu}$ . Conex was later integrated in Corsika 7 [42], which allows one to perform 3D hybrid calculations, where the last stage of the air shower is again simulated with the Monte-Carlo technique. This speeds up air shower simulation by a factor of five compared to the full Monte-Carlo approach with thinning at high energies. In most studies, however, Conex is used in its original mode.

B.3 MCEq

MCEq [151] is an open-source numerical cascade-equation solver written in Python that has been developed and optimized for the calculation of atmospheric lepton fluxes. It is conceptually similar to Conex, but does not use Monte-Carlo simulation for the initial stages of the shower. It allows one to use various hadronic interaction models, among them Sibyll, EPOS LHC, QGSJet and DPMJet. For underground transport the Proposal code has been established. Results obtained with Conex and MCEq are numerically similar, if the same event generators are used. Models generally show a muon flux deficit of 30-35% above a few tens of GeV. Although MCEq is written in Python, its computational speed exceeds that of Conex (a few seconds versus tens of minutes), since the numerical heavy-lifting is done by fast third-party libraries that allow one to exploit the sparsity in the system of equations.

Atmospheric neutrino fluxes are of particular interest for neutrino experiments such as IceCube as they represent a foreground to the astrophysical neutrino fluxes. Atmospheric neutrino fluxes are divided into a conventional component resulting from the decay of pions and kaons while the so-called prompt neutrinos result from the decay of heavy mesons and other hadrons; the largest contribution to the prompt flux is coming from the decay of D mesons. Due to relativistic time dilatation of long-lived light mesons at high energies, the spectrum of the conventional neutrinos is steeper than the primary cosmic ray flux by about $1/E$ , whereas the prompt ones roughly follow the primary fluxes. Because of this difference in spectral slopes, the prompt component starts to dominate the atmospheric lepton flux above 10 PeV.

B.4 CRPropa

In contrast to the other transport codes, CRPropa is designed to model the transport of high-energy particles over galactic, intergalactic, and cosmological distances. The focus is on movement in coherent and turbulent magnetic fields, and photo-nuclear interactions with cosmic photon background fields. The recently released version CRPropa 3.2 [38] is the latest update in a continued effort to maintain and extend this open-source code well known in the cosmic-ray community. Originally aimed at simulating the ballistic propagation and interactions of Ultra-High Energy Cosmic Rays [152, 153], today it can handle diffusive propagation of cosmic rays in a variety of magnetic fields [154], deal with stochastic cosmic ray acceleration, model electromagnetic cascades for gamma ray emission and transport, among other capabilities. It was possible to include diffusive propagation into the existing framework by representing particle densities as pseudo-particles whose trajectories can be simulated like ordinary particles.

It is currently not expected that CRPropa can be used for event generator tuning, but this may change in the future. The latest public CRPropa version does not simulate hadronic interactions of cosmic rays with interstellar gas, so event generators have no impact on current CRPropa predictions. However, work is currently ongoing to implement these interactions either by directly calling event generators or by using precomputed tables [155]. It might then become possible to use neutrino and gamma emissions from strong sources for tuning, as neutrinos are produced by decays of charged pions, while gamma rays are produced by decays of neutral pions.

It is worthwhile to highlight the mathematical and technological approaches used in CRPropa, which partially inspired other transport codes, in particular, Corsika 8. CRPropa is written in a mix of C++ and Python, uses a modern modular design, and features sophisticated parallelization techniques. In CRPropa, each step in the simulation is handled by a module, which implements a physical process. Each module processes a stack of (pseudo)particles at once, which allows one to use vectorization capabilities of modern CPUs and GPUs. CRPropa’s design offers a great deal of flexibility for debugging, prototyping, and experimentation. It is possible to write modules in pure Python or in C++. In most frameworks, the main loop that passes the particle stack from module to module is written in C++ and cannot be directly accessed from the Python layer. In CRPropa, one can replace the standard loop completely with a plain Python loop, which provides a maximum of control for experts and introspection. The high latency of executing Python calls is not an issue in this approach, if enough particles are placed on the stack. Then, the overall computation time is still dominated by the time spent in each module, which makes this approach feasible.

B.5 Z-moment method

Atmospheric lepton fluxes can be computed to good approximation with the semi-analytical Z-moment method (see e.g. [40]). In this approach, the cascade equations are solved for a continuous power-law energy spectrum of cosmic rays under several assumptions, for example, superposition for the projectile (as far as forward production is concerned, a projectile nucleus can be treated like a superposition of its nucleons, each with the same fraction of the total energy). A Z-moment is the result of the integral over the input nucleon spectrum and the (energy-)differential cross section for the production of the desired lepton. While this method introduces approximations which have to be checked against numerical codes, it is very transparent and thus useful to study the impact of uncertainties in cross sections on the atmospheric lepton flux. It cannot be used to simulate monoenergetic air showers, however, since the assumption of a continuous input spectrum is baked into the method.

The theoretical uncertainty of the prompt atmospheric neutrino flux was studied with Z-moments in recent works [156, 157, 158, 159, 160]. The dominant contribution to this flux involves Z-moments over production cross sections for charmed hadrons. In the collinear factorisation framework, these cross sections depend on partonic charm production cross sections, which can be computed in perturbative QCD, and the non-perturbative parton distribution functions (PDF) and fragmentation functions (FF) which in turn depend on the factorization scale. The partonic cross sections depend on the charm quark (pole) mass and the factorisation and renormalisation scales. The idea then is to fit the non-perturbative components of PDFs and FFs to as many accelerator-based measurements as possible to minimise uncertainties when extrapolating to energies and phase space inaccessible to direct experiments but relevant for cosmic ray physics.

The resulting uncertainties of prompt neutrino fluxes are relatively large, up to a factor $\simeq 5$ [158, 159, 160]. At neutrino energies below $\sim 1\,$ PeV in the lab frame, they are dominated by QCD uncertainties, where renormalisation and factorisation scale uncertainties play the largest role. PDF uncertainties increase with neutrino energy due to increasing sensitivity to the gluon PDF in the target at very small momentum fraction, where the gluon density rises rapidly and may experience saturation. Above $\sim 1\,$ PeV, uncertainties in the all-nucleon flux progressively become comparable to the QCD uncertainties. The former derive to a large degree from theoretical uncertainties in the elemental composition of cosmic rays, and to some extent from inconsistencies between air shower measurements.

The composition uncertainties are dominated by theoretical uncertainties in the event generators that are used to interpret air shower data. Therefore, the accuracy of these calculations above 1 PeV will indirectly profit from global tuning of event generators. However, one can also turn this argument around and exploit this sensitivity for tuning, by using the atmospheric muon flux as input, which in contrast to the neutrino flux is purely of atmospheric origin and can be measured by neutrino observatories like IceCube.

Appendix C Details on experiments

C.1 Accelerator: ALICE

ALICE [56] is a general purpose detector for QCD and heavy-ion collision studies at the LHC. It was designed to study strongly interacting matter under high temperature and energy density to investigate the properties of the so-called Quark-Gluon Plasma (QGP). The ALICE detector consists of two main parts. The first component hinges on a central tracking and particle identification system, which exploits energy-loss, transition-radiation, time-of-flight, calorimeter information and more, in order to be able to track and identify hadrons. This system covers the range $|\eta|<0.9$ and transverse momentum from 0.1 to tens of GeV $/c$ . The tracker of ALICE was designed to handle collisions with thousands of particles. In addition to this mid-rapidity region, it has a forward single-arm muon spectrometer covering $-4.0<\eta<-2.5$ , and particle counters that are used to trigger the detector and to measure event activities of the collision (centrality in heavy-ion collisions), covering $-3.7<\eta<-1.7$ , and $2.8<\eta<5.1$ , respectively. A system of zero-degree calorimeters measures protons and neutrons scattered at small angles (with $|\eta|>7.0$ typically).

ALICE in its configuration of LHC run 1 [2009-2013] and run 2 [2015-2018] has already provided unique measurements for the study of QCD and the tuning of event generators at the LHC, thanks to i) its sensitivity down to almost zero transverse momentum, where non-perturbative effects dominate, ii) its coverage of p+p, p+Pb, and Pb+Pb collisions, and iii) the particle identification capabilities. A wealth of these results has recently been summarised in Ref. [161]. Important for this context are the production cross sections for charged particles up to $\eta\sim 5$ , and measurements of the hadron composition in multiple systems and as a function of charged-particle multiplicity. In particular, the enhancement of strangeness production counts among the relevant traits to explain the muon puzzle in air showers.

ALICE discovered multiplicity-dependent strangeness enhancement in p+p and p+Pb collisions, where it was not expected [6]. On the one hand, these measurements showed that production and hadronisation in dense final states is modified, and that the classic assumption of universal fragmentation breaks down. On the other hand, this modification was found to follow at first order the charged-particle multiplicity of the event, i.e. to be largely independent of the collision system, so that a modified form of universality still holds. Not only strangeness is affected, hadron composition generally changes in high-multiplicity events. In the light flavour sector, the productions of stable baryons (protons) as well as short-lived resonances (with lifetime below 10 fm/ $c$ , like $\rho$ and K(892)0) are reduced; the formation of light nuclei (d,t,3He) is enhanced. For these three aspects, this strongly suggests that there exist interactions in the late hadronic stage of the collision, with a significant impact on the collision outcome. In the heavy-flavour sector, open charm production (D0, $\Lambda_{c}^{+}$ , …) is enhanced as well, but here the effect appears as more dependent on the collision system. Despite these effects, the relative composition of light particles ( $\pi$ , $K$ , $p$ ) remains fairly constant at mid-rapidity in average events. The strangeness enhancement for the lightest hadrons with one strange quark (K ${}^{0}_{s}$ , $\Lambda$ ) is mild.

Interestingly for our concerns here, ALICE casted recently a family of studies correlating the energy flux in the ZDC in the very forward region versus the event activity in the mid-rapidity region; the novelty is that detailed studies are carried out in p+p and p+Pb systems. The entry point is a study performed for leading energy in the very forward region versus the multiplicity of unidentified charged particles in the mid-rapidity vicinity [162]; the investigations are further developed versus the strange baryons present at mid-rapidity, as most sensitive particles to strangeness enhancement ( $\Lambda$ , $\Xi$ , $\Omega$ )[163]. At fixed multiplicity in the mid-rapidity region, strangeness enhancement shows a prevailing correlation with effective energy available outside the ZDC. It thus reveals the strong influence of the initial stage of the collision, , to be compared with the impact of the final-state situation.

C.2 Accelerator: LHCb

The LHCb detector [45] is a single-arm general-purpose forward spectrometer at the LHC covering the range $2<\eta<5$ and transverse momentum from 0.1 to tens of GeV $/c$ . It was designed to study the decays of hadrons containing heavy quarks produced in high-energy proton-proton collisions. The detector consists of a high-precision tracking system for charged particles with a high-resolution vertex detector very close to the interaction point to reconstruct decay chains of short-lived hadrons, and several sub-detectors that provide particle identification. These consist of two ring-imaging Cherenkov detectors to separate charged pions, kaons, and protons, a calorimeter system for electron/photon and electron/pion separation, and a muon system. The electromagnetic calorimeter is finely segmented with good energy resolution to allow the study of photons and neutral pion decays. It is the only detector at the LHC with these identification capabilities over its entire acceptance.

LHCb provides unique input for tuning event generators through precision measurements in the forward region. Its measurements of $D$ and $B$ meson production constrain parton distribution functions (PDFs) in the proton and in lead nuclei: LHCb is uniquely able to probe the gluon PDFs down to a momentum fraction of $\sim 10^{-6}$ , which provides strong constraints for simulations of the atmospheric lepton flux observed by neutrino observatories [21, 158]. In regard to soft-QCD, the production cross section for prompt long-lived charged particles has been recently measured in p+p and p+Pb collisions [92, 91] to an accuracy of a few percent, and neutral pion production has been measured in p+p and p+Pb collisions [164]. Ratios of pions, kaons, and protons have been studied in p+p collisions up to 7 TeV [165], and an ongoing analysis studies these ratios in p+p at 13 TeV and in p+Pb at 8.16 TeV. Following the discovery of multiplicity-dependent strangeness enhancement in p+p collisions by ALICE, LHCb found evidence for a multiplicity-dependent rise in the $B_{s}^{0}/B^{0}$ ratio [166], and observed a rise in the $D_{s}^{+}/D^{+}$ [167] ratio.

The LHCb experiment is in the unique position that it can take data in both collider mode and fixed-target mode. This is possible due to LHCb’s fixed-target system called SMOG (System for Measuring Overlap with Gas) [168]. The SMOG system was originally designed for precise luminosity calibration [169] of colliding proton beams. By injecting small amounts of noble gases directly into the primary LHC vacuum around the vertex detector, interactions of the LHC beams with gas can be studied in collisions with center-of-mass energies up to 113 GeV, the highest achieved so far in a fixed-target experiment. This was used to measure the anti-proton production cross section in proton-helium collisions [170, 171], and charm production in proton-neon and lead-neon collisions [172, 173], which constrain a potential intrinsic charm component in the proton with implications for the prompt atmospheric lepton flux.

In recent years, the LHCb detector underwent a major upgrade in preparation for the current data taking period [174], including a new SMOG2 system [59]. Gas is now injected into an open storage cell located upstream of the LHCb collision point, and suppressed outside it by vacuum pumps. The new system allows one to study collisions with non-noble gases as targets, including hydrogen and deuterium, and possibly oxygen and nitrogen, and at higher gas pressures, increasing the luminosity by two orders of magnitude compared to before [175, 59]. Especially proton-oxygen and proton-nitrogen collisions will be of great interest for air showers. The increased luminosity will also make precision measurements of $D$ and $B$ meson production possible. During the planned run with oxygen beam in 2025, collisions on a hydrogen target will give access to proton-oxygen collisions in the forward hemisphere.

C.3 Accelerator: LHCf

LHC forward (LHCf) is a forward experiment at LHC [48] designed to study production of energetic neutral particles, such as $\gamma$ , $\pi^{0}$ and neutrons, emitted in the very forward region at pseudorapidity $\eta>8.4$ , which significantly contribute to the air shower development induced by high-energy cosmic rays. The LHCf experiment has two independent subdetectors, called Arm1 and Arm2, which were installed in the instrumental slots of the TANs (Neutral Target Absorbers) located at a distance of $\pm 140$ m from the ATLAS interaction point. Each LHCf detector is composed of two sampling and position-sensitive calorimeter towers. Above the design threshold of 100 GeV, the energy resolution is better than 5% for photons and 40% for neutrons.

Since the LHC start, the LHCf experiment has carried out a number of operations at various collision energies from 0.9 to 13.6 TeV with p+p and p+Pb. The measured inclusive differential cross sections of $\gamma$ , $\pi^{0}$ and neutron [176, 80, 81, 79, 132, 86], mainly originating from the fragment of protons, were used to test and tune the hadronic interaction models. The most recent operation in 2022 with p+p collisions at $\sqrt{s}=13.6$ TeV focused on an increase of statistics to study strange hadron production using measurements of $\eta$ $(\eta\rightarrow 2\gamma)$ , $K^{0}_{s}$ $(K^{0}_{s}\rightarrow 2\pi^{0}\rightarrow 4\gamma)$ , and $\Lambda$ $(\Lambda\rightarrow n+\pi\rightarrow n+2\gamma)$ , which is important to solve the muon puzzle of high-energy cosmic-ray observations, since strangeness enhancement in the forward region would increase the muon yield in air shower simulations.

Furthermore, details of forward particle production such as diffractive processes are studied in joint analyses with the ATLAS experiment. Requiring no particle detection in the ATLAS inner tracker covering the pseudo-rapidity range of $|\eta|\,<\,2.5$ , LHCf events originating from low-mass diffractive collisions are easily selected [177]. In the 2022 operation, a common operation with ATLAS including the ATLAS zero-degree calorimeter (ZDC) and roman-pot detectors (AFP and ALFA) was also performed. Combining data of these ATLAS forward detectors, various physics processes, $p-\pi$ interaction via one-pion-exchange process and low-mass resonance production [178], can be studied. These results will be important inputs to tune hadronic interaction models.

C.4 Accelerator: NA61/SHINE

NA61/SHINE (SPS Heavy Ion and Neutrino Experiment) [53] is a multipurpose fixed-target experiment designed to study hadron production in hadron-nucleus and nucleus-nucleus collisions at the CERN Super Proton Synchrotron (SPS). The core component of the detector comprises a set of large-acceptance Time Projection Chambers (TPCs) and two superconducting magnets with a combined bending power of 9 Tm. This setup enables precise measurement of particle momenta ( $\sigma(p)/p^{2}\approx(0.3{-}7){\times}10^{-4}\,\text{GeV}/c$ ) and provides excellent particle identification capabilities via the specific energy loss in the TPC volumes.

The experiment commenced operations in 2007 and has since gathered hadroproduction data using a variety of projectiles, beam energies, and target materials. NA61/SHINE conducted detailed measurements of particle production in p+C interactions at 31 GeV/c [179, 180, 181] and $\pi^{+}$ +C at 60 GeV/c to determine the beam properties in accelerator-based neutrino experiments [182]. Since carbon is a good proxy for interactions on nitrogen in air, the measured spectra in these reactions are also highly relevant for modelling low-energy interactions in air showers. Of particular interest for air shower physics are the measurements of particle production in interactions of negatively charged pions with carbon at 158 and 350 GeV/c [99, 101]. The spectra of $\pi^{\pm}$ , K±, p, $\bar{p}$ , $\rho^{0}$ , $\omega$ , K*∗0*, K ${}^{0}_{\text{S}}$ , $\Lambda$ , $\bar{\Lambda}$ were measured in a wide range of longitudinal and transverse momentum.

Furthermore, p+p interactions were studied in a wide range of beam momenta (20, 31, 40, 80 and 158 GeV/c) [183, 184] to establish a reference data set for the heavy ion physics program of NA61/SHINE. This data set is also instrumental in studying the secondary production of anti-protons and anti-nuclei in collisions of cosmic-ray protons with the interstellar medium in the Galaxy.

Finally, following a successful pilot run in 2018 [185, 186], the collaboration plans to take a substantial data set on the fragmentation cross sections of nuclei. These data are essential to interpret the recent high-precision cosmic-ray data on secondary Galactic nuclei, see e.g. Ref. [187, 188] and will also help refine the fragmentation models of hadronic interaction models, thereby improving the modelling of air shower fluctuations.

C.5 Astroparticle: Pierre Auger Observatory

The Pierre Auger Observatory [189] is located in Malargüe, Argentina, at an atmospheric depth of $880\,\rm{g/cm}^{2}$ . It is the world’s largest observatory for the detection of ultra-high-energy cosmic rays and has been operating successfully for almost 20 years. The hybrid design of the Observatory consists of a Surface Detector (SD) array of 1660 water Cherenkov tanks arranged in a triangular grid with a baseline of 1.5 km, covering an area of $3000\,\text{km}^{2}$ . It is complemented by the Fluorescence Detector (FD) [190], which consists of four telescope sites, each with 6 telescopes, at the periphery of the Observatory overlooking the SD at elevation angles from $0^{\circ}$ to $30^{\circ}$ . Another set of three telescopes (High Elevation Telescopes; HEAT) extends the elevation range to $60^{\circ}$ above the horizon. Buried muon detectors made of plastic scintillator slabs provide additional clean information about air-shower muons.

A fraction of the events (hybrid events) are observed in both SD and FD. It has been shown that the combination of signals from surface and fluorescence detectors is a powerful tool to examine current models of hadronic interactions [147].Five two-dimensional distributions, corresponding to different cosmic ray mass hypotheses, of ground signal at 1000 m and depth of shower maximum ( $X_{\text{max}}$ ) were simultaneously fitted with Monte-Carlo templates. Surprisingly, the predicted $X_{\text{max}}$ scale for the models EPOS-LHC, QGSJET-II-04 and Sibyll-2.3d describes Auger data better assuming a depth in the atmosphere which is deeper by about 20 to 50 g/cm2, obtaining a mixed mass composition of primary particles that is heavier than for unmodified $X_{\text{max}}$ scales of the models. The predicted hadronic signal should be increased by about 15-25%, which alleviates the muon puzzle for unmodified $X_{\text{max}}$ scales. The significance of improvement in the description of measured data with the assumed modifications of model predictions has been found to be more than 5 $\sigma$ for any linear combination of experimental systematic uncertainties. Interestingly, from the zenith-angle dependence of the fitted hadronic scale, there is a strong indication ( $4.4\sigma$ ) of too hard energy spectra of muons predicted by QGSJET-II-04. In Ref. [43] the impact of modifications of basic parameters of hadronic interactions has been studied in detail using the 1-D simulation package Conex, in particular, multiplicity, elasticity, and cross section. It is possible to apply the same approach to study general 3-D observables, applying different ranges and thresholds to the modifications of individual parameters based on existing experimental constraints and allow the modification of all three parameters at once [191, 192, 193]. Due to the general anti-correlation between the change of the muon signal at 1000 m, $R_{\mu}(1000)$ , and $X_{\mathrm{max}}$ , only a very specific combination of maximal considered modifications (decreased cross section, and increased elasticity and multiplicity) fits the recent Auger results in the $R_{\mu}(1000)$ – $X_{\mathrm{max}}$ plane [147]. Such modifications are, however, in tension with the Auger measurements of the proton-air cross section [64] and possibly with the higher moments of the observed $X_{\mathrm{max}}$ distributions.

Now in its second phase of operation, the Pierre Auger Observatory is undergoing a major upgrade, known as AugerPrime [194], to increase its sensitivity to the primary mass. The Underground Muon Detector (UMD) will be installed in the low-energy extension of the surface detector as part of the upgrade. It consists of a \qty30m^2 array of plastic scintillator muon counters buried \qty2.3m underground near the water-Cherenkov detectors of a nested infill array. UMD will provide a direct measurement of the muon component of air showers in the energy range \qty3e16eV to \qty1e19eV. This will contribute significantly to the discrimination of the primary mass and to the testing of hadronic interaction models. The soil above each UMD detector is responsible for absorbing the electromagnetic component of air showers, and imposes an energy cut of about \qty1GeV for vertical muons. The muon counts are used to fit a lateral distribution function and the resulting muon density at a reference distance of \qty450m used for comparisons with simulated showers. Preliminary data of a subset of stations equipped with PMTs were used to quantify a muon deficit of the order 35-50% in comparison to data of corresponding X ${}_{\text{max}}$ measurements. Future measurements of the entire infilled area of about \qty24m^2, exploiting SiPM sensors [195] to replace the former PMTs, will provide a high-statistics measurement of the muon content and will allow calibrating muon estimates of the upgraded surface stations.

As part of the AugerPrime upgrade, the surface detector stations of the Pierre Auger Observatory were equipped each with a dual-polarized Short Aperiodic Loaded Loop Antenna (SALLA) measuring the radio signals from EAS in the 30–80MHz band. This extension allows measuring the energy in the electromagnetic cascade of inclined air showers with zenith angles above around $65^{\circ}$ with a resolution of below 10% [196]. Combining the measurement of this electromagnetic energy with the pure measurement of the muon content of inclined air showers by the water-Cherenkov detectors will enable precise studies of the number of muons and their fluctuations at energies beyond $10^{18.4}$ eV, with much larger statistics than could be achieved previously through a combination of water-Cherenkov and fluorescence detector data. Furthermore, the radio detector will allow to cross-check the energy scale of cosmic-ray measurements with an independent approach based on the first-principle classical electrodynamics calculation of radio signals from EAS [197].

C.6 Astroparticle: IceCube Neutrino Observatory

The IceCube Neutrino Observatory [198] (IceCube) is a cubic-kilometer detector deployed deep in the ice at the geographic South Pole at depths between $1450$ m and $2450$ m, accompanied by a surface detector array, IceTop [199]. The in-ice detector measures high-energy muons above a few $100$ GeV from EAS that penetrate the Antarctic ice, as well as charged secondaries induced by neutrino interactions. In addition, IceTop [199] measures EAS from cosmic-ray interactions from PeV to EeV energies at an atmospheric depth of about $690\,\rm{g/cm}^{2}$ . This hybrid detector setup provides unique opportunities to study hadronic interactions in EAS in great detail. New surface detectors for IceTop [200] and IceCube-Gen2 [201] are under development, including scintillator detectors and radio antennas, which will further enhance the capability to study hadronic interaction models in the future.

IceTop has recently reported a measurement of the muon densities in EAS as functions of energy at reference distances of 600 m and 800 m for primary energies between 2.5 PeV and 40 PeV and between 9 PeV and 120 PeV, respectively [202]. These measurements are consistent within uncertainties with predictions using the pre-LHC model Sibyll-2.1. However, comparisons to simulations using the post-LHC models EPOS LHC and QGSJet-II-04 yield higher muon densities than observed. Interestingly, preliminary results of a recent measurement of the multiplicity of TeV muons in EAS, measured in the deep ice with IceCube [203], show agreement within uncertainties with all hadronic interaction models considered. These two measurements therefore indicate inconsistencies in the modelling of GeV and TeV muons within the post-LHC models. To further study these inconsistencies, efforts for a simultaneous measurement of GeV and TeV muons from the same air shower on an event-by-event basis are ongoing [204, 205]. These measurements will yield important information about the energy sharing between low-energy and high-energy interactions during the EAS development and provide strong constraints for hadronic interaction models [13].

IceCube’s deep-ice detector also allows for measurements of the atmospheric neutrino spectrum [206, 207, 208, 209], as well as the muon energy spectrum at very high energies (above a few $10$ TeV) [210, 211, 212]. The combination of these two channels provides valuable input to hadronic interaction models, such as the kaon/pion ratio, or the contribution from the prompt atmospheric lepton flux [21]. Current efforts are underway to update these measurements and to utilize their synergies in constraining hadronic interaction models to measure the relative contributions from the decay of unflavoured and charmed mesons, for example. In addition, measurements of the seasonal variations of the high-energy muon [213, 214] and neutrino fluxes [215] also yield information on the kaon/pion ratio [216, 217] to further constrain hadronic interaction models.

C.7 Astroparticle: KASCADE

The cosmic-ray air-shower experiments KASCADE [218] and KASCADE-Grande [219] were located at the Karlsruhe Institute of Technology, Germany, at an atmospheric depth of $1022\,\rm{g/cm}^{2}$ . These detector arrays measured the energy spectrum and mass composition of cosmic rays in the primary energy range of PeV to EeV [220, 221, 222]. The results have been compared to predictions from different hadronic interaction models [223]. In general, agreement between the models for the all-particle cosmic ray flux are better than for its composition. Interestingly, post-LHC models show a lower all-particle flux than the pre-LHC models at energies of around PeV.

The special configuration of the experiments allows one to combine the KASCADE and KASCADE-Grande data with the possibility to investigate and compare the muon content close to the shower core with those farther away of the core [224]. A combined analysis shows a systematic deficiency for the hadronic interaction models QGSJet-II-04, EPOS LHC, and Sibyll-2.3 to describe the muon content of the showers consistently. Either there are too few muons predicted in the center of the shower or too few at larger distances. This is true for all models considered in the analysis.

In addition, an analysis was performed to estimate the muon content in cosmic-ray induced air showers as a function of the primary energy [225]. These measurements were compared to predictions of the event generators QGSJet-II-04, EPOS LHC, and Sibyll-2.3, which were not able to consistently describe the KASCADE-Grande data for all zenith angles and energies. This indicates that the attenuation of the number of muons with the zenith angle is smaller in data than in simulations. The observed anomalies could imply that the energy spectrum of muons from real EAS at production site for a given primary energy is harder than the respective model predictions. Further investigations are ongoing, even after the completion of the measurements, using data that is available through an open portal, the KASCADE Cosmic Ray Data Centre (KCDC) [226].

C.8 The muon puzzle in EAS

To further investigate the muon deficit reported by Auger and other experiments, the Working Group for Hadronic Interactions and Shower Physics (WHISP) [69, 70, 71, 68] has performed a meta-analysis of muon number measurements from different air-shower observatories. This group is formed by members of the EAS-MSU, IceCube, KASCADE-Grande, NEVOD-DECOR, Pierre Auger, SUGAR, Telescope Array (TA), and Yakutsk EAS Array collaborations. The group combined published data from a diverse set of muon detection methods including ice-Cherenkov stations of IceTop [202], shielded scintillating detectors of KASCADE-Grande [227], underground scintillation detectors of the Yakutsk array [228], the underground Geiger-Mueller counters of EAS-MSU [229], the tracking detector and water-Cherenkov calorimeter of NEVOD-DECOR [230, 231], the underground liquid-scintillator tanks of SUGAR [232, 233], the buried scintillator counters of HiRes-MIA [234], the scintillator modules of TA [235], the water-Cherenkov array of Auger [65, 146, 66], and the underground scintillator modules of Auger [236], as well as the shielded scintillator array of AGASA [237], alongside previously unconsidered data from Haverah Park [238] and new estimates from an analysis of KASCADE-Grande data [239] that uses the energy scale of the Pierre Auger Observatory for calibration. The diversity of these measurements prevents a direct comparison. Apart from the cosmic ray energy $E$ , the observed muon density at ground level depends on the atmospheric depth of the ground array, the lateral distance at which the muon density is measured, the zenith angle of the showers considered, and the effective energy cutoff introduced by shielding of detectors. Furthermore, the muon number $N_{\mu}$ is measured in many different ways. The standard way is to use shielded detectors to isolate the muon signal from other particles produced in the air shower, but experiments without such shielded detectors have used other techniques; some used the atmosphere itself as a shield by analysing highly inclined showers, or discriminate muon hits in ground detectors from hits of other particles based on the characteristic energy deposit of muons, which peaks around a value that depends on the muon inclination. The comparison of the experimental data and the model expectations was therefore done using the $z$ scale, which compares the respective measurements with the expectation from simulated air showers. It is defined as

[TABLE]

where $\langle N_{\mu}^{\mathrm{det}}\rangle$ is the mean value of the measured muon density under the specific conditions of the experiment, and $\langle N_{\mu,\mathrm{p}}^{\mathrm{det}}\rangle$ ( $\langle N_{\mu,\mathrm{Fe}}^{\mathrm{det}}\rangle$ ) is the corresponding prediction of the average muon density for proton (iron) showers under the same conditions.

The method of estimating the shower energy $E$ is important for the $z$ -scale, since $N_{\mu}$ is nearly proportional to $E$ . Systematic shifts in the energy scale lead to apparent shifts in the $z$ -values. To address this, the WHISP adjusted the energy scales of the experiments to align them with a common reference spectrum at a given energy [69]. This greatly improved the consistency of the measurements.

Furthermore, correlations between the estimation of the muon density $N_{\mu}$ and the energy $E$ can distort the measurement, and should ideally be negligible. The cosmic ray energy $E$ is ideally estimated by integrating the longitudinal energy loss profile in the atmosphere observed with fluorescence or Cherenkov telescopes. Experiments without telescopes instead use the charged particle density measured in surface detector arrays, which includes a contribution from muons. In the latter case, the energy estimate always shows some degree of cross-contamination from $N_{\mu}$ , introducing positive correlations event-by-event. Finally, some experiments measure only the muon number $N_{\mu}$ and obtain a muon multiplicity distribution. This distribution can be compared with predictions based on the cosmic ray flux from another experiment obtained with the FD technique to infer $z$ -values (sometimes referred to as intensity based $z$ -values).

In addition to an uncorrelated energy estimation method – ideal for a muon measurement with small systematic uncertainties – a full detector simulation is required to account for potential detector biases and a small atmospheric overburden. The large atmospheric overburden is irrelevant for an FD-based energy estimation, but increases the uncertainties for a ground-based estimation.

The results of the cross-calibrated $z$ values are shown in Figure 5. The $z$ values depend on the event generator used in the air shower simulations. The WHISP group uses published simulations that employed event generators available at the time of publication. Therefore, not all data points are available for each event generator. At first glance, there is no globally coherent picture from 1 PeV to 10 EeV. Taking into account the different conditions under which the experiments performed the measurements, such as distance to the shower core, zenith angle, or muon energy, does not reduce the problem [70]. However, two groups can be identified. Experiments that directly measure the shower energy with little (red brown markers) or no muon contribution (black markers), such as IceCube and Auger, show a muon deficit in the simulations that grows at a constant rate with increasing energy and appears as an increase in $z$ over the expectation from the cosmic ray composition. Experiments with medium (red) or high contribution (orange) of muons to the estimator of the shower energy, or intensity based experiments (green) where the energy scale is inferred from the muon content itself, show no consistent picture. Presumably because the strong dependence of the number of muons on the shower energy masks the deficit.

Appendix D Details on tuning

In automatic tuning, there are two practical approaches commonly used to find an optimal a description of the data: either by minimizing the least-squares-type cost function via gradient descent, or in a Bayesian framework by computing the posterior probability density of the parameters from the likelihood function and priors

Gradient-based tuning

A least-squares-type cost function quantifies the agreement between the event generator and the measurements. It can be optimized based on local gradient descents to determine the best set of tuning parameters. To allow analytical computations of the gradient, the event generator is replaced by a surrogate model.

Gradient-based automatic tuning first became available in particle physics with the Professor [242] software package, which uses a high-dimensional polynomial as a surrogate model. This inspired the development of Apprentice [243], which offers several improvements. It supports the use of Padé approximants (ratios of polynomials) to construct the surrogate model, which provide more flexibility, and provides algorithms to avoid unwanted poles in their construction.

Bayesian tuning

Bayesian tuning also starts with a weighted least-squares-type cost function, and the construction of a surrogate model to replace the event generator. A Markov-Chain Monte-Carlo is then used to sample points from the posterior distribution to obtain the best set of parameters, including uncertainties and their correlations. The posterior distribution can be used to obtain the best set of parameters and to study the uncertainties of the parameters and their correlations. The effectiveness of the Bayesian tuning approach was demonstrated using collider data from LEP experiments [109, 110].

An essential first step towards global tuning to particle and astroparticle data is to demonstrate the feasibility of tuning to air shower data. To this end, a study based on the event generator Pythia 8 and the air shower simulation code Corsika 8 [108] was performed by generating mock air shower data using the default settings in Pythia 8. Bayesian tuning was then used to test whether these default values could be recovered. In one scenario the number of muons produced in the shower, $N_{\mu}$ , was chosen as the air shower observable to tune to. In the second scenario, both $N_{\mu}$ and the depth of the shower maximum, $X_{\text{max}}$ were used. In the first scenario, the strong coupling constant $\alpha_{s}$ was chosen as the tuning parameter, while in the second scenario, two parameters related to Lund string fragmentation (and thus hadron multiplicity) were chosen. Tuning was then performed as described above. A surrogate model was constructed by running multiple sets of air shower simulations of 10 PeV protons with different values for the respective tuning parameters. In both scenarios, the Bayesian method successfully generated posterior distributions centered on the input values. This study shows that tuning to air shower observables is in principle possible, and that the Bayesian tuning approach is a promising method for global tuning to particle and astroparticle data. The experiment also highlights the need for fast air shower simulations. The next step is to perform a first tune to data.

Table 5 provides a summary of the Rivet plugins and HEPData entries used in the forward tune of Pythia, along with the ongoing efforts towards a tune for air showers discussed in Subsection 5.2.

D.1 Tuning without a surrogate model

One limitation of established automatic tuning methods is the need to construct a surrogate model. This step adds complexity because the parameter space must be efficiently sampled, and the approach suffers from the curse of dimensionality: to construct a surrogate model, one must perform a grid-like sampling of a high-dimensional parameter space. The number of required grid points grows exponentially with the number of dimensions.

Methods based on stochastic gradient descent (SGD) could potentially be used to tune the event generator directly, avoiding the construction of the surrogate model, and dramatically reducing the computational cost of tuning many parameters at once. SGD algorithms, such as Adam [117], successfully train neural networks with huge parameter spaces at a fraction of the computational cost of traditional methods. They avoid the curse of dimensionality, since the cost of computing the gradient grows only linearly with the number of dimensions.

When training neural networks, gradient formulas are exact, but computed over a random subset of the full data sample. This introduces random fluctuations into the gradient, similar to those introduced by Monte Carlo simulation in the event generator. SGD algorithms are designed to handle these fluctuations, but exploding gradients are still a problem. In the tuning application, gradients would have to be calculated with a finite-difference formula using an automatically chosen step size large enough to avoid gradient explosion. The Adam algorithm already automatically determines the learning rate for each parameter, based on previous results. Therefore, it seems plausible that the step size for the finite-difference computation of the gradient can be treated similarly. Future research could explore this possibility.

Bibliography243

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] D. Bastieri “Using the photons from the Crab nebula seen by GLAST to calibrate MAGIC and the imaging air Cerenkov telescopes” In Astropart. Phys. 23 , 2005, pp. 572–576 DOI: 10.1016/j.astropartphys.2005.05.002 · doi ↗
2[2] M. Meyer, D. Horns and H.-S. Zechlin “The Crab Nebula as a standard candle in very high-energy astrophysics” In Astron. Astrophys. 523 , 2010, pp. A 2 DOI: 10.1051/0004-6361/201014108 · doi ↗
3[3] Johannes Albrecht “The Muon Puzzle in cosmic-ray induced air showers and its connection to the Large Hadron Collider” In Astrophys. Space Sci. 367.3 , 2022, pp. 27 DOI: 10.1007/s 10509-022-04054-5 · doi ↗
4[4] Karl-Heinz Kampert and Michael Unger “Measurements of the Cosmic Ray Composition with Air Shower Experiments” In Astropart. Phys. 35 , 2012, pp. 660–678 DOI: 10.1016/j.astropartphys.2012.02.004 · doi ↗
5[5] Tigran Kalaydzhyan and Edward Shuryak “Collective flow in high-multiplicity proton-proton collisions” In Phys. Rev. C 91.5 , 2015, pp. 054913 DOI: 10.1103/Phys Rev C.91.054913 · doi ↗
6[6] Jaroslav Adam “Enhanced production of multi-strange hadrons in high-multiplicity proton-proton collisions” In Nature Phys. 13 , 2017, pp. 535–539 DOI: 10.1038/nphys 4111 · doi ↗
7[7] Sebastian Baur et al. “Combined analysis of accelerator and ultra-high energy cosmic ray data” In Po S ICRC 2015 , 2016, pp. 418 DOI: 10.22323/1.236.0418 · doi ↗
8[8] V.. Petrov, R.. Ryutin and A.. Sobol “LHC as pi p and pi pi Collider” In Eur. Phys. J. C 65 , 2010, pp. 637–647 DOI: 10.1140/epjc/s 10052-009-1202-0 · doi ↗