Inferring local quasar IGM damping wing constraints
Timo Kist, Joseph F. Hennawi, Frederick B. Davies

TL;DR
This paper develops a Bayesian inference framework using Hamiltonian Monte Carlo to analyze quasar damping wings, enabling precise local and global constraints on the IGM's neutral hydrogen content and reionization timing.
Contribution
It introduces a novel, model-independent Bayesian method to jointly infer quasar continuum, local IGM properties, and reionization parameters from damping wing spectra.
Findings
Constrained HI column density to 0.69 dex precision.
Estimated the distance to the first neutral patch as ~31.4 cMpc.
Improved global reionization timing constraints when tied to models.
Abstract
Lyman- damping wings towards quasars are a highly sensitive probe of the neutral hydrogen (HI) content in the foreground intergalactic medium (IGM), not only constraining the global timing of reionization but also the \textit{local} ionization topology near the quasar. Near-optimal extraction of this information is possible with the help of two recently introduced reionization model-independent summary statistics of the HI distribution in the IGM \textit{before} the quasar started shining, complemented with the quasar's lifetime encoding the effect of its ionizing radiation as a third parameter. We introduce a fully Bayesian JAX-based Hamiltonian Monte Carlo (HMC) inference framework that allows us to jointly reconstruct the quasar's unknown continuum and constrain these local damping wing statistics. We put forward a probabilistic framework that allows us to tie these local…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGalaxies: Formation, Evolution, Phenomena · Astronomy and Astrophysical Research · Statistical Mechanics and Entropy
Inferring local quasar IGM damping wing constraints
Timo Kist,1 Joseph F. Hennawi1,2 and Frederick B. Davies3
1Leiden Observatory, Leiden University, P.O. Box 9513, 2300 RA Leiden, The Netherlands
2Department of Physics, University of California, Santa Barbara, CA 93106, USA
3Max-Planck-Institut für Astronomie, Königstuhl 17, 69117 Heidelberg, Germany E-mail: [email protected]
(Accepted XXX. Received YYY; in original form ZZZ)
Abstract
Lyman- damping wings towards quasars are a highly sensitive probe of the neutral hydrogen (HI) content in the foreground intergalactic medium (IGM), not only constraining the global timing of reionization but also the local ionization topology near the quasar. Near-optimal extraction of this information is possible with the help of two recently introduced reionization model-independent summary statistics of the HI distribution in the IGM before the quasar started shining, complemented with the quasar’s lifetime encoding the effect of its ionizing radiation as a third parameter. We introduce a fully Bayesian JAX-based Hamiltonian Monte Carlo (HMC) inference framework that allows us to jointly reconstruct the quasar’s unknown continuum and constrain these local damping wing statistics. We put forward a probabilistic framework that allows us to tie these local constraints to any specific reionization model and obtain model-dependent constraints on the global timing of reionization. We demonstrate that we are able to constrain the (Lorentzian-weighted) HI column density in front of the quasar to a precision of and its original distance to the first neutral patch before the quasar started shining to (if a noticeable damping wing is present in the spectrum), extracting hitherto unused local information from the IGM damping wing imprint. Once tied to a specific reionization model, we find that the statistical fidelity of our constraints on the global IGM neutral fraction and the lifetime of the quasar improves, while retaining the same precision as achieved by pipelines that infer these parameters directly.
keywords:
cosmology: observations – cosmology: theory – dark ages, reionization, first stars – intergalactic medium – quasars: absorption lines, methods: statistical
††pubyear: 2025††pagerange: Inferring local quasar IGM damping wing constraints–Inferring local quasar IGM damping wing constraints
1 Introduction
Little doubt exists about the fact that the formation of the first stars and galaxies heralded the epoch of reionization, a major phase transition in the history of our universe driven by the highly energetic radiation emitted by these objects which progressively reionized all neutral hydrogen (HI) in the intergalactic medium (IGM). Most details about this landmark event, however, are yet to be determined, starting with its mere timing in cosmic history, through to its topological features, informing us about the nature of the sources and sinks of reionization. One of the most promising probes of reionization is the Lyman- transition in the spectra of bright astrophysical sources such as quasars. Its remarkable sensitivity to even the smallest amounts of neutral hydrogen in the IGM, causing extended Gunn-Peterson absorption troughs already at global volume-averaged IGM neutral fractions of (Gunn & Peterson, 1965), and the red damping wing induced by quantum mechanical line broadening when becomes of order unity (Miralda-Escudé, 1998), allow us to gain unique insights into how reionization proceeded over cosmic time.
While the first damping wing constraints have been reported for individual objects (Mortlock et al., 2011; Bolton et al., 2011; Greig et al., 2017b, 2019; Greig et al., 2022; Bañados et al., 2018; Davies et al., 2018; Wang et al., 2020; Yang et al., 2020; Ďurovčíková et al., 2020; Reiman et al., 2020), Greig et al. (2024a) and Ďurovčíková et al. (2024) recently analyzed the first statistical ensembles probing the final stages of reionization. Even tighter constraints, reaching much deeper into the core stages of reionization, will be achievable in the coming years thanks to the growing number of high-redshift quasar spectra available to us (D’Odorico et al., 2023; Onorato et al., 2025; Euclid Collaboration: Yang et al., 2025), most prominently driven by the unprecedented discoveries of new objects at ever-higher redshifts in the Euclid wide field survey (Euclid Collaboration et al., 2019; Bañados et al., 2025; Euclid Collaboration: Yang et al., 2025).
Moreover, smooth roll-offs in the spectra near individual Gunn-Peterson troughs in the foreground of specific quasars have been interpreted as damping wings, possibly pointing to the persistence of neutral islands in the IGM down to (Becker et al., 2024; Spina et al., 2024; Zhu et al., 2024; Sawyer et al., 2025). In addition to that, the advent of JWST has enabled the first damping wing constraints towards galaxies, pushing the highest-redshift frontiers (Curtis-Lake et al., 2023; Hsiao et al., 2024; Keating et al., 2024b; Park et al., 2025; Umeda et al., 2024, 2025; Mason et al., 2025). However, additional care needs to be taken in this context due to the presence of intrinsic damped Lyman- absorbers (DLAs; Heintz et al., 2024, 2025) which can mimic the cosmological imprint from the IGM (Huberty et al., 2025), introducing an additional nuisance process which needs to be marginalized out (Mason et al., 2025).
Quasars, on the other hand, as powerful sources of ionizing radiation, suffer considerably less from this issue as they ionize away all residual neutral gas within their -scale proximity zone. Any potentially remaining proximate DLA systems can be excluded based on the presence of associated weak metal absorption systems which can be identified with the help of supplementary high-quality spectra (Davies et al., 2025). The remaining complication is the reconstruction of the intrinsic quasar continuum that the cosmological damping wing imprint has to be disentangled from. As the properties of these continua do not appear to evolve notably with redshift (Shen et al., 2007), data-driven models can be built based on unabsorbed low-redshift continua, and a plethora of approaches have been developed in the past years, ranging from simple models based on principal component analysis (PCA) to non-linear, neural network-based ones (for a comprehensive overview, see Greig et al., 2024b), most recently as a unified framework to jointly reconstruct the quasar continuum and the IGM absorption imprint (Hennawi et al., 2025b).
Kist et al. (2025c) demonstrated that approximately half the total error budget on the inferred IGM neutral fraction is due to this continuum reconstruction task, whereas the other half is sourced by the stochastic distribution of IGM density fluctuations, as well as the distribution of neutral patches during reionization. To eliminate the latter part of this stochasticity from the inference task itself, Kist et al. (2025b) introduced a new parameterization of IGM damping wings, probing the local ionization topology in front of a given quasar before it is altered by the quasar’s ionizing radiation. This three-parameter model comprises 1) the HI column density, weighted by a Lorentzian profile that accounts for the frequency dependence of the Lyman- cross section, 2) the distance from the quasar to the first neutral patch, and 3) the quasar lifetime which encapsulates the effects of the quasar’s ionizing radiation.
Kist et al. (2025b) further demonstrated that these summary statistics are reionization model invariant, and argued that they, when tied to a specific reionization model, provide constraints not only on the timing but also the topology of reionization. This work is concerned with the practical realization of this inference task by leveraging a fully Bayesian framework introduced in Hennawi et al. (2025b) which allows us to infer the first local IGM damping wing constraints. Our local summary statistics can be constrained freely from any prior assumptions about the underlying reionization model. We demonstrate how the topology information from a given reionization model can be folded into these constraints subsequently in a probabilistic manner, resulting in topology-informed local constraints, and, in addition, a global constraint on the IGM neutral fraction . In a companion paper, we apply this new approach to JWST/NIRSpec spectra of the two quasars J1007+2115 and J1342+0928 (Kist et al., 2025a).
We start by introducing this local parameter framework in Section 2, and proceed in Section 3 by describing our inference approach originally introduced in Hennawi et al. (2025b) which is applicable both in the context of our new local IGM damping wing parameterization as well as the conventional global one, parameterized by the volume-averaged neutral fraction and the lifetime of the quasar. In Section 4 we test the statistical fidelity of the pipeline and quantify the precision of the local damping wing constraints. We also compare the precision of the resulting constraints on and to that of the directly inferred ones. We conclude in Section 5.
2 Theory: relating a local IGM damping wing model to the global timing of reionization
Due to its sensitivity to neutral hydrogen in the foreground IGM, the Lyman- damping wing signature in the spectra of high-redshift sources is considered one of the key probes of the global volume-averaged IGM neutral fraction as a function of redshift. As has recently been noted (Chen, 2024; Keating et al., 2024a; Kist et al., 2025b), considerable sightline-to-sightline variations are possible due to fluctuations in the cosmological density field as well as the patchy nature of reionization. Specifically, Kist et al. (2025b) identified a two-dimensional set of physical summary statistics that (along with the lifetime of the quasar as a third parameter) tightly parametrizes the characteristic shape of the IGM damping wing. These two statistics are informative not only about the global timing of reionization, but also the local ionization topology before the quasar started shining. Note that the quasar radiation inevitably modifies the topology that ultimately imprints the damping wing, but our physical interest is directed at the original topology which encapsulates the information about what we will henceforth refer to as the pre-quasar IGM, in contrast to the post-quasar one, impacted by the quasar’s ionizing radiation. In this section, we will provide a short but self-contained overview over this parameterization, and demonstrate subsequently how we can fold in the topological information that arises from the assumed reionization model through a prior on our local damping wing statistics, and how we can use this to relate back these summaries to the global timing of reionization, parameterized by the redshift evolution of the global IGM neutral fraction .
2.1 A local, topology-independent parameterization of quasar IGM damping wings
We start by defining the two summary statistics of the local ionization topology around quasars introduced in Kist et al. (2025b). We refer the reader to this work for all additional details. The authors demonstrated that the information contained in the damping wing optical depth can largely be condensed into a single number, resulting in a near-optimal parameterization of the Lyman- damping wing across the entire spectral range. For a quasar at redshift with a post-quasar HI density field , the damping wing optical depth as a function of Lyman- rest-frame wavelength is given by
[TABLE]
where is the Lyman- cross section, the infinitesimal proper line-of-sight interval, and the corresponding proper distance from the observer to the quasar.
In essence, Eq. (1) constitutes a column density integral of the HI density field with an additional weighting kernel, governed by the Lyman- cross section . To excellent approximation, at a given spectral velocity pixel is of Lorentzian shape, i.e., , whose impact we can capture by defining the Lorentzian line-of-sight average of a field (such as the HI density field ) as
[TABLE]
where is a dimensionless integration variable, and a normalization factor ensuring for the identity field . Here we defined as the (positive) proper distance value corresponding to the red-side velocity offset via .
Instead of directly taking the Lorentzian-weighted average of the post-quasar HI density field , we operate on its pre-quasar version , unaffected by the ionizing quasar radiation and therefore directly informative about the pre-quasar ionization topology parameterized by the neutral fraction field , along with the dimensionless matter overdensity field and the cosmic mean hydrogen density . Specifically, we define the Lorentzian-weighted HI column density
[TABLE]
where the normalization factor is defined as above, and , and are the comoving versions of the proper distances , and .111Note that this is the convention we will adopt for all distances throughout this work. With this definition at hand, it is then straightforward to show that we have identified an asymptotically optimal parameterization of the (pre-quasar) IGM damping wing, in the sense that at the velocity offset itself, we find a direct proportionality between the optical depth and our summary statistic:
[TABLE]
in the limit where the integration limits approach and , and where the Lorentzian approximation of the Lyman- cross section is valid.
To make the statistic adequate for the post-quasar optical depth despite the fact that we are operating on the pre-quasar HI density field, we have to account for the fact that quasars commonly ionize away all neutral material within the first few surrounding them, and therefore not contributing to the optical depth integral in Eq. (1). We do so by starting the integration in Eq. (2) at a common size of this ionized bubble, . We fix the upper integration limit to since any neutral patches beyond this distance are down-weighted to such a high degree by the Lorentzian weighting kernel that they do not notably contribute to anymore. Lastly, we set the reference distance with respect to which our summary statistic is defined, to (corresponding to at ), noting that we are not sensitive to this choice since the damping wing largely constitutes a one-parameter family, i.e., the imprint is so correlated that its value at one spectral pixel closely determines its value at all other spectral pixels, as demonstrated in Kist et al. (2025b). As the parameter range for spans several orders of magnitude, we are in fact sensitive to the logarithmic quantity , and for notational simplicity we henceforth adopt as short-hand notation for this.
We adopt as a second summary statistic the distance between the source and the first neutral patch in the pre-quasar topology. This quantity has been constrained by Mason et al. (2025) in the context of galaxy IGM damping wings, and has been shown in Kist et al. (2025b) to remain a meaningful summary for quasar IGM damping wings—despite the impact of the ionizing quasar radiation due to which the pre- and post-quasar distance to the first neutral patch do not necessarily agree (c.f. Schroeder et al., 2013, who inferred the post-quasar version of this distance). Most importantly, encapsulates information complementary to as demonstrated in Kist et al. (2025b).
2.2 Folding in the topology dependence and constraining the global IGM neutral fraction
Kist et al. (2025b) showed that the two local summary statistics and not only minimize the scatter of IGM transmission values (i.e., the amount of information that these statistics do not explain) in the damping wing region of the spectrum to across the entire range of physical parameter space, but that these two summaries are largely insensitive to the underlying reionization topology in the sense that the median and the -percentile scatter of the IGM transmission profiles at a given set of parameter values are identical, regardless of what reionization topology the underlying sightlines originate from. This implies that the specific distribution of neutral patches along a given sightline does not impact the damping wing shape, provided the and parameter values are fixed. All topological information is instead fully encoded in the statistical distribution of these statistics within a given topology. As a result, in a Bayesian sense, all assumptions about the ionization topology can be absorbed into the prior distribution imposed on , while the IGM transmission likelihood, given and as model parameters, remains reionization model independent and can be determined by considering models of damping wing transmission profiles from any fiducial reference topology. We exploit this fact by constructing the likelihood based on sightlines generated according to the simplistic toy prescription introduced in Kist et al. (2025b) which augments the smoothness of the likelihood as a function of and , facilitating the practical task of sampling from the posterior distribution. The realistic topology only enters the analysis for determining the prior, and for converting the local constraints into a global constraint. We do so by probabilistically relating global and local parameters to one another based on how the local parameters are distributed in a given global topology.
2.2.1 Simulating IGM transmission profiles
We simulate IGM transmission profiles following Davies et al. (2018)’s hybrid approach of combining sightlines from cosmological hydrodynamical simulations with independent neutral fraction skewers and 1d radiative transfer. We extract density, velocity and temperature skewers originating at the most massive halos with masses of , and extending towards the six principal directions of the snapshot of the Nyx hydrodynamical simulations (Almgren et al., 2013; Lukić et al., 2015). The simulation box measures on a side and contains baryon and dark matter particles, respectively.
We simulate neutral fraction skewers in two different manners, 1) by extracting them from realistic semi-numerical reionization topologies, used to determine our priors and convert our local constraints to a global constraint on the timing of reionization, or 2) by producing synthetic skewers according to a simple toy prescription which we use to determine the IGM transmission likelihood.
The former skewers originate from topologies generated using a modified version of the 21cmFAST code (Mesinger et al., 2011; Davies & Furlanetto, 2022) at fixed global IGM neutral fractions of . Intermediate values between zero and one are achieved by tuning the ionizing efficiency . From each simulation box (of size on a initial and output grid), we extract randomly oriented skewers originating from each of the most massive halos of masses , giving a total of skewers. We then combine each of our hydrodynamical Nyx sightlines with a random draw from these neutral fraction skewers. By construction, these skewers vary smoothly as a function of the global IGM neutral fraction . We can also measure the local summary statistics and for each individual sightline,222Note that before determining , we smooth the field with a box-car filter of size and define as the distance from the quasar where this smoothed field first exceeds a neutral fraction threshold of to avoid being overly sensitive to extremely small-scale fluctuations in the field. and aggregate all sightlines according to these labels; however, this does not guarantee that these aggregated profiles vary smoothly as a function of and since the number of available sightlines can vary significantly in different regions of parameter space.
Such small sample sizes have a particularly unfavorable impact on the noise level of the covariance matrices we have to estimate for determining the likelihood function (see Eq. (11)). Noisy covariance matrices can introduce discontinuities to this, and sampling from such a non-smooth distribution becomes practically impossible. We can avoid such issues by instead generating synthetic skewers according to the toy prescription introduced in Kist et al. (2025b) which by construction ensures smooth variations of the resulting skewers with respect to these local summaries. In short, we start from a given density sightline in its completely ionized state, and subsequently add in neutral patches of minimum size , growing them out one-by-one to a maximum size until a desired HI column density is reached. By definition, the position of the first neutral patch is set by , and the subsequent patches are added according to a fixed sequence of neutral patch locations for a given density sightline, ensuring continuity of the resulting skewers with respect to . We emphasize that achieving continuity is the sole purpose of this particular approach, and that it does not seek to model any realistic physical processes occurring during reionization—in fact, Kist et al. (2025b) demonstrated that a physical reionization model is not necessary for our purposes of determining the IGM transmission likelihood, as the mean and variance of the resulting IGM transmission profiles in our local damping wing parameterization only depend on the distribution of density fluctuations in the IGM but not on the specific distribution of neutral patches along the line of sight.
Further, it is important to note that the minimum and maximum column density values are sightline-dependent, set by the distribution of IGM density fluctuations along the line of sight for a completely neutral and a completely ionized sightline, respectively. At a given location in parameter space, we can therefore end up with a number between [math] and sightlines, depending on how many sightlines allow for this specific parameter combination. We compute these synthetic skewers on an irregular parameter grid, reflecting the degree to which these parameters impact the resulting IGM transmission profiles. Specifically, our parameter grid consists of column density values between , and neutral patch distances between . A finer grid spacing is chosen towards higher column densities and shorter neutral patch distances where small parameter variations have a noticeable impact on the damping wing strength, whereas this is not necessary at low and high where no damping wing is present.
Finally, we combine hydrodynamical and neutral fraction skewers, adopting the Nyx temperature field for all ionized regions, and assuming an initially cold IGM with for all neutral patches. We then perform one-dimensional radiative transfer along these sightlines to model the impact of the ionizing quasar radiation following Davies et al. (2016). Our model resembles the quasar ULAS J1342+0928 with a simple light bulb light curve corresponding to an ionizing photon emission rate of for quasar lifetimes on a logarithmic grid with values between and , and we henceforth adopt as short-hand notation for . To produce Lyman- transmission profiles, we finally convolve the physical output fields with a Voigt profile (Tepper-García, 2006).
As there is no star formation prescription in the Nyx simulations, strong proximate optically thick absorption line systems are not modeled realistically. This means we have to remove such sightlines from our full set of simulated IGM transmission profiles. This does not pose a major limitation to our approach as we do not aim to constrain the ionization state of the IGM based on targets where such strong absorption systems are present. As their absorption signature could easily be confused with the intergalactic one, this would introduce a significant additional modeling uncertainty when trying to disentangle it from the IGM damping wing, similar to what is required for galaxies (Mason et al., 2025). Observationally, such sightlines can be excluded a priori by identifying associated metal absorption lines in the spectrum of the source (Davies et al., 2025). In our simulated profiles, we identify such sightlines by computing the HI column density within chunks of size along the fully ionized realization of each sightline, and excluding all those sightlines containing at least one chunk with an (unweighted) HI column density of at least within the first from the source, leaving of the total of sightlines that can be used for the subsequent analysis.
We depict in Figure 1 the median profiles based on the synthetic skewers as a function of the three parameters of our local parameterization. From top to bottom, we show the variation with respect to , and , respectively. The parameters which are not varied in a given panel are fixed to the reference values of , and (black line in each panel). We observe the well-known trend of larger proximity zones for long-lived quasars due to the increasing size of the ionized bubble such objects have carved out around themselves (see e.g. Chen & Gnedin, 2021). In line with this, the strength of the IGM damping wing decreases with increasing due to the decreasing amount of neutral hydrogen along the line of sight. Similarly, the damping wing strength increases and the proximity zone size decreases with increasing HI column density . Note, however, that this parameter captures the properties of the pre-quasar ionization topology, whereas encapsulates the effects of the quasar’s ionizing radiation.
The functional dependence of the profiles on the distance from the source to the first neutral patch is more complex. First, note that the median profiles in the two upper panels show a clear bump in transmission at . This is because unlike the already ionized material closer to the quasar, this initially neutral patch at receives an additional amount of photoelectric heating when ionized by the quasar, leading to an enhancement in transmission in the corresponding region of the spectrum (c.f. Kist et al., 2025c). We can see in the bottom panel of Figure 1 that this position shifts with the distance the first neutral patch, with the profile showing a clear bump at that location instead. The remaining shape of the profiles depends highly non-trivially on the value of . For values of , the first neutral patch is located outside the integration range of which starts at . As a result, is the only statistic capturing information about any pre-quasar neutral material in this region, and hence acts as a summary akin to , with increasingly strong absorption for smaller values of .
This behavior turns around for , where the damping wing shape is already captured to lowest order by . Contrary to the case where , we find a mild increase in the strength of the absorption signature with increasing . This is because we are concerned with pre-quasar statistics, and at fixed HI column density , a neutral patch is more likely to be ionized away by the radiation of a quasar of a given lifetime the closer it is located to the source, i.e., the smaller the value of . However, more nearby neutral patches are also the greatest contributors to the damping wing optical depth, and as such, the damping wing strength is lower the closer the first neutral patch was located to quasar, i.e., the more likely it was to get ionized away. When comparing to the upper two panels, however, we see that these effects are clearly subdominant to the impact of quasar lifetime and HI column density on the IGM damping wing strength. However, the fact that the transmission profiles still change as a function of when and are fixed shows that this parameter carries information complementary to that encapsulated by the other two parameters.
2.2.2 The prior on the local summary statistics induced by the reionization model
As demonstrated in Kist et al. (2025b), the mean and variance of the IGM transmission profiles at fixed parameter values do not depend on the reionization topology that the sightlines originate from. Instead, the topology exclusively determines how common a given parameter combination is. In other words, a statistical distribution on these parameters—be it prior or posterior—can directly be translated between the global parameterization and the local parameterization.
When constraining the timing of reionization with quasar IGM damping wings, we do not want to impose any prior assumptions on the reionization state of the universe, i.e., a priori, we consider any global IGM neutral fraction value between [math] and as equally likely. Formally speaking, we impose a flat prior on the global IGM neutral fraction, where is a uniform distribution of the random variable on the interval .
But the assumed reionization topology determines the conditional distribution of the local summaries given , since clearly different local parameter configurations are differently common in different topologies with different global IGM neutral fractions, and henceforth we denote all distributions that are affected by these topology assumptions with a corresponding subscript. Based on this, we can use the global prior to write down the joint prior distribution on all three parameters:
[TABLE]
Marginalizing Eq. (5) over the global IGM neutral fraction, we see that the simple uniform prior on translates into a non-trivial, topology-informed prior on the local parameters , given by
[TABLE]
It is this step where the assumptions about the reionization model affect our constraints: different reionization topologies can come with a different probabilistic mapping between global and local labels. On the other hand, since the likelihood of the IGM transmission field is reionization model independent in our local parametrization, is the distribution which encapsulates the entire topology dependence of our constraints.
For the sake of quoting a bare set of local, topology-independent constraints on and , fully agnostic to the reionization model, we initially impose a constant prior distribution in two-dimensional parameter space which we denote as (in contrast to the physical prior resulting from Eq. (6)). This constant prior (marked by the dashed line of the panel of Figure 2) is only limited by the hard physical boundaries for these two quantities as we will discuss below. Based on this, we infer the topology-agnostic posterior distribution according to Bayes’ theorem:
[TABLE]
If we assume the full three-dimensional prior factorizes into , folding in the topology dependence then simply amounts to a change of priors to the non-trivial prior governed by the topology of interest as per Eq. (6):
[TABLE]
where the denominator is trivial since is constant by assumption, and it encloses the entire physical domain, implying that the support of any non-trivial prior is guaranteed to be a subset of the support of .
We now proceed by practically determining for the realistic semi-numerical reionization topology considered in this work. Enforcing a uniform prior on is straightforward in the case at hand, as we are already provided with IGM transmission profiles at regularly spaced parameter values between [math] and , well-representing a uniform prior distribution. By computing the local summary statistics and for all sightlines (i.e., distinct density sightlines combined with different models each), and marginalizing over the underlying values, we are therefore immediately provided with a set of samples from the prior distribution informed by this topology.
We show in Figure 2 a corner plot of the three-dimensional joint distribution as given by Eq. (5) before performing the marginalization over . For illustrational purposes, we depict 2d-marginals of a three-dimensional kernel density estimation (KDE) of the distribution rather than the actual samples, especially because of the discrete grid spacing in the dimension. Note that the panel of the plot can individually be understood as a contour plot of the prior distribution given by Eq. (6) under the assumption of a flat prior on .
For reference, the dashed line in the panel marks the two-dimensional constant prior which follows from our synthetic toy profiles used for constructing the IGM transmission likelihood. We immediately note that it covers a significantly more extended range of parameter space than the realistic ionization topology. This is because our toy prescription can produce profiles at any parameter combination that is not excluded physically. This particularly also includes sightlines that simultaneously show low HI column densities and short neutral patch distances, i.e. the bottom left region of the panel. To keep the column density low, such sightlines cannot contain many neutral patches in addition to the first one that is closest to the source demarcating the value of —a configuration that is rarely found in realistic reionization topologies due to the inside-out nature of reionization. On the other hand, high HI column densities with large neutral patch distances as would be seen in the top right corner of the panel are not found in any topology whatsoever. This is a hard physical constraint as the maximum HI column density, set by the number of sightlines which are completely neutral starting at (and, by definition, ionized at ), clearly decreases as a function of . Note that even if all material further along the line of sight from is neutral, the density fluctuations limit the maximum value of that can be achieved. These maximum column density values are thus exclusively determined by the distribution of density fluctuations in the IGM.
In the same manner, we would expect the minimum column density to increase monotonically with decreasing , as these minimum values are set by those sightlines which are completely ionized except at a single, small neutral patch located at whose contribution to increases according to the Lorentzian weighting with decreasing . The dashed line on the left-hand side of Figure 2 shows that this is indeed the case for all neutral patch distances . Note that the sharp edges we see when following the dashed line from down to result from the discrete parameter grid whose spacing is chosen relatively coarsely in this region of parameter space.
However, this behavior turns around at , where the minimum column density reduces back to (manifesting in the kink seen in the bottom left of the panel in Figure 2). This discontinuity arises from the fact that our lower integration limit for the HI column density (see Eqs. (2) and (3)) is fixed to specifically this distance. As a result, any neutral material closer to the quasar than will not contribute to . To further understand the seemingly peculiar shape, now imagine two different scenarios: first, a sightline with which contains a single neutral patch at the location closer to the quasar than but which is ionized from onward. Second, a sightline with which only contains a neutral patch at the location more distant from the quasar than . Then the neutral patch in the second scenario with certainly contributes to while that of the first sightline does not since it is located outside of the integration range of . Despite its small value, the sightline in the first scenario therefore has a smaller HI column density than the sightline in the second, whose value is higher but whose neutral patch inevitably contributes to . This behavior generalizes as all sightlines with by construction have at least one neutral patch contributing to , and as a result, the minimum column density of sightlines with is smaller than that of sightlines with , giving rise to the kink we observe at in Figure 2.
Note further that the realizations of the realistic topology (black dots in Figure 2) seemingly extend beyond the hard physical boundary enclosed by the dashed line. A number of effects conspire to cause this behavior. First, and most importantly, the dashed line respects the additional requirement of having available at least toy profiles to estimate smooth covariance matrices for the IGM transmission likelihood at any given point in parameter space, excluding certain configurations that are rare but not entirely impossible to achieve physically. Second, as already pointed out above, the dashed line is a linear interpolation across our simulation grid which we chose rather coarsely towards lower and higher values where the transmission profiles cease to vary with these parameters. Third, this behavior looks even more drastic for the probability contours shown in this figure where smoothing effects make the distribution appear wider than it in reality is.
Aside from the limited support in the plane, the shape of the realistic prior differs significantly from a uniform distribution. We observe a bimodal distribution with a more pronounced peak towards higher HI column densities of and smaller neutral patch distances , and a lower peak at low column densities with large distances to the first neutral patch. The axis of degeneracy between the peaks naturally arises due to the fact that by definition, sightlines with a more nearby neutral patch can accommodate higher HI column densities. However, the fact that this is not a perfect one-to-one relation implies that our second summary statistic captures additional information not contained in .
Comparing to Figure 3 where we show the conditional distribution of the two local summaries and given six different values of the global IGM neutral fraction , we immediately see that the former peak is due to sightlines originating from more neutral topologies, while the latter one corresponds to the most ionized ones. An axis of degeneracy connecting the two peaks is also apparent in these panels; however, there is significant scatter about this axis due to the fact that the distribution of neutral patches along a given line of sight can vary significantly even at fixed global IGM neutral fraction . It is this amount of scatter which we exclude from the primary inference task by adopting our local parameterization rather than the global one.
Note that the high- peak is artificial in the sense that our simulated skewers have a finite length of . Strictly speaking, is ill-defined for all skewers that do not contain any neutral patch along their entire range. In such cases, we set to the maximum distance of , noting that this value really only constitutes a lower limit for the actual distance to the first neutral patch.333Technically speaking, we are thus inferring the parameter , where is the true physical distance to the first neutral patch. However, due to the Lorentzian decline of the Lyman- cross section , such distant neutral patches hardly leave any imprint on the observed IGM transmission profile anyways, and therefore this artificial bound will not bias our constraints.
2.2.3 Converting local measurements to a global constraint
Due to their local nature, our two summary statistics encapsulate information not only about the global timing of reionization, but also the local ionization topology itself. The former bit is conventionally quoted in terms of the global volume-averaged IGM neutral fraction . Here we elaborate on how such a global constraint can easily be obtained from our local summary statistics by again invoking the stochastic relation between global and local parameters. Apart from setting the prior, the topology dependence therefore naturally comes in a second time when converting the local constraints to a global constraint on .
Specifically, let us assume that, given an observed quasar spectrum , we inferred the two local summaries as well as the lifetime of the quasar , i.e., we obtained samples from the local, topology-agnostic posterior distribution via Eq. (2.2.2), but we are interested in the full topology-informed posterior distribution of both global and local parameters. By decomposing the latter via Bayes’ theorem, we can see that
[TABLE]
We can simplify this expression by harnessing the topology-independence of the IGM transmission likelihood in our local parameterization, which implies as the shape of the transmission profiles is not affected by what topology of what global neutral fraction they originate from, provided that the values of the local summary statistics and are fixed. By further assuming an independent lifetime prior such that , and decomposing the remaining joint distribution as per Eq. (5), we can straightforwardly combine Eqs. (2.2.2) and (9) to replace the likelihood with the topology-agnostic posterior and arrive at
[TABLE]
Note that the denominator is trivial since it only consists of the constant topology-agnostic prior that covers the entire physical domain (dashed line in Figures 2 and 3). In essence, folding in the topology dependence to obtain the full four-dimensional topology-informed posterior distribution is thus simply a matter of multiplying together the local, topology-agnostic posterior that we originally inferred, and the conditional probability distribution which encapsulates all information about the assumed ionization topology, along with the prior . In Eq. (2.2.3), we are therefore 1) imposing a non-trivial, physically motivated prior distribution on our local parameters, and 2) obtaining a constraint on the global timing of reionization.
In practice, we have to keep in mind that we are only provided with samples from the two aforementioned distributions. In order to perform the multiplication in Eq. (2.2.3), we introduce a parameter grid to evaluate the distributions on. We choose the grid spacing in accordance with our parameter grid for the IGM transmission profiles discussed in Section 2.2.1, and likewise for based on the irregular parameter grid on which our toy transmission profiles used to determine the IGM transmission likelihood are simulated.444Adopting this irregular binning ensures that we make use of the enhanced sensitivity for profiles with strong damping wings, i.e. at high and low , where we chose a finer grid spacing. We formally investigate the precision of our measurements as a function of astrophysical parameter space in Section 4.3. We then represent the two distributions through histograms with bins defined by these grids. As soon as is determined according to Eq. (2.2.3), we can arbitrarily marginalize the distribution via numerical integration over our parameter grid in order to obtain, e.g., the conventionally quoted posterior on the global IGM neutral fraction and quasar lifetime , or the associated one-dimensional marginals or .
3 Methods: inference pipeline
We described in the previous section a set of summary statistics that encodes the local information encapsulated in the IGM damping wing imprint in a topology-independent fashion. We now proceed by introducing an inference framework that can be used to constrain these parameters based on observed high-redshift quasar spectra. The formalism is based on the one established in Hennawi et al. (2025b) and Kist et al. (2025c) as a fully Bayesian framework to infer the global IGM neutral fraction and the lifetime of a quasar based on its observed spectrum . We start this section with a short summary providing an overview over all major modeling components, focusing especially on the adaptations made to constrain our local summary statistics rather than directly inferring the global IGM neutral fraction . For specific details, we point the reader to Hennawi et al. (2025b).
The damping wing signature is imprinted upon the spectra of high-redshift sources (such as quasars in our case) by the foreground IGM. In order to extract the information it encapsulates about the global—or local—ionization topology, we therefore first have to disentangle this imprint from the unabsorbed, intrinsic spectrum of the source. A two-step method has been the conventional approach to address this task (see e.g. Greig et al., 2017a; Davies et al., 2018): first, the intrinsic continuum of the quasar is reconstructed based on its correlation with the emission lines in the unabsorbed region of the spectrum redward of the Lyman- line. Subsequently, the resulting continuum-normalized spectrum can be used to draw conclusions about the ionization state of the surrounding IGM. Hennawi et al. (2025b) introduced for the first time a fully Bayesian framework that jointly performs these two tasks while operating on the entire spectral range, thus accounting for the full covariance resulting from the continuum reconstruction and the IGM transmission stochastic process.
The heart of the framework is a full generative model for high-redshift quasar spectra . Starting from a quasar continuum , based on a dataset of unabsorbed, low-redshift () continua, we fold in the IGM absorption imprint based on simulated IGM transmission profiles (see Section 2.2.1), parameterized as a function of . After forward-modeling instrumental effects by convolving these profiles with the line-spread function of a hypothetical spectrograph, as well as adding a realistic, heteroscedastic spectral noise vector , we are equipped with a realistic forward model for high-redshift quasar spectra .
To tackle the inverse problem of constraining the astrophysical parameters based on an observed quasar spectrum , we construct a data-driven low-dimensional parametric PCA model for the quasar continuum based on the aforementioned set of low-redshift spectra. Together with the simulated IGM transmission profiles , we use this to determine the full likelihood of the observed quasar spectrum as a function of the low-dimensional vector of (latent) PCA coefficients describing the full continuum and the astrophysical parameter vector , conditioned on the observational noise vector . Hennawi et al. (2025b) demonstrated that we can approximate this likelihood as
[TABLE]
where is the multivariate normal distribution of the random variable with mean and covariance matrix . Further, denotes the element-wise (Hadamard) product of the two mean vectors and , and we defined the matrices , , , as well as the covariance matrices and of and , respectively.
Note that this likelihood operates on the entire spectral range, both redward and blueward of the Lyman- line, and hence covers the smooth IGM damping wing as well as the Lyman- forest region with the quasar proximity zone. This allows us to jointly infer the astrophysical parameters and the continuum nuisance parameters .
3.1 Continuum dimensionality reduction model
We now proceed with a short summary of our parametric model for the quasar continuum. In short, we are using an updated version of the principal component analysis (PCA) model described in Hennawi et al. (2025b) which additionally accounts for luminosity variations in the spectra. This allows us to construct the PCA decomposition based on a times larger dataset of low-redshift continua, and extend the spectral coverage redward of Lyman- out to the Mg II line. A detailed description of this new model will be provided in Hennawi et al. (2025a).
In short, the dataset we use comprises low-redshift () spectra from the SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS) and SDSS-IV Extended BOSS (eBOSS). These spectra cover a rest-frame wavelength range of and have a resolution of with a median signal-to-noise ratio of within a region around the rest-frame wavelength . For a small subset of these sources, we also obtained near-infrared spectra from the Gemini Near-Infrared Spectrograph Distant Quasar Survey (GNIRS-DQS; Matthews et al., 2021). The spectra were taken with the Gemini Near-Infrared Spectrograph (GNIRS; Elias et al., 2006) at the Gemini North Observatory. These spectra cover the m range to encompass the H and [O iii] region. They have a resolution of and were obtained to reach sensitivities comparable to the SDSS spectra at . All spectroscopic data from GNIRS-DQS were re-reduced with PypeIt (Prochaska et al., 2020).
We compute the PCA decomposition based on (i.e., ) of the full set of continua and keep the remaining apart as test set for estimating the reconstruction error and drawing mock continua. The PCA decomposition is computed based on the logarithmic continua, and we keep the PCA vectors accounting for the largest variance as basis vectors of our model. Note that since we perform the decomposition in log-space, the first PCA vector captures the normalization of the continua, and hence the luminosity dependence of the spectral shape is intrinsically accounted for. Note further that we opted for a weighted PCA that allows us to also include continua with missing pixels. For convenience, however, we chose our test set such that it exclusively contains continua with full spectral coverage. This provides us with a dimensionality-reduced description of the true continuum with the model parameters corresponding to the seven PCA coefficients .555As demonstrated in Kist et al. (2025c), a five-dimensional PCA model plus normalization factor is sufficiently flexible to represent the full quasar continuum for the purpose of astrophysical parameter inference based on IGM damping wings. As we perform the PCA in log-space without the need for an additional normalization factor, and due to the extended red-side spectral coverage of (as compared to in the analysis by Kist et al., 2025c), we increase the dimensionality of our PCA model to seven.
Hennawi et al. (2025b) showed that the associated relative continuum reconstruction error is well-approximated by a Gaussian distribution, and, as a result,
[TABLE]
We estimate the mean and the covariance of this distribution based on the test set of continua, where and are the mean and covariance of the continuum reconstruction error .
3.2 IGM transmission likelihood
All patches of neutral hydrogen in the surrounding IGM imprint an absorption signature on the continuum of the quasar. This is represented by IGM transmission profiles which we obtain from numerical simulations (see Section 2.2.1), parameterized by a set of astrophysical parameters . In the conventional approach, one adopts which we henceforth refer to as the global parameterization, whereas we here propose to consider the local parameterization .
In either case, to go forward, we follow Hennawi et al. (2025b) and approximate the distribution of given the astrophysical parameters as a multivariate Gaussian distribution:
[TABLE]
It is a well-known fact that the assumption of Gaussianity is in fact not fully justified when using as label (Lee et al., 2015; Davies et al., 2018), and neither in the case where gets replaced with our local summary statistics as we find in this analysis. The approximation, however, is vital to our formalism as it allows us to derive a closed-form analytical expression for the likelihood of the full quasar spectrum. To make sure that this does not result in an overestimation of our constraining power, we apply a principled procedure to broaden the posteriors by the degree required to ensure statistically faithful constraints (see also Hennawi et al., 2025b). In future, we aim to address these issues while retaining the full constraining power by learning the true non-Gaussian shape of with the help of simulation-based inference.
As far as this work is concerned, we simply estimate the mean transmission and the covariance based on our simulated IGM transmission profiles. Note that sampling from the posterior distribution via Hamiltonian Monte Carlo (HMC) becomes challenging in cases where the posterior distribution is not a smooth function of the parameters as this can prevent sufficient mixing of the HMC chains. The most critical part of our framework are the covariance matrices which are prone to becoming noisy if estimated based on too low a number of sightlines. This potential pitfall is addressed by our synthetic procedure for generating sightlines that vary smoothly as a function of and , as described in Section 2.2.1 (see also Kist et al., 2025b). This allows us to measure means and covariances based on a large number of sightlines at any desired location in parameter space with continuity with respect to those parameters intrinsically built in. An additional virtue of this procedure is that it allows us to cover the entire physically accessible domain in parameter space with a sufficient number of simulated sightlines for estimating covariance matrices.
Note also that before computing and , we forward-model instrumental effects for each individual transmission profile by convolving it with a Gaussian line-spread function (LSF) of width and rebinning it onto a coarse velocity grid with a pixel spacing of , covering a rest-frame wavelength range of . Even though strictly speaking this should not be done before multiplying in the quasar continuum, we perform these operations in pre-processing on the bare IGM transmission profiles, as doing this downstream would make the inference computationally infeasible without recognizable precision gains.
3.3 Inference procedure
With a full framework at hand for the likelihood of an observed quasar spectrum given astrophysical parameters and continuum nuisance parameters , we are now in the position to sample from the associated posterior distribution once we specified our priors. We adopt a uniform prior on the logarithmic quasar lifetime , covering a wide range of physically plausible values (Khrykin et al., 2021). In the context of our local parameter framework, we impose a two-dimensional constant prior on our local summary statistics, enclosed by the (non-trivial) boundaries set by the distribution of IGM density fluctuations as discussed in Section 2.2.2. For reference, we also perform the inference in the conventional, global framework, in which case we impose an uninformative, uniform prior on the IGM neutral fraction in addition to the aforementioned log-uniform lifetime prior. In practice, to aid the sampling procedure, we apply sigmoid transformations to all bounded parameters to make them fully unbounded. To account for the non-trivial two-dimensional prior boundary of , we first transform , and subsequently , conditioned on . This is required since the minimum and maximum HI column densities change as a function of . The PCA coefficients parameterizing the shape of the quasar continuum are already allowed to vary over the entire real axis , so here we only remove their mean and rescale them by their standard deviation.
We then deploy Hamiltonian Monte-Carlo (HMC) to sample from the resulting posterior distribution, using the HMC implementation with a No U-Turn Sampler (NUTS) from the NumPyro probabilisitic programming library based on the machine learning and autograd framework JAX (Bradbury et al., 2018; Bingham et al., 2018; Phan et al., 2019). For a given spectrum, we run four HMC chains with warm-up steps each, and an additional steps for sampling. Furthermore, we perform sigma-clipping on the spectrum before performing the inference itself in order to make sure our results do not get biased by a small number of outlier pixels. To that end, we start by iteratively fitting for the maximum-likelihood model with the help of an Adam optimizer with a learning rate of and optimization steps. We compute the statistic between the data and this best-fit model, masking up to five outlier pixels on the red side of the spectrum () per iteration. We repeat the same optimizing and clipping procedure on the remaining unmasked part of the spectrum for up to iterations before we start sampling from the posterior distribution via HMC.
3.4 Generating mock spectra
To verify the performance of our pipeline, we apply it to mock spectra generated in the following way. We start by drawing an unabsorbed continuum from the test set of low-redshift spectra used in Section 3.1 to estimate the continuum reconstruction error. Given a desired astrophysical parameter vector , we further draw an IGM transmission profile from the simulated profiles described in Section 2.2.1, with instrumental effects folded in as discussed in Section 3.2 via convolution with a Gaussian LSF of width and rebinning to a velocity grid. We interpolate the continuum onto the same wavelength grid (extending from to ), and multiply it together with the IGM transmission profile . Finally, we generate realistic heteroscedastic noise realizations including telluric absorption as well as contributions from object photons, sky background and detector read noise with help of the SkyCalc_ipy package (Leschinski, 2021); see Hennawi et al. (2025b) for details. The exposure time of the hypothetical instrument is adjusted such that the median signal-to-noise ratio per velocity interval is within a interval centered at in the rest-frame. Two different ensembles of such mock spectra are considered in this work:
A set of mock spectra on a parameter grid spanned by the global IGM neutral fraction and the quasar lifetime with values of and . The underlying IGM transmission profiles originate from the realistic semi-numerical reionization topology described in Section 2.2.1. For each sightline, we can also uniquely determine the values of our two local pre-quasar summary statistics and . By construction of this sample, the distribution of these statistics follows the topology-informed prior . We run both the global and the local version of our inference pipeline on these spectra, inferring either the global parameters or the local ones . We use this mock ensemble to compare the fits to these spectra in the context of the two different parameterizations, to perform coverage tests, and to compute the corrections required to pass them, as will be discussed in the subsequent section. 2. 2.
A set of mock spectra on a parameter grid in the local parameterization with values of , and . At each location in parameter space, we simulate distinct mocks, and the keep the continuum and noise realization of these mock spectra fixed when varying , and to isolate the impact of these three parameters from those additional sources of stochasticity. Note that realizations of IGM transmission profiles only exist at out of the locations in parameter space since some parameter combinations are physically excluded by the distribution of density fluctuations in the IGM (see e.g. Figure 2), and hence we end up with rather than spectra. The underlying IGM transmission profiles originate from synthetic neutral fraction sightlines generated according to our analytical prescription described in Section 2.2.1. As such, they do not have an associated global IGM neutral fraction value . We therefore only run the local version of our inference pipeline on these spectra to determine the precision with which we can infer , and .
3.5 Coverage tests
A reliable inference framework guarantees that the inferred posterior distribution is statistically faithful. For example, when inferring parameters from different mock spectra, and considering the highest-density region (HDR) of each inferred posterior, we statistically expect the true value to be contained in this HDR for of these mocks, and likewise for any other credibility level . More generally speaking, given a set of mock inferences, we can determine the expected coverage probability corresponding to the credibility level as the number of mocks whose true parameter value is indeed contained in the -th HDR of the posterior distribution, divided by the total number of mocks in the sample.
We can perform a coverage test by explicitly computing for values of in the entire range . This test is passed if we find for all , implying that the inferred posteriors contain the truth exactly as many times as they are statistically expected to. On the other hand, finding indicates that the inferred posteriors are underconfident, i.e., less constraining than they could be. Overconfident, or too narrow posteriors where pose an even larger problem as they can lead us to draw conclusions that are not actually backed by the data. This is caused by flaws in the inference pipeline, such as bugs, inappropriate priors, or, as in the case at hand, an approximate likelihood prescription.
Eliminating such imperfections is essential before quoting any physical parameter constraints. While it would exceed the scope of this work to restructure our pipeline with the help of simulation-based inference to avoid the approximate analytical likelihood expression in Eq. (11), Hennawi et al. (2025b) introduced a principled way of retrospectively broadening the inferred posteriors in such a way that they are guaranteed to pass a coverage test. In essence, the idea is to relabel the -th HDR of a given posterior distribution as the -th HDR according to the mapping determined in the coverage test. Doing so for all credibility levels by construction ensures perfect coverage for the new coverage-corrected posterior distribution. While Hennawi et al. (2025b) outlined the practical procedure for the case where we are provided with MCMC samples from the posterior distribution, we here elaborate on an analogous strategy applicable in the case where the posterior is evaluated on a parameter grid such as the one we are faced with when converting our local parameter constraints to global ones.
Specifically, suppose that the full parameter space spanned by is divided into grid cells with central value and volume . Then the probability mass contained in the -th pixel is
[TABLE]
where for clarity we suppressed all other variables in the argument of the posterior distribution . To determine the highest-density regions, we can now order all grid cells according to the probability mass they contain, resulting in a permutation of the original grid cells , with mapping back the sorted grid cells to their original index. We then define the -th HDR as the region enclosed by the highest-probability cells, i.e.,
[TABLE]
Note that this can be seen as the cumulative distribution function of the reordered cells. This also underlines that summing over all cells, we obtain .
To perform a coverage test, we can now easily determine for each mock the index as the smallest index for which the true parameter is still contained in its -th HDR. The coverage probability corresponding to the credibility level is then simply given by the relative number of objects whose value of (i.e., the size of the smallest HDR that still contains the truth) is less or equal to . In other words, by rank-ordering the values of , and dividing their rank by the total number of mocks , we are immediately provided with the corresponding coverage probabilities . By interpolating among intermediate values, we can obtain a smooth approximation of the coverage curve across the entire range .
Once this curve is determined, we can also make use of it to correct for any potential imperfections in the coverage behavior. This can be straightforwardly accomplished in a backward approach where we again loop over all grid cells ordered by their probability mass , and instead assign them the correct mass dictated by their expected coverage probability . Specifically, when considering the -th cell, we have to set
[TABLE]
Starting at the cell containing the highest probability mass, and subsequently moving to lower-probability cells , this uniquely defines the values of the coverage-corrected posterior distribution at each grid cell.
4 Results: global vs local parameter inference
Equipped with our new local parameterization of quasar IGM damping wings (Section 2), as well as an inference framework to constrain them (Section 3), we now proceed by testing the statistical fidelity of this framework on large ensembles of mock spectra and quantifying the precision with which we can constrain both the local and global astrophysical parameters. Importantly, we also demonstrate how we can relate our local constraints back to the global IGM neutral fraction , obtaining the full four-dimensional, topology-informed posterior on both global and local parameters.
4.1 Inference for a single mock
We start with an illustration of our approach by applying it to the mock spectrum of a quasar which has been shining for , observed at the mid-stages of reionization (at ). The underlying IGM transmission profile is based on a sightline from the semi-numerical reionization topology corresponding to that global neutral fraction value, and we measure and for the local pre-quasar summary statistics of this sightline, corresponding to a typical sightline in an intermediately neutral IGM (c.f. Figure 3). We feed in the same mock spectrum to both our local parameter inference framework introduced in Section 3, and, for comparison, the global version of the pipeline that directly infers the global IGM neutral fraction rather than the local summaries and . Figure 4 depicts the mock spectrum with/without noise in black/purple, with the underlying continuum and noise vector shown in green and yellow, respectively. The upper/lower panels depict the reconstructed spectrum (red) and continuum (blue) inferred in the context of the local/global parameterization. While the median inferred models are remarkably similar, the scatter (marked by the shaded regions), reflecting parameter uncertainty, continuum reconstruction errors, as well as spectral noise of the inferred model spectrum is significantly smaller in the local parameterization as compared to the global one. This is a direct consequence of the fact that the stochasticity of reionization is an intrinsic part of the model in the global parameterization, but has been removed in the local one as demonstrated in Kist et al. (2025b).
The black contours in the corner plot in Figure 5 depict the coverage corrected local constraints that give rise to the model spectrum shown in the upper panel of Figure 4. Here we already marginalized out the seven nuisance parameters associated to the quasar continuum. We can see that the constraints extend over a vast range in the plane with a mild preference towards higher HI column densities and shorter neutral patch distances. The contours in the panel hint at a similar degeneracy between these two parameters in the range as is often observed between the parameters and . This degeneracy arises because both a higher HI column density (or a higher global IGM neutral fraction ), as well as a shorter quasar lifetime are causing a stronger IGM damping wing. At , however, the axis of degeneracy becomes perfectly vertical. This is because at such low column densities, there ceases to be any identifiable damping wing imprint present in the spectrum, and as such, all constraining power with respect to is lost. The size of the proximity zone, on the other hand, places a lower limit on the lifetime of the quasar, since a quasar can never carve out a proximity zone of this size, regardless of the reionization state of the pre-quasar IGM. The lifetime of this object therefore cannot be significantly smaller than such that the lifetime constraint extends along the vertical direction. A similar effect is observed in the plane where neutral patch distances are preferred at most lifetimes , while the contours become vertical at as our sensitivity to more distant neutral patches vanishes. This is because at lifetimes of , a drop in the proximity zone and absorption redward of Lyman- pushes the constraints toward smaller patch sizes. The reason we can constrain this along with is that at fixed , large values of would require rare density flucutations at , high enough to produce these signatures in the spectrum.
Recall that by assuming the constant prior that is only limited by the physical boundaries for these parameters, we initially sampled from the topology-agnostic posterior . Binning these HMC samples into a histogram now allows us to multiply them with the conditional distribution to fold in the topology dependence via Eq. (2.2.3) and obtain the full four-dimensional topology-informed posterior distribution on both global and local parameters, shown as green contours in Figure 5. Since the data was not overly constraining with regards to our local summary statistics, the prior —shown explicitly in purple in the extra panel of the same figure—has a clear impact on the shape of the topology-informed posterior which is shifted notably towards higher HI column densities due to the higher prevalence of such sightlines in the realistic ionization topologies (see Figure 2). The effect on the posterior is somewhat weaker, but here too we can see that the high- tail of the distribution gets suppressed. As these low and high values go along with lifetimes of , the long-lifetime mode of the converted posterior obtains significantly more probability mass than in the unconverted case.
Turning to the resulting constraints, we see that the posterior extends over the entire range of . In essence, this is because almost any ionization topology contains a number of sightlines with local parameter values similar to the one at hand, as seen in Figure 3. The probability mass cumulates around intermediate neutral fractions with a peak very close to the true value of . Note that the and panels of the corner plot can be understood as a reweighted version of the joint distribution of our local summary statistics shown in Figure 2, reweighted according to the posterior distribution as per Eq. (2.2.3).666As per Eq. (5), Eq. (2.2.3) is nothing else but the the product of the joint distribution and the topology-agnostic posterior distribution . The panel recovers the well-known degeneracy between these two parameters, reflecting the fact that both a higher IGM neutral fraction and a shorter quasar lifetime increase the strength of the damping wing signature. We can now also compare the marginal constraints to the ones obtained by inferring these two parameters directly, without invoking our local parameterization, shown as yellow contours in Figure 5. These constraints are in excellent agreement with the green contours obtained by converting our local constraints.
This demonstrates that our local parameterization allows us to recover the consistent global constraints on the timing of reionization as conventionally quoted in the literature. However, our local parameterization comes with a number of advantages: first, the global neutral fraction is not the only parameter constrained by the statistics and . Analyzing an ensemble of sightlines in our local parameterization, and determining the distribution of the summaries and provides us with information about the underlying ionization topology, in addition to the constraints on the timing of reionization. Second, our local framework facilitates model comparison of different reionization models which can come with different ionization topologies. Their effect can easily be folded into our constraints via the conditional distribution that provides the stochastic mapping from local to global parameters. Third, the statistical fidelity of our pipeline with respect to the parameters and improves if we constrain them based on our local parameter constraints rather than infer them directly. We quantify this effect in the subsequent section using the concept of expected coverage probabilities introduced in Section 3.5.
4.2 Coverage test
The mock spectrum discussed in the previous section is part of an ensemble of mock spectra spanning the entire parameter grid (see Section 3.4). Each sightline contains an IGM transmission profile drawn from the realistic reionization topology, and hence we can also determine true and values based on the underlying pre-quasar HI density field of each sightline. Having available the ground truth of all four parameters , , and , we can readily perform coverage tests with respect to the unconverted posterior distributions , as well as the full four-dimensional converted ones , and arbitrary marginals thereof.777Note that before performing any of these coverage tests, we marginalize over all nuisance parameters related to the quasar continuum since we are mainly interested in the coverage on the astrophysical parameters .
We show the results of these coverage tests in Figure 6, where we depict the expected coverage probability as a function of the credibility level , which, in the optimal case, should follow the black dashed one-to-one relation. Firstly, we see that the coverage probability of our local parameter inference (solid black curve) stays consistently below this line, and is therefore overconfident. This is likely because of the Gaussian approximation of the IGM transmission likelihood (Eq. 13), which Hennawi et al. (2025b) identified as the main source of overconfidence in their global parameter inference pipeline. We reproduce their findings by performing a second coverage test on the directly inferred parameters and , based on the same ensemble of mock spectra. The overconfidence of this coverage curve, depicted in yellow in Figure 6 (see also the left-hand panel of Figure 11 in Hennawi et al., 2025b), is somewhat more mild than in the local parameterization (black). The reason for this is likely that the local parameterization has removed a notable degree of scatter from the IGM transmission profiles. While this additional amount of scatter acts as a major additional source of uncertainty for the final parameter constraints (Kist et al., 2025c), it also aids the inference by gaussianizing the likelihood function according to the central limit theorem. Our Gaussian approximation to the IGM transmission likelihood (which applies in both cases) is therefore slightly better justified in the global parameterization than in the local one, leading to the somewhat better (but still suboptimal) coverage behavior.
The coverage of the four-dimensional, converted posterior distribution (white curve) looks almost identical to the unconverted, three-dimensional one since it remains affected by the overconfidence with respect to the local parameters and . On the other hand, after marginalizing out these two parameters, we are left with a nearly perfect coverage curve on the remaining two parameters and (green curve). Notably, the coverage behavior is even better than if we had inferred these two parameters directly. This shows that our local parameter inference pipeline can be advantageous even if all interest is geared towards global constraints on the timing of reionization, disregarding the local summaries and as nuisance parameters. This result is somewhat surprising given the overconfidence when inferring the local parameters on their own. However, it is important to realize that the coverage on other parameters, subsequently obtained from those, is not necessarily similarly bad.
In the case at hand, note that the mapping from local to global parameters is probabilistic, and therefore in particular highly non-injective, i.e., many local parameter combinations can get mapped to the same global IGM neutral fraction (c.f. Figure 3). Even if there are imperfections in the local and constraints, these parameters can still get mapped to the ’correct’ global value. Specifically, this appears to be the case here to such a high degree that the converted constraints are statistically more faithful than the directly inferred ones. An additional factor that likely contributes is that binning the conditional distribution in order to convert the constraints via Eq. (2.2.3) smooths out the contours and thus improves the coverage to a certain extent, whereas such a binning step is not required when inferring and directly.
4.3 Inference precision
Having investigated the coverage behavior of our inference framework, we now proceed by quantifying its precision. We do so by considering the one-dimensional marginal posterior distribution of a given astrophysical parameter, and computing its width
[TABLE]
as a measure of the error on the parameter , where is the -th percentile of the respective marginal posterior distribution. Before determining these percentiles, we perform a coverage test and reweigh the inferred posterior distribution according to the prescription discussed in Section 3.5 and Hennawi et al. (2025b) to ensure that we do not quote overly optimistic precision values.
4.3.1 Local parameterization
We start by investigating the precision we can achieve on , and when performing the inference in our new local parameterization. Here we only determine the inference precision in the region of parameter space where a non-negligible damping wing imprint is present, i.e., where we can expect at least some degree of sensitivity to these parameters. In other words, we choose to concentrate our computational resources for this exercise to regions where our constraints are not fully prior-dominated. These are sightlines with comparably high HI column densities and shorter distances to the first neutral patch. We therefore consider a grid in parameter space with values of and , i.e., mock ensemble (ii) described in Section 3.4. In what follows, the configuration with and can be seen as representative of the remaining parts of parameter space where and/or . Note that due to the physical constraints on the possible combinations, models only exist for out of the points on this grid. Recall that it does not take a topology-informed prior to exclude the remaining parameter combinations. These are physically excluded based on the topology-independent distribution of density fluctuations in the IGM.
We add a third grid dimension by considering three different lifetime values of and . At each location in parameter space, we consider distinct mock spectra, and we average the inferred precision values over these objects. To further suppress variations throughout parameter space due to other sources of stochasticity, we consider the same continuum draws and noise realizations at each location in parameter space. Overall, this amounts to a set of mock spectra. Note that due to the fact that these parameter values do not span the full prior range of the three astrophysical parameters , we do not determine the coverage behavior on this set of mocks but rather on the one considered in Section 4.2. This is possible since all that is required for the reweighting procedure outlined in Section 3.5 is the functional relationship between credibility levels and expected coverage probability . Practically, this means we consider the unconverted local parameter coverage (black curve) from Figure 6, and use it to reweigh our samples obtained from the new mock ensemble introduced in this section. We then compute the inference precision based on the percentiles of the reweighted samples according to Eq. (17).
The averages of these precision values as a function of the astrophysical parameter values are plotted in Figure 7 where we show from top to bottom the inference precision on , and , respectively. The three columns represent the three different mock quasar lifetimes of and , respectively, and each panel itself depicts the precision in parameter space. The annotated values are the mean precision of all mocks in the respective panel.
We begin by analyzing the lifetime precision for which we do not identify a notable dependence on the local parameters and . The mean precision values are at and at , decreasing to at , as seen in the upper row of Figure 7. This enhanced precision at long lifetimes is in line with the trend identified in Kist et al. (2025c) in the context of the global parameterization, attributed to the thermal proximity effect due to Helium II ionization. This becomes apparent for quasars with long lifetimes and thus a significantly extended Helium ionization front where the associated thermal heating has enhanced the strength of the IGM transmission peaks, making such lifetimes easier to identify through this distinct feature. We refrain from a quantitative comparison of our lifetime precision values to those quoted in Kist et al. (2025c) due to the fact that due to the different parameterizations, an apples-to-apples comparison of the same regions of parameter space is challenging, and, more importantly, we are investigating here the precision of our topology-agnostic constraints, whereas the global constraints in Kist et al. (2025c) are inevitably topology-informed. Instead, we proceed in the subsequent section by performing a comparison of the precision values we can obtain in global parameter space after conversion of our local constraints versus directly inferring and .
For now, proceeding to the HI column density inference precision , we observe similar trends as found by Kist et al. (2025c) for the global IGM neutral fraction , that is, an increasingly high precision the stronger the IGM damping wing. This is unsurprising given that the strongest imprints can most easily be disentangled from the intrinsic continuum of the quasar. This trend is clearly seen when focusing on rows of fixed at short to intermediate quasar lifetimes in the first two panels of the middle row of Figure 7. Vice versa, at fixed (i.e., in a given column of one these panels), the inference precision on this parameter appears to deteriorate the closer the location of the first neutral patch to the quasar. A possible reason for this is the somewhat increased scatter of IGM transmission profiles in the local parameterization at small values (see Figures 8 and 9 in Kist et al., 2025b) which makes it harder to reconstruct the true column density of a given sightline.
Both aforementioned trends with and cease to exist when the lifetime of the quasar is long () where we find a uniform inference precision across the entire range of parameter space considered in this figure. This is likely because the damping wing imprint is so weak at these lifetimes that it can hardly be disentangled from the intrinsic spectrum of the quasar, even at the highest HI column densities .
The most significant trends are seen for the inference precision on the distance of the quasar to the first neutral patch, depicted in the bottom row of Figure 7. At the shortest lifetime of , we can identify a highly pronounced trend of enhanced precision at higher HI column densities and shorter neutral patch distances , improving from down to . These high precision values can be explained by the fact that the location of the ionization front for low-lifetime objects coincides well with the pre-quasar neutral patch location since such quasars have not yet carved out an ionized patch that extends significantly beyond this location. As a result, can easily be reconstructed from the spectrum of such objects. However, if the patch location is too far away, the decrease of the quasar’s photoionization rate according to the inverse-square law, , suppresses all transmission already at a distance closer to the quasar than , and hence we lose sensitivity to this parameter (see also Figure 8 in Kist et al., 2025b).
At longer quasar lifetimes, inference precision on deteriorates since most such objects have ionized away the first pre-quasar neutral patch, making it significantly harder (if not even entirely impossible) to reconstruct its location. At , a relatively high precision is retained for the and models. These are parameter combinations where the location of the ionization front after still coincides well with the location of the first pre-quasar neutral patch (compare also Figure 9 in Kist et al., 2025b), and as such, is still reconstructible at a reasonable precision. On the other hand, after shining for , the quasar has ionized away so much neutral material that virtually any information about the location of the pre-quasar neutral patch is lost, regardless of the location in parameter space.
In summary, we have found that in certain regions of parameter space, we are sensitive to all three parameters of our local IGM damping wing parameterization, demonstrating its value for astrophysical parameter inference. Specifically, we saw that we can reconstruct the quasar lifetime to , the HI column density to , improving down to when the damping wing is strong, and the neutral patch distance up to at short quasar lifetimes of , whereas sensitivity gets lost the longer the quasar has been shining.
4.3.2 Global parameterization
As pointed out in the previous section, an apples-to-apples comparison of the inference precision achieved in the local parameterization to that quoted in Kist et al. (2025c) in the context of the conventional global parameterization is not possible. To perform such a comparison, we instead consider mock ensemble (i) described in Section 3.4 where the sightlines originate from realistic semi-numerical ionization topologies and we can compare the precision achieved on and by converting the local constraints to these global parameters versus inferring them directly.
Figure 8 depicts these precision values obtained on in the upper row and on in the lower row, comparing the converted local constraints in the left column to the directly inferred global ones in the right one. Each panel by itself shows the precision values in the parameter plane, where each pixel corresponds to the average precision of mocks from our mock object ensemble. Note that there is a direct correspondence between the two right-hand panels and the bottom-right panels of Figures 9 and 10 in Kist et al. (2025c). We average the precision over mocks in each pixels to suppress sightline-to-sightline stochasticity as far as possible. Note that to that end, our mock ensemble considered in the previous section had fixed continua and noise draws across the entire range of parameter space. This is not the case in this section as we are only interested in the differences in inference precision based on the two parameterizations, so a one-to-one comparison between fixed pixels in the left and right panels is still possible.
What we find when performing this comparison is a near perfect agreement between the precision values obtained in the two parameterizations. Small differences only become apparent in precision towards the (high-, low-) end where the global constraints become somewhat more precise. This is a consequence of the binning we adopt when converting the local to the global constraints which are limited by the grid spacing of . On the other hand, the converted constraints are slightly more precise in regions of low precision (e.g. precision at the long-lifetime end , or precision at intermediate lifetimes ). This likely results from the stronger coverage corrections that have to be applied in the context of the global parameterization where the inference is somewhat overconfident.
Besides these minor differences, our main conclusion is that the inference precision on and coincides remarkably well, regardless of the parameterization that the inference was originally performed in. This demonstrates that we are not losing any information on the global timing of reionization by first inferring local constraints and subsequently tying them to a global reionization model. Note that on the other hand, we also cannot expect significantly tighter constraints than from the direct global parameter inference since the conversion step necessarily implies that we are again folding in the stochasticity of reionization that our local parameterization has removed from the inference task itself. However, our new parameterization allows us to gain hitherto unused information about the local ionization topology in front of a quasar while still retaining the same precision on the parameters and when inferring them directly in the context of the global parameterization.
5 Conclusions
IGM damping wings towards quasars are a unique probe not only of the history of reionization but also its topology, and Kist et al. (2025b) introduced a new set of local summary statistics encapsulating all this information. Here we presented a fully Bayesian inference framework that allows us to extract this information in a topology-independent fashion. These constraints are informative about the local ionization topology before the quasar started shining, and, when tied to a specific reionization model, also about the global timing of reionization.
To establish this connection, we introduced a probabilistic framework that allows us to map our local damping wing statistics, consisting of a Lorentzian-weighted HI column density as well as the distance of the quasar to the first neutral patch in the pre-quasar topology, to the global volume-averaged IGM neutral fraction . The virtue of our approach is that the two local summaries and , along with the lifetime of the quasar, can be inferred independently of any assumptions about the reionization model. We demonstrated that all these assumptions can be encoded in a non-trivial prior which can be folded in subsequently, facilitating the comparison of different reionization models.
In addition, we found that, after marginalizing over the local constraints, this two-step procedure improves the statistical fidelity of our inference pipeline. The inference of the local damping wing statistics themselves shows a certain degree of overconfidence, but we correct for this by broadening the inferred posteriors in a principled way before quoting actual constraints. In the future, simulation-based inference approaches will be key to also capture non-Gaussianities in the IGM transmission likelihood, and hence make use of the full information contained in the spectra while removing the need for coverage correction altogether (Chen, 2024).
Based on a large sample of mock spectra, we quantified that precision at which we can infer the three astrophysical model parameters, and we found that we can constrain the quasar lifetime to , the HI column density to , and the distance to the first neutral patch to for model parameter combinations that would imprint a significant damping wing upon the spectrum of the quasar. Furthermore, we found that after tying our local constraints to a given reionization model, our precision on the global IGM neutral fraction and the quasar lifetime is on par with the precision achieved when inferring these parameters directly in the context of the conventional, global parameterization, while removing the need for coverage corrections. This advantage comes at the price of a somewhat increased computational cost due to the increased dimensionality of parameter space which also increases the number of required simulation models.
The biggest virtue of adopting our approach, though, is that it provides model-independent, physical information about the local ionization topology in front of a given quasar. If concerned with multiple sightlines at a similar redshift, setting up a Bayesian hierarchical model will allow us to combine these topology-agnostic local constraints to a topology-informed constraint not only on the global timing of reionization but also its topology (Sharma et al., 2025). In addition, the local information we gain through these statistics will allow us to draw novel connections to the 21 cm power spectrum, and the ionized bubble sizes measured in studies of Lyman- emission from galaxies. We will further explore these connections in future work.
Acknowledgements
We acknowledge helpful conversations with the ENIGMA group at UC Santa Barbara and Leiden University, especially with Da-Ming Yang about the low-redshift continuum data. This work made use of NumPy (Harris et al., 2020), SciPy (Virtanen et al., 2020), JAX (Bradbury et al., 2018), NumPyro (Bingham et al., 2018; Phan et al., 2019), sklearn (Pedregosa et al., 2011), Astropy (Astropy Collaboration et al., 2013, 2018, 2022), PypeIt (Prochaska et al., 2020), SkyCalc_ipy (Leschinski, 2021), h5py (Collette, 2013), Matplotlib (Hunter, 2007), corner.py (Foreman-Mackey, 2016), and IPython (Pérez & Granger, 2007). TK and JFH acknowledge support from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No 885301). JFH acknowledges support from NSF grant No. 2307180.
Data Availability
The derived data generated in this research will be shared on reasonable requests to the corresponding author.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Almgren et al. (2013) Almgren A. S., Bell J. B., Lijewski M. J., Lukić Z., Van Andel E., 2013, Ap J , 765, 39 · doi ↗
- 2Astropy Collaboration et al. (2013) Astropy Collaboration et al., 2013, A&A , 558, A 33 · doi ↗
- 3Astropy Collaboration et al. (2018) Astropy Collaboration et al., 2018, AJ , 156, 123 · doi ↗
- 4Astropy Collaboration et al. (2022) Astropy Collaboration et al., 2022, Ap J , 935, 167 · doi ↗
- 5Bañados et al. (2018) Bañados E., et al., 2018, Nature , 553, 473 · doi ↗
- 6Bañados et al. (2025) Bañados E., et al., 2025, MNRAS , 542, 1088 · doi ↗
- 7Becker et al. (2024) Becker G. D., Bolton J. S., Zhu Y., Hashemi S., 2024, MNRAS , 533, 1525 · doi ↗
- 8Bingham et al. (2018) Bingham E., et al., 2018, ar Xiv e-prints , p. ar Xiv:1810.09538 · doi ↗
