A Tapered Gridded Estimator (TGE) for the Multi-Frequency Angular Power Spectrum (MAPS) and the Cosmological HI 21-cm Power Spectrum
Somnath Bharadwaj, Srijita Pal, Samir Choudhuri, Prasun Dutta

TL;DR
This paper introduces a new tapered gridded estimator (TGE) for accurately estimating the multi-frequency angular power spectrum and the cosmological HI 21-cm power spectrum from radio observations, even with significant data flagging.
Contribution
The work extends an existing visibility-based estimator to handle multi-frequency data, enabling improved power spectrum estimation from HI 21-cm observations.
Findings
Estimator recovers P(k) with 5-20% accuracy.
Effective even with 80% frequency channel flagging.
Validated using GMRT simulation data.
Abstract
In this work we present a new approach to estimate the power spectrum of redshifted HI 21-cm brightness temperature fluctuations. The MAPS completely quantifies the second order statistics of the sky signal under the assumption that the signal is statistically homogeneous and isotropic on the sky. Here we generalize an already existing visibility based estimator for , namely TGE, to develop an estimator for . The 21-cm power spectrum is the Fourier transform of with respect to , and we use this to estimate . Using simulations of GMRT observations, we find that this estimator is able to recover with an accuracy of over a reasonably large range even when the data in randomly chosen frequency channels isβ¦
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
A Tapered Gridded Estimator (TGE) for the Multi-Frequency Angular Power Spectrum (MAPS) and the Cosmological
HI 21-cm Power Spectrum
Somnath Bharadwaj1, Srijita Pal1, Samir Choudhuri2 and Prasun Dutta3
1 Department of Physics, & Centre for Theoretical Studies, IIT Kharagpur, Kharagpur 721 302, India
2 National Centre For Radio Astrophysics, Post Bag 3, Ganeshkhind, Pune 411007, India
3 Department of Physics, IIT (BHU), Varanasi 221005, India Email:[email protected]
Abstract
In this work we present a new approach to estimate the power spectrum of redshifted HI 21-cm brightness temperature fluctuations. The MAPS completely quantifies the second order statistics of the sky signal under the assumption that the signal is statistically homogeneous and isotropic on the sky. Here we generalize an already existing visibility based estimator for , namely TGE, to develop an estimator for . The 21-cm power spectrum is the Fourier transform of with respect to , and we use this to estimate . Using simulations of GMRT observations, we find that this estimator is able to recover with an accuracy of over a reasonably large range even when the data in randomly chosen frequency channels is flagged.
keywords:
methods: statistical, data analysis - techniques: interferometric- cosmology: diffuse radiation
1 Introduction
Measurements of the cosmological HI 21-cm power spectrum can be used to probe the large scale distribution of neutral hydrogen (HI) across a large redshift range from the Dark Ages to the Post-Reionization Era (e.g. BA5; furla06; morales10; prichard12; mellema13). Being very faint in nature, the 21-cm signal is buried in foregrounds which are four to five orders of magnitude larger than the expected signal (shaver99; santos05; ali; bernardi09; ghosh150; iacobelli13; samir17a). There are several ongoing and future experiments, e.g. Donald C. Backer Precision Array to Probe the Epoch of Reionization (PAPER111http://astro.berkeley.edu/dbacker/eor, parsons10), the Low Frequency Array (LOFAR222http://www.lofar.org/, haarlem; yata13), the Murchison Wide-field Array (MWA333http://www.mwatelescope.org, bowman13; tingay13), the Giant Metrewave Radio Telescope (GMRT, swarup) the Square Kilometer Array (SKA1 LOW444http://www.skatelescope.org/, koopmans15) and the Hydrogen Epoch of Reionization Array (HERA555http://reionization.org/, deboer17) which are aiming to detect the 21-cm power spectrum from the Epoch of Reionization (EoR).
The biggest challenge for a detection of the redshifted 21-cm signal are the foregrounds which include point sources, the diffuse Galactic synchrotron emission, the free-free emission from our Galaxy and external galaxies. Various techniques have been proposed to overcome this issue. The foreground subtraction technique proposes to subtract a foreground model from the visibility data or the image and use the residual data to detect the 21-cm power spectrum (jelic08; bowman09; paciga11; chapman12; trott1; paciga13; trott16). Considering , the cylindrical power spectrum of the 21-cm brightness temperature fluctuations, the foregrounds are expected to be primarily confined to a wedge in the plane. Here, ans refer to the components of the 3-dimensional wave vector perpendicular and parallel to the line of sight direction respectively. The foreground avoidance technique proposes to use the region outside this βForeground Wedgeβ to estimate the 21-cm power spectrum (adatta10; parsons12; vedantham12; pober13; thyag13; parsons14; pober14; liu14a; liu14b; dillon14; dillon15; zali15).
A large variety of estimators have been proposed and applied to measure the power spectrum of the brightness temperature fluctuations using the visibility data measured in radio interferometric observations. Image-based estimators (seljak97; paciga13) have the deconvolution error which arises during image reconstruction, and this may affect the estimated power spectrum. There are a few other techniques, like the Optimal Mapmaking Formalism (morales09) where the deconvolution errors can be avoided during imaging. It is possible to overcome this issue by estimating the power spectrum directly from the measured visibilities (morales05; mcquinn06; pen09; liu12; parsons12; liu14a; liu14b; dillon15; trott16). liu16 have proposed an estimator which uses the spherical Fourier-Bessel basis to account for sky curvature. In addition to the sky signal, the visibilities (or the image) also have a noise contribution, and the noise bias is an important issue for power spectrum estimation. For example, zali15 have divided the data sets into even and odd LST bins and have correlated these to avoid introducing a noise bias. This approach however does not utilize the full signal available in the data. The foreground contributions from the outer regions of the telescopeβs field of view (including the side-lobes) pose a severe problem for detecting the cosmological 21-cm signal (pober16). In this paper we develop on the visibility based Tapered Gridded Estimator (TGE; samir14,samir16b, hereafter Papers I and II respectively) whose salient features we summarize as follows. First, it uses the data to internally estimate the noise bias and subtracts this out to provide an unbiased estimate of the power spectrum. Second, it deals with the gridded visibilities which makes it computationally efficient. Third, it tapers the sky response to suppress the contribution from the outer regions of the telescopeβs field of view.
Nearly all the estimators for , including the 3D TGE (Paper II), consider a Fourier transform of the measured visibilities along the frequency axis to obtain the visibilities in delay space (morales04). This is used to estimate . A difficulty arises if the data is missing or flagged in a few frequency channels in which case the delay channel visibilities and the estimated power spectrum are both modified by a convolution with the Fourier transform of the frequency sampling function. Missing or flagged channels are quite common in any typical observation due to a variety of reasons including man made radio frequency interference (RFI). The CHIPS estimator developed by trott16 overcomes this problem by using Least-Squares Spectral Analysis (LSSA) to evaluate . However this needs to be applied individually for each baseline, and the entire process could be computationally expensive for large data volumes. In this paper we propose an alternative approach to estimate which is able to handle the problem of missing or flagged data with relative ease. Another point to note is that the earlier estimators all introduce a frequency filter which smoothly goes to zero at the two edges of the frequency band. This is introduced to avoid a discontinuity at the edges of the band, however it results in the loss of some signal. Such a filter is not needed in the new estimator proposed here.
The multi-frequency angular power spectrum (MAPS; maps,mondal2018) completely quantifies the second order statistics of the sky signal under the assumption that the signal is statistically homogeneous and isotropic on the sky. This however does not assume that the signal is ergodic or statistically homogeneous along the frequency axis. We have where if we impose the additional condition that the signal is ergodic along frequency. The 3D 21-cm power spectrum is the Fourier transform of . In the new approach presented here we first estimate and use the binned to estimate . Even if some channels are missing, it is quite possible that the frequency separations are all present in the data. In this case it is quite straight forward to evaluate through a Fourier transform of . More sophisticated techniques like the LSSA can be used in case some are missing, however this needs to be applied to the binned and the task is not computationally expensive.
The MAPS has been used to quantify the statistical properties of the background radiation in GMRT observations at 150 MHz (ali; ghosh150) and 610 MHz (ghosh1; ghosh2). The HI signal contribution to the measured is expected to decorrelate rapidly when is increased whereas the foreground contribution is expected to remain correlated for large separations. This property was used (ghosh2) to model and remove the foreground contribution and obtain a residual which is consistent with noise. It was thereby possible to place an observational limit on the HI 21-cm power spectrum at . The estimator used in these earlier works individually correlates pairs of visibilities to estimate , a technique which is computationally expensive. The 2D TGE (Paper II) presents an efficient technique to estimate the angular power spectrum . In Section 2. of this paper we have generalized this earlier work to develop an estimator for the MAPS . In Section 3. we present how is obtained from the estimated . Section 4. presents the Simulations which we have used to validate our estimator, Section 5. presents the Results and Section 6. presents the Discussion and Conclusions.
We have used the cosmological parameters from the (Planck + WMAP) best-fit CDM cosmology (ade15) throughout this paper.
2 An overview of the Tapered Gridded Estimator
The 2D TGE, presented in Paper II considers radio-interferometric observations at a single frequency and uses the measured visibilities to estimate the angular power spectrum of the background radiation at the frequency . Here refers to the -th visibility measurement with a corresponding baseline . The measured visibilities can be expressed as
[TABLE]
Here, the first term is the sky signal which is the convolution of and where these are the Fourier transforms of the primary beam and the temperature fluctuations in the sky respectively, and is the Planck function in the Rayleigh-Jeans limit. The second term is the system noise contribution.
In order to taper the sky response, the measured visibilities are convolved with a function which is the Fourier transform of a window function which falls off to a value close to zero well before the first null of the telescopeβs primary beam pattern (Paper I). Further, in order to reduce the computation, the convolved visibilities are evaluated on a grid in space using
[TABLE]
where the βcβ in refers to βconvolvedβ and refers to different grid points with corresponding baselines . The sky response of is tapered with the window function . Here we have used where the value of is chosen so as to suppress the contribution from the outer regions and sidelobes of the telescopeβs primary beam pattern (Figure 1 of samir16a). For comparison, the full width half maxima of the GMRT primary beam pattern may be estimated to be where is the antenna diameter.
The convolved gridded visibilities can be expressed as
[TABLE]
where
[TABLE]
is an effective βgridding kernelβ, and
[TABLE]
is the baseline sampling function of the measured visibilities.
The 2D TGE estimator is defined as
[TABLE]
with where , and denotes an ensemble average over multiple realizations of the sky brightness temperature fluctuations which are recorded in the visibilities. The second term in the brackets in eq.Β (6) is introduced to subtract out the noise bias contribution which arises due to the correlation of a visibility with itself. is a normalization factor which we shall discuss later. Simulations show that the 2D TGE provides an unbiased estimate of the angular power spectrum (Paper II) while effectively suppressing the contribution from the sidelobes and outer regions of the telescopeβs primary beam (samir17b).
2.1 Calculation
As discussed in Paper II, the normalization constant can be written as,
[TABLE]
where,
[TABLE]
and
[TABLE]
The values of (eq. 7) depend on the baseline distribution (eq. 5) and the form of the tapering function , and it is necessary to calculate at every grid point in the plane. Paper I presents an analytic approximation to estimate . While this has been found to work very well in a situation where the baselines have a nearly uniform and dense coverage (Fig. 7 of Paper I), it leads to being overestimated in a situation where we have a sparse and non-uniform coverage. Paper II presents a different method to estimate which has been found to work well even if the coverage is sparse and non-uniform .
We now briefly present how the normalization constant is calculated for estimation in eq.Β (6) . As discussed in Paper II, we proceed by constructing random realizations of simulated visibilities corresponding to a situation where the sky signal has an unit angular power spectrum (UAPS) . The simulated visibilities have exactly the same baseline distribution as the actual observed visibilities. We then have (eq. 6)
[TABLE]
which allows us to estimate . We average over independent realizations of the UPAS to reduce the statistical uncertainty.
2.2 Binning
The estimator provides an estimate of at different grid points on the plane. We have binned the estimates in order to increase the signal to noise ratio and also reduce the data volume. The signal is assumed to be statistically isotropic on the sky whereby it is independent of the direction of . This allows us to average the estimates within an annular region on the plane. We define the binned Tapered Gridded Estimator for bin using
[TABLE]
where refers to the weight assigned to the contribution from any particular grid point. The choice assigns equal weightage to the value of estimated at each grid point, whereas corresponds to a situation where the grid points which have a denser baseline sampling (less system noise) would be given a larger weightage. The former would be desireable if one wishes to optimize with respect to the cosmic variance whereas the latter would be preferred to optimize with respect to the system noise contribution. The optimum choice of to maximize the signal to noise ratio would depend on the window function and the baseline distribution, and we plan to address this in future.
The binned estimator has an expectation value
[TABLE]
where is the average angular power spectrum at
[TABLE]
which is the effective angular multipole for bin .
3 The multi-frequency angular power spectrum
The multi-frequency angular power spectrum (maps) characterizes the joint frequency and angular dependence of the statistical properties of the background sky signal. We decompose the brightness temperature fluctuations in terms of spherical harmonics using
[TABLE]
and define the multi-frequency angular power spectrum (hereafter MAPS) as
[TABLE]
As discussed in mondal2018, we expect to entirely quantify the second order statistics of the redshifted 21-cm signal.
We now proceed to define a visibility based Tapered Gridded Estimator (TGE) for . We generalize the analysis to consider visibility measurements at multiple frequency channels , each of width , with channels that span a bandwidth . Here we allow for the possibility that several of the data are bad or missing. We assume that such data has been identified and flagged, and this information is stored using a flagging variable which has value [math] for the flagged data and value otherwise. We then have
[TABLE]
which allows us to define the Tapered Gridded Estimator (TGE) for as
[TABLE]
where denotes the real part, is a Kronecker delta i.e. it is necessary to subtract the noise bias only when the two frequencies are the same , and the noise in the visibility measurements at two different frequencies are uncorrelated.
The TGE defined in eq.Β (17) provides an unbiased estimate of at the angular multipole i.e.
[TABLE]
We use this to define the binned Tapered Gridded Estimator for bin
[TABLE]
where refers to the weight assigned to the contribution from any particular grid point . For the analysis presented in this paper we have used the weight which roughly averages the visibility correlation across all the grid points which are sampled by the baseline distribution. The binned estimator has an expectation value
[TABLE]
where is the bin averaged multi-frequency angular power spectrum (MAPS) at
[TABLE]
which is the effective angular multipole for bin .
Paper II describes how we have estimated using UAPS simulations in the context of observations at a single frequency. This has also been summarized in Section 2 of this paper. Here we have extended the earlier analysis to simulate visibilities for which we have an unit multi-frequency angular power spectrum . We also apply the same flagging variable as the actual data to the simulated data. Using the simulated visibilities and the actual flagging variable in eq.Β (17), we have an estimate of . We have used multiple realizations of the simulations to reduce the uncertainty in the estimated values of .
We note that the estimator presented here does not take into account the fact that the baselines (where is the antenna spacing) and the primary beam pattern both change with frequency and these are held fixed at the values corresponding to the central frequency . While this may not have a very significant effect on the recovered 21-cm power spectrum, it is very important for the foregrounds where this leads to the foreground wedge (eg. adatta10; parsons12; vedantham12). We note that the frequency dependence of the baselines has been included in earlier versions of the MAPS estimator (ali; ghosh1; ghosh150) which did not incorporate gridding and tapering. It is possible to incorporate the frequency dependence of the baselines in the TGE by suitably scaling the baselines at the time of convolution and gridding (eq. 16), and we plan to address this in future work.
4 Estimating
In order to estimate the 3D power spectrum we assume that the redshifted 21-cm signal is statistically homogeneous (ergodic) along the line of sight (e.g. mondal2018). We then have where i.e. the statistical properties of the signal depends only on the frequency separation and not the individual frequencies. In the flat sky approximation, the power spectrum of the brightness temperature fluctuations of the redshifted 21-cm signal is the Fourier transform of , and we have (maps)
[TABLE]
where and are the components of respectively parallel and perpendicular to the line of sight, is the comoving distance corresponding to the central frequency of our observations and is evaluated at . A brief derivation of eq.Β (22) is also presented in the Appendix of mondal2018. In this paper we have used (eq.Β 22) to estimate from the MAPS .
First we impose the ergodic assumption on which has been estimated from the visibility data using eq.Β (17) and binned using eq.Β (19,20 and 21). For a fixed and , we average over all the values for which to obtain . We then have where with . We see that is a periodic function of with period . We use the discrete Fourier transform
[TABLE]
with to estimate which is already binned in . We have further binned in to obtain the Spherical Power Spectrum , and the Cylindrical Power Spectrum .
5 Simulations
We have carried out simulation to validate the estimator presented here. We have simulated hours of Giant Meterwave Radio Telescope (swarup) observations with channels of width spanning and integration time towards RA=10h46m00s and DEC=59*β*0059. We note that the EoR 21-cm signal is not expected to be ergodic over the bandwidth considered here due to the Light Cone effect (mondal2018). However, we have not considered this effect here and assumed that the signal is ergodic. The sky signal, we assume, is entirely the redshifted HI 21-cm emission whose brightness temperature fluctuations are characterized by the 3D power spectrum . For the purpose of this paper we have arbitrarily chosen the values and . We have followed the procedure outlined in Section 4 of samir16b to simulate visibilities corresponding to different statistically independent realizations of the brightness temperature fluctuations.
In addition to the sky signal, the visibilities also contain a system noise contribution. We have modelled the system noise contribution to the visibilities as Gaussian random variables whose real and imaginary parts both have zero mean and variance . For comparison we have also estimated which is the same quantity for the simulated sky signal contribution. The ratio gives an estimate of the relative contribution of the system noise with respect to the sky signal. In our simulations we have used which corresponds to a situation where the noise contribution to an individual visibility is times the sky signal contribution. We have generated statistically independent realizations of both the sky signal and the system noise. The resulting statistically independent realizations of the simulated visibilities were used to estimate the mean and errors for the results presented below. We have considered simulations both with and without flagging. For each baseline we have generated random integers in the range and flagged the corresponding channels. We have carried out simulations for various values of (the fraction of flagged channels) in the range .
We note that the frequency dependence of the baselines and the primary beam pattern have both been incorporated in the simulated visibilities.
