Unified Radio Interferometric Calibration and Imaging with Joint Uncertainty Quantification
Philipp Arras, Philipp Frank, Reimar Leike, R\"udiger Westermann,, Torsten En{\ss}lin

TL;DR
This paper introduces a Bayesian algorithm that unifies radio interferometric calibration and imaging, providing both sky brightness estimates and joint uncertainty quantification, enhancing the reliability of radio astronomical data analysis.
Contribution
The paper presents a novel Bayesian framework using information field theory and MGVI that jointly calibrates and images radio interferometric data with uncertainty estimation.
Findings
Provides joint calibration and imaging with uncertainty quantification.
Implemented as a practical algorithm for radio interferometry.
Focuses on direction-independent antenna calibration.
Abstract
The data reduction procedure for radio interferometers can be viewed as a combined calibration and imaging problem. We present an algorithm that unifies cross-calibration, self-calibration, and imaging. Being a Bayesian method, that algorithm does not only calculate an estimate of the sky brightness distribution, but also provides an estimate of the joint uncertainty which entails both the uncertainty of the calibration and the one of the actual observation. The algorithm is formulated in the language of information field theory and uses Metric Gaussian Variational Inference (MGVI) as the underlying statistical method. So far only direction-independent antenna-based calibration is considered. This restriction may be released in future work. An implementation of the algorithm is contributed as well.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Figure 18
Figure 19
Figure 20
Figure 21
Figure 22
Figure 23
Figure 24| 2 | 2 | 5 | 1 | 3 | 4 | |||
|---|---|---|---|---|---|---|---|---|
| 1.5 | 1 | 1 | 2 | 20 | ||||
| 1.5 | 1 | 1 | 2 | 20 |
| 2 | 2 | 2 | 1 | 2 | 4 | 1 | ||
|---|---|---|---|---|---|---|---|---|
| 1.5 | 1 | 1 | 2 | 20 | ||||
| 1.5 | 1 | 1 | 2 | 20 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
11institutetext: Max-Planck Institut für Astrophysik, Karl-Schwarzschild-Str. 1, Garching, Germany22institutetext: Ludwig-Maximilians-Universität München (LMU), Geschwister-Scholl-Platz 1, München, Germany33institutetext: Technische Universität München (TUM), Boltzmannstr. 3, 85748 Garching, Germany
Unified radio interferometric calibration and imaging
with joint uncertainty quantification
Philipp Arras Unified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantification
Philipp Frank Unified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantification
Reimar Leike Unified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantification
Rüdiger Westermann Unified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantification
Torsten A. Enßlin Unified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantificationUnified radio interferometric calibration and imaging with joint uncertainty quantification
(Received ¡date¿/ Accepted ¡date¿)
The data reduction procedure for radio interferometers can be viewed as a combined calibration and imaging problem. We present an algorithm that unifies cross-calibration, self-calibration, and imaging. Because it is a Bayesian method, this algorithm not only calculates an estimate of the sky brightness distribution, but also provides an estimate of the joint uncertainty which entails both the uncertainty of the calibration and that of the actual observation. The algorithm is formulated in the language of information field theory and uses Metric Gaussian Variational Inference (MGVI) as the underlying statistical method. So far only direction-independent antenna-based calibration is considered. This restriction may be released in future work. An implementation of the algorithm is contributed as well.
Key Words.:
techniques: interferometric – methods: statistical – methods: data analysis – instrumentation: interferometers
1 Introduction
Radio astronomy is thriving. Super-modern telescopes such as MeerKAT, the Australian Square Kilometer Array Pathfinder (ASKAP), the Very Large Array (VLA), and the Atacama Large Millimeter Array (ALMA) are operating and the Square Kilometer Array (SKA) is in the planning stages. All these telescopes provide high-quality data on an unprecedented scale and much progress is being made instrumental-wise, which facilitates enormous improvements in sensitivity and survey speed.
Impressed by these novel facilities we would like to turn our attention to the calibration and imaging algorithms that are fed by the data from these telescopes. The amount of scientific insight that can possibly be extracted from a given telescope is limited by the capability of the employed data reduction algorithm. We suggest that there is room for improvement regarding the calibration and imaging procedure: the most widely applied algorithms view calibration and imaging as separate problems and are not able to provide uncertainty information. The latter is desperately needed to quantify the level of trust a scientist can put on any result based on radio observations. Furthermore, a statistical sound confrontation of astrophysical models to radio data requires reliable uncertainty quantification. Treating calibration and imaging as separate steps ignores their tight interdependence.
The algorithmic idea presented in this work is an advancement of the original resolve algorithm (Radio Extended SOurces Lognormal deconvolution Estimator; Junklewitz et al. 2016) and may retain its name. The resolve algorithm is formulated in the language of information field theory (IFT; Enßlin et al. 2009; Enßlin 2018), which is a view on Bayesian statistics applicable wherever (physical) fields are supposed to be inferred. From a Bayesian point of view the question when reducing radio data is the following: Given prior knowledge as well as measurement information about the brightness distribution of a patch of the sky, what knowledge does the observer have after obtaining the data? This question is answered by the Bayes theorem in terms of a probability distribution over all possible sky brightness distributions conditional to the data.
Reconstruction algorithms may be judged based on their statistical integrity or by their performance. The first perspective ultimately leads to pure Bayesian algorithms, which are too expensive for typical problems computationally. The latter often leads to ad hoc algorithms that may perform well in applications, but these can have major shortcomings such as a missing uncertainty quantification or negative-flux pixels, which is the case, for example, for CLEAN (Högbom, 1974). The resolve algorithm attempts a compromise between these two objectives. It is based on purely statistical arguments and the necessary operations are approximated such that they can efficiently be implemented on a computer and be used for actual imaging tasks. Thus, the approximations and (prior) assumptions on which resolve is based can be written down explicitly.
resolve is reasonably fast but cannot compete in pure speed with algorithms like the Cotton-Schwab algorithm (Schwab, 1984) as implemented in CASA. This is rooted in the fact that resolve not only provides a single sky brightness distribution but needs to update the sky prior probability distribution according to the raw data in order to properly state how much the data has constrained the probability distribution and how much uncertainty is left in the final result. This uncertainty is defined in a fashion such that it can encode the posterior variance and also cross-correlations. Thus, the uncertainty is qunatified by pieces of information where is the number of pixels in the image. Given this massive amount of degrees of freedom it may be surprising that resolve is able to return its results after a sensible amount of time. Having said this, there is still potential for improvement. The technical cause for the long runtime is the complexity of the gridding/degridding operation, which needs to be called orders of magnitude more often than for conventional algorithms. This problem may be tackled from an information-theoretic perspective in the future.
Turning to the specific subject of the present publication, the data reduction pipeline of modern radio telescopes consists of numerous steps. In this paper, we would like to focus on the calibration and imaging part. Calibration is necessary because the data is corrupted by a variety of effects including antenna-based, baseline-based, and direction-dependent or direction-independent effects (Smirnov, 2011). For the scope of this paper only antenna-based calibration terms are considered, a simplification which is sensible for telescopes with a small field of view such as ALMA or the VLA. The crucial idea of this paper is to view the amplitude and phase corrections for each antenna as one-dimensional fields that are defined over time. These fields are discretized and regularized by a prior which states that the calibration solution for a given antenna is smooth over time. This removes the ambiguity of an interpolation scheme in between the calibrator observations and the subsequent application of self-calibration. Because resolve is an IFT algorithm, there is no notion of solution intervals, which are time bins in which traditional calibration algorithms bin the data (see, e.g., Kenyon et al., 2018). Instead IFT takes care of a consistent discretization of the principally continuous fields. Similarly, the sky brightness distribution is defined on a discretized two-dimensional space; only single-channel imaging is performed in this work.
In practice, the current approach in the IFT community is to define a generative model that turns the degrees of freedom, which are learned by the algorithm into synthetic data that can be compared to the actual data in a squared-norm fashion (in the case of additive Gaussian noise). This approach is similar to the so-called radio interferometric measurement equation (RIME; Hamaker et al. 1996; Perkins et al. 2015; Smirnov 2011). Therefore, our notation closely follows the notation defined in Smirnov (2011). Calibration effects that are part of the RIME but left out for simplicity in this publication could in principle be integrated into the resolve framework.
The resolve approach may be classified according to the notion of first, second, and third generation calibration established in Noordam & Smirnov (2010): it unifies cross-calibration (1GC), self-calibration (2GC), and imaging. Still it is to be strictly distinguished from existing approaches like Kenyon et al. (2018); Cai et al. (2018), and Salvini & Wijnholds (2014). This is because it focuses on a strict Bayesian treatment combined with consistent discretization (one of the main benefits of IFT) and does not use computational speed as an argument to drop Bayesian rigidity.
The actual posterior probability distribution of the joint imaging and calibration problem is highly non-Gaussian and therefore not easily storeable on a computer. In order to overcome this apparent problem the posterior is approximated by a multivariate Gaussian with full covariance matrix. The algorithm prescribes to minimize the Kullback-Leibler divergence (KL divergence) between the actual posterior and the approximate one which is the information gain between the two probability distributions. We use the variant of this known as Metric Gaussian Variational Inference (MGVI) (Knollmüller & Enßlin, 2019).
Together with this publication we contribute an implementation of resolve that is available under the terms of GPLv3.111https://gitlab.mpcdf.mpg.de/ift/resolve It is based on the Python library NIFTy (Arras et al., 2019), which is freely available as well.
The paper is divided into four sections. Section 2 discusses the structure of likelihood and prior for the statistical problem at hand. This defines an algorithm which is verified on synthetic data in Section 3 and afterward applied to real data from the VLA in Section 4. Section 6 finishes the paper with conclusions and a outlook for future work.
2 The algorithm
2.1 Bayes’ theorem
Every reconstruction algorithm needs a prescription of how the quantity of interest affects the data . This prescription is called the data model. Combined with statistical information, this model defines the likelihood which is a probability distribution on data realizations conditioned on a given realization of the signal . Bayes’ theorem,
[TABLE]
requires us to supplement the likelihood with a prior probability distribution , which assigns a probability to each signal realization . This distribution encodes the knowledge the scientist has prior to looking at the data. Since it is virtually impossible to visualize the posterior probability distribution in the high dimensional setting of Bayesian image reconstruction we may compute the posterior mean and posterior variance as
[TABLE]
The notation is borrowed from statistical physics and means integrating over all possible configurations . For a discussion on this measure in the continuum limit see Enßlin (2018, section 1.8). In practice, this integral is discretized as follows: where refers to the pixel values of the discretized quantity . The term is independent of and serves as a normalization factor. It expresses the probability to obtain the data irrespective of what the signal is, i.e. .
In the following we describe the data model and implied likelihood employed by resolve. This includes the assumptions resolve makes about the measurement process. Afterward, resolve’s prior is discussed. For definiteness the notation established in Smirnov (2011) is used.
2.2 Data model and likelihood
The measurement equation of a radio interferometer can be understood as a modified Fourier transform followed by an application of data-corrupting terms, the terms which need to be solved for in the calibration procedure. Assume that the data is only corrupted by so-called antenna-based direction-independent effects. Then Smirnov (2011, equation 18) is written as
[TABLE]
where
- •
: Direction cosines on the sky and .
- •
: Antenna indices where is the total number of antennas of the interferometer.
- •
: Visibility for antenna pair .
- •
: Vector that connects antenna with antenna . The coordinates and are aligned with and , respectively. The value is perpendicular to both and points from the interferometer toward the center of the field of view.
- •
: Antenna-based direction-independent calibration effect.
- •
: Intrinsic sky brightness matrix. Since only the Stokes I component is considered in this publication, this matrix is proportional to the identity matrix.
Equation (4) can be understood as a Fourier transform of the sky brightness distribution, which is distorted by the terms involving and corrupted by the calibration terms . For the purpose of this publication we make the following simplifying assumptions: First, only the total intensity is reconstructed. Second, is assumed to be diagonal, which states that there is no significant polarization leakage and especially no time-variable leakage. Finally, the temporal structure of the data is needed for the construction of the prior. Therefore, a time index is added to the above expression that is written as
[TABLE]
where are diagonal matrices and is a diagonal matrix, which is proportional to unity in polarization space. We note that needs to absorb the -term from (4), which is possible as long as polarization leakage is not too time variable. The -term can be taken care of by -stacking (Offringa et al., 2014), which means that the range of possible values for is binned linearly such that the integral becomes an ordinary Fourier transform. Technically, the non-equidistant Fourier transform in (5) is carried out by the NFFT library (Keiner et al., 2009) in our resolve implementation.
All in all, (5) prescribes how to simulate data given calibration solutions and an inherent sky brightness distribution , which is what we wanted. In order to declutter the notation in the following let us denote the quantities of interest by and the map such that .
The commonly used data model is the following: . It assumes additive Gaussian noise (Thompson et al., 1986). Let be a diagonal noise covariance matrix with the noise variances on its diagonal and refers to a Gaussian random field with mean and covariance matrix . Then, the additive noise can be marginalized over to arrive at an expression for the likelihood
[TABLE]
The likelihood distribution contains all information about the measurement device and the measurement process that the inference algorithm will take into account.
We conclude the discussion on data and likelihood with three remarks: First, the likelihood does not depend on the statistical method at hand. All simplifications being made are rooted in practical reasons in the implementation process. There is no fundamental reason for not taking, for instance, a more accurate noise model or a more sophisticated calibration structure into account.
Second, the employed notation already hints at the goal of describing an algorithm that jointly calibrates and images: the generalized response function takes at the same time the calibration parameters and the intrinsic sky brightness distribution as an argument.
Finally, we consider what happens if the telescope alternates between observing the science target and observing a calibration source. Then, both the data set and the intrinsic sky brightness consists of two parts and the likelihood separates into
[TABLE]
From the likelihood perspective, calibration and science source are two separate things. However, as soon as the one-dimensional calibration fields are supplemented by a prior that imposes temporal smoothness the degrees of freedom regarding the science target and calibration target interact. This solves the problem of applying interpolated calibration solutions in traditional cross-calibration in a natural way.
2.3 Prior
Turning to the prior probability distribution, we note that the technical framework in which resolve is implemented allows for a variety of different priors, which may supersede that presented in this paper.
As stated before are assumed to be diagonal,
[TABLE]
The elements of this matrix are functions defined over time and take the following complex nonzero values:222 are the units of , i.e., .
[TABLE]
The natural way of parameterizing a function taking values in is in polar coordinates, i.e.,
[TABLE]
where and . The modulus and phase of the complex gains have different physical origins. The modulus describes a varying amplification of the signal in the antenna electronics, which is rooted amongst others in fluctuating temperatures of the receiver system. Varying phases stem from fluctuations in the atmosphere. Therefore, these two ingredients of have differing typical timescales a priori.
The prior knowledge on and is the following: , respectively, share a typical behavior over time for all antennas , both of which are not known a priori and need to be inferred from the data as well. This typical behavior does not change over time. Additionally, all evolve smoothly over time. Mathematically, this can be captured by Gaussian random fields,
[TABLE]
where is defined such that the Gaussian random fields obey homogeneous but still specifically unknown statistics. This means that not only the calibration solutions themselves but also their prior correlation structure is inferred. For this a prior on the covariances needs to be supplemented: . In Section 2.4 we descibe how to set up the prior on and such that they implement homogeneous statistics and which parameters they take.
Next, let us discuss the prior on the sky brightness distribution . We recall that the matrix is assumed to be diagonal and proportional to unity, i.e.,
[TABLE]
where map the field of view to the set of positive real numbers since sky brightness is inherently a positive quantity.333We note the difference to Högbom’s CLEAN, which has positivity not built in (Högbom, 1974). For the scope of this publication, the sky brightness contains only a diffuse component. It shall be modeled similarly to the modulus of the calibration terms: it is strictly positive a priori, smooth over its domain and may vary over large scales. Therefore, we define and let be a two-dimensional Gaussian random field with correlation structure which is going to be inferred as well:
[TABLE]
All in all, the basic structure of the priors on all terms appearing in (5) has been described apart from the construction of the prior on all covariance matrices, which is the objective for the next section.
2.4 Correlated fields
To account for correlations of a Gaussian distributed field the following statements are assumed to be true:
The autocorrelation of can be characterized by a power spectrum , where is the coordinate of the Fourier transformed field. 2. 2.
The power spectrum is a positive quantity that can vary over many orders of magnitudes. 3. 3.
Physical power spectra are falling with , typically according to a power law. 4. 4.
Given enough data, it is possible to infer any kind of differentiable power spectrum.
Note that the first assumption is equivalent to the seemingly weaker assumptions:
- •
In absence of data, there is no special direction in space or time, i.e., a priori the correlation of the field is invariant under rotations.
- •
In absence of data, there is no special point in space or time, i.e., a priori the correlation of field values is invariant under shifts in space or time.
The fact that homogeneous and isotropic correlation matrices are diagonal in Fourier space and can be fully characterized by a power spectrum is known as the Wiener-Khinchin theorem (Wiener, 1949; Khintchin, 1934).
It is assumed that as well as its power spectrum are unknown. Therefore, both need a prior that may be formulated as generative model: an operator that generates samples for and its square root power spectrum (henceforth called amplitude spectrum) from one or multiple white Gaussian fields. Formulating a prior as a generative model has several theoretical and practical advantages (Knollmüller & Enßlin, 2018).
We propose the following ansatz for an operator that converts independent normal distributed fields, and to the amplitude spectrum of the correlated field . This operator is called amplitude operator (see Figure 1 for an illustrative example), i.e.
[TABLE]
where denotes the tuple of parameters (all real numbers), denotes the pullback of a field by the exponential function acting on 444Let with open and a smooth function, i.e., a field. Then denotes the pullback of by . In other words, the field is transformed to a different coordinate system whose coordinates are related to the original one by ., Exp denotes exponentiation of the field values, denotes the Fourier transform of a space with coordinates to the logarithmic coordinates of the power spectrum, and are the slope and the -intercept of the a priori mean power law, sym is an (anti-)symmetrizing operation defined to operate on a field over the interval as
[TABLE]
for . In words, sym mirrors the field and subtracts it from itself, then restricts the domain to half the original size. Finally, cp is the log-cepstrum,
[TABLE]
Let us show that (16) meets the requirements stated at the beginning of section 2.4. Requirement 1 is trivial. Requirement 2 is met since the amplitude spectrum is constructed by applying an exponential function to a Gaussian field. Thus, all values are positive and can vary over several order of magnitudes.
To requirement 3: In absence of data, the mean of the inferred white fields and , to which the amplitude operator is applied, remains zero. For and , (16) becomes
[TABLE]
which is the equation for a power law with spectral index . A preference for falling spectra can be encoded by choosing the hyperparameter to be negative.
To requirement 4: Let us show that any differentiable function lies in the image space of the amplitude operator. This implies that any differentiable amplitude spectrum can be inferred given enough data. Let be an arbitrary smooth field over the interval and be a smooth field that has a point symmetry at and is defined on the interval as
[TABLE]
The function is a continuous and differentiable continuation of at . Now, we decompose into a linear part and a residual term:
[TABLE]
where
[TABLE]
The residual term is a differentiable periodic function, i.e.,
[TABLE]
Thus, can be represented in Fourier space by a field that falls of at least with second order. This is exactly how is represented in (16). Assuming that the mean and the slope of the linear part are well represented by its prior distribution, it is indeed possible to represent any kind of differentiable amplitude spectrum. All in all, all four requirements are met by (16).
There remains one unconstrained degree of freedom, the value of the power spectrum at , the zero mode. As the zero mode describes the magnitude of the overall logarithmic flux, it is decoupled from the remaining spectrum and should have its own prior. This value is fixed by imposing an inverse gamma prior on the zero mode, which restricts it to be a positive quantity, while still allowing for large deviations.
To sum up, the amplitude operator depends on the following eight hyper parameters:
- •
, : The amplitude parameter and cutoff scale of the log-cepstrum.
- •
, : The prior means for the slope and the height of the power law.
- •
, : The corresponding standard deviations.
- •
, : The shape and scale parameter of the inverse gamma prior for the zero mode.
We note that the assumptions made at the beginning of section 2.4 apply to a wide variety of processes, regardless of their dimensionality. This generic correlated field model has already been successfully used in a number of synthetic and real applications (Leike & Enßlin, 2019; Knollmüller & Enßlin, 2018, 2019; Knollmüller et al., 2018; Hutschenreuter & Enßlin, 2019). In resolve, the amplitude operator is used as a prior for the amplitude spectra of the antenna calibration fields and the image itself.
2.5 Full algorithm
In the aforegoing sections, the full likelihood and prior are described. Now, we stack all the ingredients together to build the full algorithm. Let us assume that the data set consists out of two alternating observations: observations of a calibrator source and observations of the science target. This means that the likelihood splits into two parts as indicated in (9). In contrast to the sky brightness distribution of the science target that of the calibrator is known: it is a point source in the middle of the field of view. The sky brightness distribution of the science target is reconstructed.
The full likelihood takes the form
[TABLE]
where denote the tuple of parameters of the respective amplitude operator, is a padding operator. The unit matrices in (27) is a matrix acting on the same space as the sky brightness matrix . The tuple of all excitation fields is called , where
[TABLE]
As discussed before this model is set up such that the excitation fields have white Gaussian statistics a priori,
[TABLE]
The posterior probability distribution is given by
[TABLE]
Finally, the statistical model that is employed in this publication is fully defined.
2.6 Inference algorithm
The probability distribution (38) has too many degrees of freedom to be represented on a computer. The resolve algorithm solves this problem by approximating this full posterior distribution by a multivariate Gaussian distribution whose covariance is equated with the inverse Fisher information metric. The latter can be represented symbolically alleviating the need for an explicit storage and handling of otherwise prohibitively large matrices. This algorithm is called MGVI and is described in full length in Knollmüller & Enßlin (2019) and implemented in NIFTy.555https://gitlab.mpcdf.mpg.de/ift/nifty The following is an inexhaustive outline of Knollmüller & Enßlin (2019).
The algorithm MGVI prescribes to minimize the KL divergence666Also known as discrimination information. between the actual posterior and approximate posterior such that
[TABLE]
where is more informed compared to . However, it is apparent that it is virtually impossible to perform the integration with respect to the posterior distribution as integration measure. Therefore, MGVI exchanges the order of the arguments of the KL divergence such that the integral can be approximated by samples of the approximate posterior, i.e.,
[TABLE]
where is the information Hamiltonian and the Fisher information. The parameter is a cost function that can be minimized with respect to . Suitable (2nd order) minimizers are provided by NIFTy.
With the help of the above approximation scheme we gets a computational handle on the posterior. The drawbacks of this approach include the uncertainty estimate of MGVI sets a lower bound on the variance of the posterior and it is not suited for extremely non-Gaussian and especially multimodal probability distributions. But we note that the posterior is approximated with a Gaussian in the space on which the parameters are defined. After processing the parameters through nonlinearities as discussed in this section the actual quantities of interest such as the sky brightness distribution are not Gaussian distributed any more and may even have multiple modes. A detailed discussion on the abilities of MGVI is provided in Knollmüller & Enßlin (2019).
3 Verification on synthetic data
This section is devoted to the verification of the algorithm, i.e., the reconstruction of a synthetic sky brightness distribution from a simulated observation and artificial noise. The setup is described followed by a comparison of the ground truth and the reconstruction. Application to real data, where effects that are not modeled may occur and the ground truth is unknown, is presented in Section 4.
We employ a realistic uv coverage. It is an L-band observation of the source G327.6+14.6, also known as SN1006777 VLA archive project code: AM0754, Jan 24, 2003, L-Band MHz, CnD configuration. . For the purpose of this paper we randomly select 30000 visibilities from this data set to demonstrate that joint calibration and imaging is possible even without much data. (see Figure 2). We use the field shown in Figure 3 as the synthetic sky brightness distribution. It is a random sample assuming the power spectrum shown in orange in Figure 4. The noiseless simulated visibilities are corrupted by noise whose level is visualized in Figure 5. The resulting information source, i.e., the naturally weighted dirty image, is shown in Figure 6.
This synthetic observation is set up in a fashion such that the calibration artifacts are stronger and the noise level is higher as compared to real data (see Section 4) to demonstrate the capability of the resolve in bad data situations. The calibration artifacts that have been applied are visualized in Figure 7.
The resolve algorithm is run on this synthetic data to compare its output and uncertainty estimation to the (known) ground truth. The prior parameters are listed in Table 1. Additionally, we choose a resolution of pixels for the sky brightness distribution with a field of view of and 256 pixels for the calibration fields that are defined on a temporal domain. As the total length of the observation was min one temporal pixel is s long. These temporal pixels should not be confused with solution intervals of traditional calibration schemes where the data is binned on a grid and then the calibration parameters are solved for. In IFT fields are by their nature continuous quantities that are discretized on an arbitrary grid. For convenience a regular grid was chosen. Then the data provides information on each pixel that is propagated to the neighboring pixels through the prior; the calibration fields are assumed to be smooth over time. Therefore, the user is free to choose the resolution of the fields in IFT algorithms as long as it is finer than the finest structure that shall be reconstructed.
As pointed out resolve is a Bayesian algorithm whose output is not a single image of the observed patch of the sky but rather a probability distribution of all possible sky configurations. The MGVI algorithm approximates this non-Gaussian probability distribution with a Gaussian in the space of , i.e., the eigenspace of the prior covariance. This again implies non-Gaussian statistics on quantities such as , , and since they depend in a nonlinear fashion on . The only useful way to visualize this probability distribution is to analyze a finite number of samples from it which resolve can generate. A given set of samples can then be analyzed with standard statistical means such as the pixel-wise mean and variance.
Figures 8, 9, and 10 show the posterior mean, the absolute value of the residual, the standard deviation of the sky brightness distribution, and a histogram of the residual divided by the standard deviation computed from 100 posterior samples, respectively. The algorithm has managed to perform the calibration correctly and to reconstruct the sky brightness distribution. The total flux of the ground truth (Fig. 3) could not totally be recovered because of the noise on the synthetic measurement. Remarkably, the proposed uncertainty is a bit too small compared to the residuals which is what is to be expected from MGVI.
Since resolve does not assume a specific power spectrum as prior for the reconstruction but rather learns it together with the sky brightness from the data, resolve also provides the user uncertainty on the power spectrum; see Figure 4. We note that the posterior variance on the power spectrum increases toward the boundaries of the plot. This is because interferometers do not provide information on scales larger than those that belong to the shortest baseline. On small scales an interferometer is limited by the noise level, which leads to an increased variance in the power spectrum on the right-hand side of the plot.
Next, we turn to the calibration solutions. Figure 7 shows a comparison of the ground truth and the posterior provided by resolve. Since two polarizations are considered (LL and RR) for both the amplitude and the phase of the antenna-based calibration term, Figure 7 has four rows. On first sight, the posterior mean and the ground truth are indistinguishable by eye and the residuals and posterior standard deviation fit together nicely. There is a significant increase of the uncertainty for, for example, antenna 2 toward the end of the observation. This is because a flagged data set was used and that simply all data points involving this antenna have been flagged from the beginning of the observation up to h.
To illustrate this more explicitly, Figures 11 and 12 show the calibration solution for one antenna, respectively. The ground truth lies within the bounds of uncertainty indicated by the samples. We note that all data points have been flagged on the left-hand side of Figure 11. Since no information about the phase is available the only constraint is the prior, which enforces temporal smoothness. Consistently, the uncertainty increases where no information is available.
Finally, we demonstrate what kind of other information posterior samples can reveal. Say, a scientist is interested in the integrated flux over a certain region. In addition to the image, this integrated flux comes with an uncertainty that can be calculated by averaging over posterior samples of the sky brightness distribution. An example is shown in Figure 13. The scatter of the histogram is caused by the noise influence on the data, the (un)certainty of the calibration solutions, and ultimately the uv coverage. We are not aware of any other radio aperture synthesis algorithm that is able to provide this kind of probabilistic posterior information. All in all, the proposed method is able to recover the ground truth and is able to supplement it with an appropriate uncertainty estimation.
4 Application to VLA data
We continue with an application of resolve to real data. To this end, take the VLA data set whose uv coverage and time stamps have already been used in the preceding section. Also, the resolution of all spaces is taken to be the same.
Starting from raw data, the first thing to look at is the information source (see Figure 14). No structure of the supernova remnant is visible whatsoever since the data is not calibrated yet. This illustrates that resolve is able to operate on raw (but already flagged) visibilities that have not been processed further. Table 2 summarizes the prior parameters for the following reconstruction.
All calibration solutions are shown in Figure 15 together with two exemplary plots in Figures 16 and 17. The major characteristic of these solutions are hidden in the right-hand column of Figure 15: the uncertainty on the calibration decreases whenever the calibrator source is observed as expected. Additionally, the uncertainty increases dramatically where the data has been flagged. The amplitude solutions are surprisingly stable over time although the prior would allow for more variance in the solution as can be seen from Section 3, where the same prior parameters have been used.
There is a systematic difference between the samples for the amplitude solutions and those for the phases. The former vary around a mean solution whereas the latter exhibit a certain global offset. This is explained by the fact that the likelihood is invariant under a pixel-wise global phase shift, which is broken by the prior to a global phase shift to all temporal pixels at once. This residual symmetry is again broken by the prior on the zero-mode variance of the phase solutions. However, this prior is very weak to allow for phase solutions of arbitrary magnitude. Therefore, the phase solutions cannot have an arbitrarily large offset but still can globally vary to some degree, which is shown in Figure 17.
Next, the posterior sky brightness is discussed. Figures 18 and 19, along with Figures 20 and 21, show the posterior mean and pixel-wise standard deviation of . The posterior standard deviation is higher wherever more flux is detected. Therefore, Figure 21 provides a descriptive visualization of the posterior uncertainty of the sky brightness distribution.
Last but not least the power spectrum of the logarithmic sky brightness distribution also needs to be reconstructed; this is shown in Figure 22. The power spectrum is more constrained compared to that of section 3 since the noise level is much lower in this data set as compared to the synthetic data set. We might expect the posterior power spectrum to feature nodes or distinct minima because the Fourier transform of compact objects typically exhibit such. This is suppressed by the smoothness prior on the power spectrum. However, we note that this does not mean that the algorithm cannot reconstruct the object because it can still choose to not excite the respective modes in .
All in all, this demonstrates that resolve is not only able to operate on synthetic data but is actually capable of solving for the sky brightness distribution and the calibration terms at the same time for real data sets.
5 Performance and scalability
Performance and scalability are crucial aspects of the applicability of algorithms. The expensive part of the evaluation of the sky model is a fast Fourier transform (FFT), which is in where is the total number of pixels of the sky model. For real-world data sets the cost for the (de)gridding exceeds the FFTs by far such that one likelihood evaluation is in , where is the number of data points that need to be degridded once for each polarization. To compute the sampled KL divergence we need to compute the likelihood times, where is the number of samples (typically 3 – 20). The memory consumption scales linearly with the number of samples used to approximate the KL divergence, number of pixels, and number of data points. This is possible since NIFTy is designed such that no explicit matrices need to be stored.
Both reconstructions in this paper each took minutes to be computed on a mobile CPU (Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz) with 4GB main memory. The response and adjoint needed to be called times, respectively.
These values might improve in the future. Barnett et al. (2018) have proposed a novel gridding kernel that features speedups of several times in first experiments. This is possible since it needs relatively small support and can be computed on the fly. Also, the structure of the algorithm allows for various forms of parallelization. The gridding/degridding can be computed in parallel with OpenMP. Moreover, the data set could be split into several parts and distributed on a cluster. This is a general feature of Bayesian statistics: a likelihood can be split into the product of two likelihoods each of which contains only a subset of the data. Additionally, the evaluation of the KL divergence, which is a sum of few but expensive independent summands, can be distributed. Finally, NIFTy offers the (experimental) feature to distribute large fields on a cluster. Orthogonal to computational speedup ideas the algorithm might also benefit from compressing the likelihood itself such that fewer (de)gridder calls are necessary.
6 Conclusions
We have presented the probabilistic resolve algorithm for simultaneous calibration and imaging. After a derivation from first principles of the full posterior probability distribution for the joint calibration and imaging algorithm resolve, it has been shown how this distribution can be approximated by a multivariate Gaussian probability distribution to render the problem computationally solvable. This method is called MGVI and provides a prescription for how to draw samples from the approximate posterior distribution. The calibration algorithm resolve has been verified on synthetic data. The results indicate that the uncertainty quantification is qualitatively sensible but should be taken with a grain of salt since MGVI systematically underestimates posterior variance. Furthermore, it has been demonstrated that the algorithm has the capability to reconstruct a sky brightness distribution of a intricate source, the supernova remnant SN1006, together with uncertainty information from raw VLA L-band data.
There are many open ends to continue the investigation that we started with this paper. First, the model for the sky brightness distribution may include point source and multifrequency correlations. On top of that the response may be described more thoroughly. Direction-dependent calibration and nontrivial primary beam effects may be taken into account. Moreover, we performed the flagging by a standard CASA flagging algorithm. This can be replaced with an algorithm rooted in information theory that unifies flagging with calibration/imaging. Additionally, a major/minor cycle scheme similar to that in CLEAN may be introduced to avoid to frequent (de)gridding operations. This is necessary to apply resolve to big data sets from telescopes such as MeerKAT. Finally, resolve can be extended to polarization imaging. On an orthogonal track resolve may be used for imaging of a variety of sources from different telescopes including ALMA and especially the Event Horizon Telescope.
Acknowledgements.
We would like to thank Jamie Farnes for his workshop on calibration with CASA at the Power of Faraday Tomography 2018 conference and his script for calibrating the data set at hand with CASA, which enabled development of the IFT calibration algorithm. We thank Landman Bester, Ben Hugo, Jakob Knollmüller, Julian Rüstig, Oleg Smirnov, and Margret Westerkamp for numerous discussions, explanations, and feedback, and Martin Reinecke for his work on NIFTy, which was the crucial technical premise of the project. We acknowledge financial support by the German Federal Ministry of Education and Research (BMBF) under grant 05A17PB1 (Verbundprojekt D-MeerKAT). We thank the anonymous referee for insightful comments and the language editor for substantially improving the text quality.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Arras et al. (2019) Arras, P., Baltac, M., Ensslin, T. A., et al. 2019, Astrophysics Source Code Library
- 2Barnett et al. (2018) Barnett, A. H., Magland, J. F., & Klinteberg, L. a. 2018, ar Xiv preprint ar Xiv:1808.06736
- 3Cai et al. (2018) Cai, X., Pereyra, M., & Mc Ewen, J. D. 2018, Monthly Notices of the Royal Astronomical Society, 480, 4154
- 4Enßlin (2018) Enßlin, T. A. 2018, Annalen der Physik, 1800127
- 5Enßlin et al. (2009) Enßlin, T. A., Frommert, M., & Kitaura, F. S. 2009, Physical Review D, 80, 105005
- 6Hamaker et al. (1996) Hamaker, J., Bregman, J., & Sault, R. 1996, Astronomy and Astrophysics Supplement Series, 117, 137
- 7Högbom (1974) Högbom, J. 1974, Astronomy and Astrophysics Supplement Series, 15, 417
- 8Hutschenreuter & Enßlin (2019) Hutschenreuter, S. & Enßlin, T. A. 2019, ar Xiv e-prints, ar Xiv:1903.06735
