Consistent Inversion of Noisy Non-Abelian X-Ray Transforms
Fran\c{c}ois Monard, Richard Nickl, Gabriel P. Paternain

TL;DR
This paper introduces a Bayesian approach using Gaussian processes to invert noisy non-Abelian X-ray transforms on simple surfaces, achieving convergence rates and stability estimates for recovering matrix fields.
Contribution
It develops a novel statistical algorithm for the inverse problem of non-Abelian X-ray transforms, with proven convergence rates and a new stability estimate.
Findings
Convergence rate of the statistical error is algebraic in 1/N.
Error approaches 1/√N for smooth matrix fields.
Stability estimate for the inverse map is established.
Abstract
For a simple surface, the non-linear statistical inverse problem of recovering a matrix field from discrete, noisy measurements of the -valued scattering data of a solution of a matrix ODE is considered (). Injectivity of the map was established by [Paternain, Salo, Uhlmann; Geom.Funct.Anal. 2012]. A statistical algorithm for the solution of this inverse problem based on Gaussian process priors is proposed, and it is shown how it can be implemented by infinite-dimensional MCMC methods. It is further shown that as the number of measurements of point-evaluations of increases, the statistical error in the recovery of converges to zero in -distance at a rate that is algebraic in , and approaches for smooth matrix fields . The proof relies, among other…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\startingpage
1 \authorheadlineF. Monard, R. Nickl, G.P. Paternain \titleheadlineConsistent Inversion of Noisy Non-Abelian X-Ray Transforms
Department of Mathematics, University of California, Santa Cruz, CA 95064 Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge CB3 0WB, UK Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge CB3 0WB, UK
Consistent Inversion of Noisy Non-Abelian X-Ray Transforms
François Monard
Richard Nickl
Gabriel P. Paternain
(Month 200X)
Abstract
For a simple surface, the non-linear statistical inverse problem of recovering a matrix field from discrete, noisy measurements of the -valued scattering data of a solution of a matrix ODE is considered (). Injectivity of the map was established by [Paternain, Salo, Uhlmann; Geom. Funct. Anal. 2012, [35]].
A statistical algorithm for the solution of this inverse problem based on Gaussian process priors is proposed, and it is shown how it can be implemented by infinite-dimensional MCMC methods. It is further shown that as the number of measurements of point-evaluations of increases, the statistical error in the recovery of converges to zero in -distance at a rate that is algebraic in , and approaches for smooth matrix fields . The proof relies, among other things, on a new stability estimate for the inverse map .
Key applications of our results are discussed in the case to polarimetric neutron tomography, see [Desai et al., Nature Sc. Rep. 2018, [12]] and [Hilger et al., Nature Comm. 2018, [23]].
††volume: 000
Contents
1 Introduction
1.1 Non-Abelian -ray transforms
Our object of study is the non-abelian -ray transform, a mapping from a matrix-valued field defined on a Riemannian surface with boundary , to its scattering data , defined at the influx boundary of , given by
[TABLE]
where is the tangent bundle of , and denotes the outward unit normal at .
We will assume that the surface is simple in the sense that it is (topologically) a disk, it has no conjugate points, and a strictly convex boundary. Strictly convex domains in the plane (and small perturbations of them) are examples of simple surfaces. In this context, all unit-speed geodesics111Unit-speed geodesics are locally defined dynamically through the equation with the Levi-Civita connection, and satisfying for all where is defined. in exit in finite time. This fact allows us to identify with the space of geodesics on , by associating to any the unique geodesic passing through .
Let be a smooth map. Given a unit-speed geodesic with endpoints , we may define the scattering data of on to be , where satisfies the linear system of ODE’s
[TABLE]
This problem, backward in time for convention here, is well-posed and leads to a unique definition of , containing cumulated information about along the geodesic . Note that when is scalar, we obtain , which is the classical X-ray/Radon transform of along the curve . Considering the collection of all such data makes up the scattering data (or non-Abelian X-ray transform) of , viewed here as a map
[TABLE]
and we are concerned with the problem of recovering from . Inverting Abelian and non-Abelian X-ray transforms are examples of inverse problems in integral geometry, an active field permeating several tomographic imaging methods, see e.g. the recent topical review [24].
The problem of inverting the non-linear mapping in this generality has been recently solved in [36]. Previous injectivity results were obtained, either by adding curvature conditions on the manifold, or by fixing a Lie group (realised as matrices, for simplicity) and its Lie algebra , in turn asking whether a -valued field can be recovered from its -valued scattering data . In this paper, we will mainly use the Lie groups , and , and their Lie algebras , and . Above, ’’, ’’, ’’ and ’tr’ refer to matrix ’transpose’, ’conjugate transpose’, ’determinant’ and ’trace’, respectively. Note the inclusions
[TABLE]
The state of the art on this question can be written as follows:
Theorem 1.1**.**
Let be a simple surface. The map is injective in the following cases:
(a) [35];
(b) [36].
The proof of (b) consists of a reduction to the unitary case in (a) via a factorization theorem in Loop Groups. Earlier injectivity results have been obtained by several authors, cf. [15, 33, 34] and references therein, particularly when is a domain in the Euclidean plane.
The absence of concrete reconstruction formulas for the inverse map when , and the challenge of dealing with physical experiments such as those arising in polarimetric neutron tomography (see Section 1.2), where discrete and noisy measurements of are made (see Section 1.3 for details), motivate the main contribution of this article, which is to present a statistical algorithm that allows to recover . The implementation of is detailed in Section 4, and our main theoretical result is the statistical analogue of the injectivity result Theorem 1.1, namely the frequentist consistency of reconstruction in the large sample limit, which somewhat informally can be stated as follows:
Theorem 1.2**.**
Suppose the data is generated from the probability distribution where is any smooth matrix field . Then we have that, as sample size , and in -probability,
[TABLE]
See Theorem 3.2 in Section 3 for a fully rigorous statement of this result, which in fact requires significantly weaker hypotheses on , and also specifies an explicit ‘algebraic’ rate of convergence in the last limit.
The proof of the previous theorem relies on ideas from Bayesian nonparametric statistics [44, 17] and on new ‘quantitative versions’ of the injectivity result in Theorem 1.1 which are of independent interest and stated in Section 2.
1.2 Polarimetric neutron tomography (PNT)
The basic problem in PNT consists in finding a magnetic field from spin measurements of neutrons [26, 11, 12, 23]. In this case the explicit relation is
[TABLE]
where is the magnetic field. In the case of PNT one assumes that the underlying surface is just the disc in the plane (by slicing with 2D discs one can solve the 3D problem).
The details of the experiment of polarimetric neutron tomography may be found, e.g., in [12]. Here we give a description that is suitable for our purposes. The data produced by the experiment is the orthogonal matrix , where is the scattering data described above. The significance of this in terms of spin, is a follows: if a neutron travelling along the ray determined by enters the magnetic field with a spin ( denotes the Euclidean unit sphere in ), it exits the field with spin (for an ensemble of polarized neutrons in a magnetic field it can be shown that they behave like a particle with a classical magnetic moment). The magnetic field is defined in 3D space, but the experiment makes measurements on a 2D plane and produces a global reconstruction by slicing. The geometry of the experiment is thus a 2D parallel beam geometry which is easily converted into fan-beam geometry as considered above. The question is then how to manipulate the spin to produce the orthogonal matrix. This is done with an ingenious sequence of spin flippers and rotators placed before and after the magnetic field being measured. The material containing the magnetic field can also be rotated so as to produce parallel beams from different angles. After the spin has been manipulated it goes through an analyser; this device is essentially a spin filter that only lets those neutrons with vertically aligned spin go through. The neutron count is then measured with a detector that produces an intensity reading. The spin of the entering beam is perfectly aligned with the spin of the analyser, so that the intensity measurement is actually a measurement of the angle of rotation of the spin due to the magnetic field. The key relation is given by [26, Equation 1]
[TABLE]
where is the attenuation of the medium, is the intensity of the incoming beam and is the angle by which the spin has rotated.
The use of the spin flipper allows the measurement of
[TABLE]
and from this one deduces that
[TABLE]
which then becomes an entry of our matrix . By rotating by and flipping (rotation by ) one can thus produce the entire orthogonal matrix as data. In other words, if is the canonical basis of 3-space, gives for all and hence all the entries. In some situations, where the attenuation of the medium is known, the use of spin flippers is not necessary and can be calibrated out. Assuming an additive Gaussian noise in the intensities , equation (2) approximately produces an additive Gaussian noise in the entries of the matrix which is precisely the noise model we adopt below.
As in the articles [12, 13] our approach reconstructs 3D magnetic fields of arbitrary direction and distribution. This provides a method able to investigate samples without imposing any a priori knowledge of the magnetic field orientation, and requires understanding of the full non-linear inverse problem. The recent preprint [13] introduces a modified Newton-Kantorovich type algorithm for the solution of the non-linear problem, a Newton-type algorithm where the inversion of the Jacobian at each iteration only uses the differential of the map at the base point .
As pointed out in [13], the algorithm appears to work well for small enough fields (or large enough velocities of neutrons), but may fail due to “phase wrapping” when the field is large enough. Our approach does not exhibit this problem.
1.3 The statistical observation scheme
Consider a simple surface as above with influx boundary , and a matrix valued map
[TABLE]
and scattering data
[TABLE]
Here we take for some , with corresponding Lie algebra , the set of skew-symmetric matrices. Recall that in the key application to PNT from the previous subsection, is the flat disk and . We could take and just as well, but for sake of conciseness prefer to avoid a complex-valued statistical noise model in what follows.
To describe the statistical observation setting, let be the uniform distribution (volume element) on (see (5) below for a precise definition), and consider ‘design’ random variables
[TABLE]
These draws represent a randomised choice of the geodesics for which experiments are performed – they have to be ‘equally spaced’ throughout ‘geodesic space’ in a statistical sense. For each resulting measurement of the statistical observational error arising in the experiment is modelled by independent Gaussian matrix noise. More precisely let
[TABLE]
random variables that are independent of the ’s, and let be the random noise matrix which adds a Gaussian noise variable in each matrix entry to . Our observations then consist of the sequence of random matrices
[TABLE]
The variables are all independent, and even i.i.d. for fixed. Conditionally on they are multivariate normal random variables with diagonal covariance and (vectorised) mean . Note that while takes values in , the are not in (or even ) as we have not constrained at all – this is in line with the physical experiments for PNT described in Section 1.2 where statistical errors arise from noisy measurements of each matrix entry of . For the theory we will assume that the noise variance is fixed and known – in practice it can be replaced by the estimated sample variance of the ’s.
To fix notation: The joint law of the random variables in (3) on will be denoted by , where we note for all . We also write for the law of the ’s, for the law of the and
[TABLE]
for the full data vector. The corresponding expectation operators are obtained by replacing ‘’ by ‘’ in the preceding expressions. The dependence on will be suppressed in the notation.
1.4 Some geometric background and basic notation
We conclude this section by introducing some more basic notation that will be used throughout.
Our background geometry is a simple surface with boundary . By ’simple’, we mean (i) is non-trapping (in the sense that every maximal geodesic in has finite length), (ii) has no conjugate points and (iii) is strictly convex (i.e. has positive definite second fundamental form). We denote by the unit tangent bundle of , namely
[TABLE]
Its boundary can be split into ’influx’ and ’outflux’ boundary, depending on whether the tangent vector points inside or outside, namely we define, for is the outer unit normal at ,
[TABLE]
The manifolds , , and all carry natural volume elements, allowing us to define spaces below. Specifically, the Riemannian metric induces an area form on and restricts to a metric on . The unit sphere bundle carries the volume element where is the length element in the unit circle . Finally the boundary of carries the area form where is as above and is the arclength (w.r.t. the metric ) along the boundary. Its restriction to will be denoted by
[TABLE]
The spaces and will be equipped with the canonical Hermitian inner product and induced norm . For elements in , this corresponds to the Frobenius norm , which is -invariant in the sense that for any and arbitrary, .
Given a -dimensional Riemannian manifold (either , , , , or as explained above), one may adapt the usual function spaces to - or -valued functions as follows: , with norms
[TABLE]
One may differentiate functions using partial derivatives in coordinate charts, or equivalently, using a global basis of smooth vector fields on which pairwise commutes (it will be useful to adopt the latter viewpoint in later sections). Given a -index , one may define and . The metric equips with a distance function , and for , we can thus define Hölder spaces with norm
[TABLE]
with the second term removed when is an integer. We will also use -based Sobolev spaces with norm
[TABLE]
for , and defined by interpolation otherwise (see, e.g., [42, Ch. 4]).
As above, when clear from the context, the domain and/or codomain will be dropped from the notation. In the following sections, spaces of functions with codomain , or their Lie algebras will make use of the same topology of the corresponding spaces of -valued functions. The -subscript attached to a space of maps defined on denotes the linear subspace of those maps that vanish identically outside of a compact subset of the interior of .
2 Theoretical results for the deterministic inverse problem
When discrete measurements of the forward data are corrupted by statistical noise, the injectivity result Theorem 1.1 is not useful to reconstruct from the observations, and we will discuss in the next section how to develop statistical methods that consistently solve this statistical inverse problem. The proofs that substantiate these methods are based on quantitative versions of Theorem 1.1 – stability estimates – as well as continuity properties of the forward map, and we describe in this section the analytical results we obtain.
The results to follow hold when the codomain of the matrix fields is the largest of the three compact Lie groups introduced before Theorem 1.1, namely (with Lie algebra ), see Eq. (1).
Theorem 2.1**.**
Let be a simple surface. Given two matrix fields and in there exists a constant such that
[TABLE]
where is a continuous function of , explicitly
[TABLE]
and where the constants only depend on .
The proof of Theorem 2.1 initially follows the approach for obtaining stability estimates for the geodesic X-ray transform as presented in [40, Theorem 3.4.3]. Our starting point is the pseudo-linearisation formula
[TABLE]
where is a geodesic X-ray transform with suitable weights, see Lemma 5.5. To prove Theorem 2.1 it suffices to show that
[TABLE]
To this end, we use the energy identity (Pestov Identity) developed in [35] for matrix weights arising for connections and matrix fields. The presence of the weights produces additional terms in the identity that need to be controlled to obain the estimate above and this is where most of the work lies. The main idea for controlling them comes from [35] where a connection with the right curvature is artificially introduced to control these terms. The connection is later removed by using (scalar) holomorphic integrating factors whose existence is guaranteed by the microlocal properties of the normal operator associated to the geodesic X-ray transform acting on functions. Taming these integrating factors has a cost which is reflected in the constant given in (6).
For the proof of Theorem 3.2 below we also require ‘forward’ estimates in Sobolev and Hölder scales. These are less sophisticated in nature than the stability estimate above, and hold under less restrictive assumptions. Recall that is said to be non-trapping if there is no geodesic with infinite length (any simple manifold is non-trapping).
Theorem 2.2**.**
Let be a non-trapping surface with strictly convex boundary. For any integer and for every , the following continuity estimates hold:
[TABLE]
where by we mean that the inequality holds with some constant that only depends on , and .
In fact in the proof of Theorem 3.2 we shall use instead of Theorem 2.1 the following corollary of the previous two results:
Corollary 2.3**.**
Under the same hypotheses as in Theorem 2.1 and as in (6), then
[TABLE]
where is independent of or .
3 Bayesian inversion of non-Abelian -ray transforms
3.1 Main results
The main goal of this section is to introduce a method to infer the matrix field from discrete observations of the scattering data described in Section 1.3. We follow the general paradigm of Bayesian inverse problems advocated by A. Stuart [41, 10] which is also related to the paradigm of Bayesian numerical analysis [14, 3] in the noiseless case (). The idea is to start from a Gaussian process prior for the parameter and to use Bayes’ theorem to infer the best posterior guess for given data .
We will state a theorem that shows that the posterior mean fields corresponding to a flexible class of Lie-algebra valued Gaussian process priors for consistently recover the ‘true’ in the frequentist large sample limit as , when noisy experiments have been performed under in the model (3). In fact we will provide a stochastic convergence rate to zero of the recovery error that is algebraic in inverse sample size .
The proof of Theorem 3.2 below provides a template to establish rigorous statistical guarantees for the Bayesian approach to other non-linear inverse problems as well. See Section 5.4 and Remark 3.6 for more discussion.
We emphasise that obtaining probabilistic consistency under entails approximate uniformity of the design and rules out ‘adversarial’ designs. Fixed (non-random) design that is sufficiently ‘equally spaced’ throughout could be considered as well in the theory that follows, either via appealing to asymptotic statistical equivalence results in nonparametric regression [37] or by tracking the numerical discretisation error explicitly through all the proofs that follow. For the purposes of the present paper we opt for the random design setting as it allows for a cleaner, unified probabilistic treatment of the measurement process.
To introduce the Bayesian approach more concisely, consider a prior for a vector field by prescribing a Borel probability measure on the space where
[TABLE]
The natural isomorphism between and the space of continuous functions from to in turn generates a prior for by forming a -valued field from the ’s. For instance in the case so that also , relevant in PNT, we construct from
[TABLE]
Then we make the Bayesian model assumption that
[TABLE]
which by Bayes’ rule generates a conditional posterior distribution of on – it will be denoted by . The posterior distribution arises from a dominated family of probability measures (see (58) below) and is hence given by
[TABLE]
for any Borel set in . Here
[TABLE]
is, up to additive constants, the log-likelihood function of the observations.
While what precedes was not specific to the choice of a particular prior, the main theorem to follow will hold for priors arising from certain -valued Gaussian processes. These will be constructed from a Gaussian base prior from which the coordinates of will be drawn independently. In fact we will require draws from to have -Hölder continuous sample paths on almost surely. We refer, e.g., to [19, Sections 2.1 and 2.6] for the basic definitions of Gaussian measures and processes and their reproducing kernel Hilbert spaces (RKHS).
Condition 3.1*.*
For and , let be a centred Gaussian Borel probability measure on the Banach space that is supported in a separable (measurable) linear subspace of , and assume its RKHS is continuously imbedded into the Sobolev space .
See Remark 3.4 for concrete examples and constructions of such Gaussian process priors with ‘maximal choice’ and arbitrary .
Now given a random draw we define a new random function
[TABLE]
and denote its law in by . Then let be random functions on drawn as i.i.d. copies from , and let the prior for be the resulting centred Gaussian product probability measure in the space (see (10) for ). Shrinking the prior towards the origin in a -dependent way as in (13) is crucial in our proofs, see Remark 3.5 for discussion.
The following theorem gives a bound for the convergence rate of the posterior mean
[TABLE]
towards the true field in -loss, under the law of the observations. Note that this mean (expected value) is understood in the usual sense of Bochner integrals and hence takes values in – for fixed data vector and since for the norms are bounded by a fixed constant, this expected value exists almost surely by (11) and a basic application of Fernique’s theorem (see [19, Exercise 2.1.5]). Let us say if all matrix entries of are contained in .
Theorem 3.2**.**
Suppose the Gaussian prior for arises as after (13) with base prior satisfying Condition 3.1 for . Let be the mean (14) of the posterior distribution arising from observations (3). Assume . Then we have, for some
[TABLE]
The proof is given in Section 5.4. We note that the constraint (and hence ) could be relaxed to (and hence ) at the expense of more technical proofs (see Remark 5.20). We further remark that in the proof we establish in particular that the random posterior measure on concentrates with probability approaching one in a -diameter -ball centred at , see Theorem 5.19.
3.2 Remarks and discussion
Remark 3.3*.*
[The exponent .] In the proof (see (81)) we show that
[TABLE]
is permitted in the previous theorem. If and if we take priors which verify Condition 3.1 for large enough and (possible by Remark 3.4), then we can make as close to as desired, and it is easy to show that cannot be improved upon by any algorithm. So at least for smooth the recovery guarantee from Theorem 3.2 is (near-) optimal. In the ‘low regularity case’ where is not large, our bound for may not be optimal. A conjecture for the optimal value for can be obtained from the much simpler linear and Abelian case () corresponding to the classical Radon transform, which is treated in [32, Example 2.5], where the exponent is attained, which can be shown to be optimal in this special case.
Remark 3.4*.*
[Construction of Gaussian priors.] We describe here some Gaussian process priors verifying Condition 3.1 with .
As a first basic example consider the case where equals the unit disk in with ‘flat’ (Euclidean) geometry, relevant in PNT. For arbitrary we can then take for the restriction to of a stationary Gaussian process on with appropriate (Whittle-) Matérn covariance function (see [17, p.313] and Section 4 below). This gives a Gaussian prior on with RKHS equal to the space of restrictions to of elements of (using Ex.2.6.5 in [19]). This space is well known (e.g., [42], Ch.4) to co-incide with , and the sample paths of this process lie in the separable subspace of for any , see [17, p.575f] for a proof.
The preceding construction works for any smooth bounded domain in . In particular a simple surface is diffeo-morphic to a disc and the Sobolev spaces and co-incide with equivalent norms – the Matérn prior can thus be used even when equals equipped with a different Riemannian metric. Alternatively one can embed isometrically into a larger closed compact (boundary-less) manifold and use the orthonormal basis of eigenfunctions of the Laplace-Beltrami operator on to generate Gaussian random series , which after restriction to and for suitable choice of , generate Gaussian priors with any prescribed Sobolev space as RKHS.
Remark 3.5*.*
[Rescaled Gaussian Priors.] While the use of Gaussian process techniques [4, 16, 27] in the proof of Theorem 3.2 is inspired by previous work in [44, 43] and also [18] for ‘direct’ problems, the inverse setting poses several challenges, particularly in the non-linear case. In our proofs we show how these challenges can be overcome by shrinking common Gaussian process priors towards the origin as in (13) – the shrinkage enforces the necessary additional ‘a-priori’ regularisation of the posterior distribution to permit the use of our stability estimates. While similar re-scaled priors have been shown to work in some ‘direct’ settings before (they appear as special cases of the rescaled priors studied in [43], see their Theorem 3.2), in our setting they play a crucial role: Without re-scaling the exponential growth in the -norms of of the constant (6) would render our stability estimate useless in the proofs.
Remark 3.6*.*
[Related literature on Bayesian non-linear inverse problems.] The study of statistical guarantees for the Bayesian approach to non-linear inverse problems has seen a recent surge of interest. In the references [45, 31, 30] non-linear inverse problems of elliptic and parabolic type are studied. The results therein however only hold for specific ‘uniformly bounded wavelet’ type priors – while these are useful to develop a first theoretical understanding of Bayesian inversion algorithms, they posit very strong a priori assumptions on the parameter of interest and the efficient computability of the resulting posterior distribution is also unclear.
The recent reference [32] obtains convergence rate results for optimisation based MAP-estimates (see Section 4.2 for a brief discussion of those) in a general class of non-linear inverse problems. For non-linear forward maps as the ones relevant here, these MAP-estimates can be difficult to compute, and at any rate may behave quite differently from the posterior mean: The algorithm studied here is a Bochner integral with respect to an infinite-dimensional and non-Gaussian posterior distribution and variational ideas from optimisation cannot be used directly in its analysis. In the proof of Theorem 3.2 we develop new techniques that allow to prove convergence rates for such algorithms – see Section 5.4 for a discussion of the key ideas which are relevant in other settings, too. Indeed, the very recent references [1, 20] have already succeeded in adapting our proof template to other nonlinear inverse problems. For instance [1] study statistical versions of a conceptually related boundary value problem arising with electrical impedance tomography (‘Calderón problems’). Our results imply that statistical inversion of non-Abelian -ray transforms (for ‘smooth parameters’ ) admits better (i.e., polynomial) convergence rates than the necessarily logarithmic (in inverse noise level) recovery guarantees derived in [1] for the Caldéron problem (with smooth conductivities).
Remark 3.7*.*
[Towards Uncertainty Quantification.] Theorem 3.2 also serves as a starting point to prove more refined Bernstein-von Mises theorems that entail that the posterior distribution is approximated in a suitable infinite-dimensional space by a canonical Gaussian measure (cf. [5, 6]). For a non-linear elliptic inverse problem a first result of this kind was recently proved in [30], and for the linearisation of the non-linear problem considered here, such results were obtained in [29]. In principle, joining the ideas of [30, 29] with the techniques of the present paper, one can conjecture that Bernstein-von Mises theorems should also hold true for the case of non-Abelian -ray transforms – this is the subject of ongoing research.
4 Implementation of the algorithm
In this section, we present some numerical reconstructions of an -valued matrix field from its noisy scattering data . In this case, is generated by three real-valued components , through the relation , where we have defined for basis of
[TABLE]
with structure equations , and . The approach presented easily adapts to any -, - or -valued field (including the -valued case of polarimetric neutron tomography, a close cousin of the present case), with some minor Lie group specific modifications to be made for an accurate computation of forward data.
4.1 Numerical domain and forward operator
The computational domain is an unstructured triangular mesh discretising the unit disk made of vertices, and functions on it are piecewise linear, uniquely determined by their values at the vertices. In particular is regarded as an element of .
The metric is isotropic, written as , with scalar function given by
[TABLE]
Such an example can be seen to be non-trapping, have no conjugate points and a strictly convex boundary, i.e. is simple. The case of Euclidean geometry would correspond to . Geodesic (data) space, modelled as is parameterised in fan-beam coordinates (with uniform probability measure ).
Below we will draw geodesics uniformly at random, characterised by initial conditions , , and our statistical algorithm will require numerical evaluation of the forward data which we now describe: Out of each data point , we first compute a geodesic using a forward scheme with stepsize to solve a discretisation of the system
[TABLE]
with initial condition , and , until the geodesic exits the domain. This produces a discretised geodesic
[TABLE]
Once such a geodesic is computed, we must then discretise the matrix ODE
[TABLE]
(The problem here is forward in time unlike that given in the introduction, though since is -valued, this amounts to computing the conjugate transpose of , which leads to the same problem.)
To discretise the above ODE, we denote and implement the scheme
[TABLE]
where we have defined . In fact the code implements a predictor-corrector variant of this scheme for improved accuracy on the computation of the exponentials.
The use of matrix exponentials in (15) (compared to standard forward-marching schemes) ensures that the matrix solution numerically remains in , and the computation of these exponentials can be done via an explicit formula, namely: for and denoting , we have for
[TABLE]
(Note that the formula above would need to be adapted if a Lie algebra different from is of interest.) The evaluation of is done by barycentric combination of the values of at the three vertices of the triangle containing .
After implementing (15), the scattering data is nothing but (in fact, the other values for are not kept in memory after computation). The magnetic field we will use in the experiments below as well as its noiseless scattering data are visualised Fig. 2.
As we will use Monte-Carlo Markov Chains (MCMC) in the following section, let us mention that once the mesh is fixed, some computations are done prior to the MCMC, namely, all geodesics as well as the triangle indices and barycentric weights along them.
4.2 Statistical estimation through MCMC
Given data as in (3), a common approach to inverse problems would be to compute a Tikhonov regulariser which minimises a penalised least squares fit functional (with, e.g., Sobolev-norm penalty)
[TABLE]
over the space of all matrix fields where is the Lie algebra describing the constraint on the co-domain of . The map is not convex, and efficient computation of the global minimiser may be challenging. One approach would be to use a gradient based iterative scheme [25] but the algorithmic stability of these (or other variational) methods is unclear in the setting considered here.
The optimiser of the functional (16) can be shown to correspond to a posterior mode, or ‘maximum a posteriori estimate (MAP)’, of a Gaussian process prior on with RKHS equal to (see [9] for a general result of this kind). Instead of computing that maximiser, one may compute other posterior characteristics such as the posterior mean (average) , which in our non-linear setting is different from the MAP estimate.
For Gaussian priors, MCMC algorithms such as the preconditioned Crank-Nicolson (pCN) method (see [7]) are available to sample from the posterior distribution. To introduce the algorithm, note that as in (12), the log-likelihood function given the data equals, up to additive constants,
[TABLE]
One then approximates the posterior mean by a Monte Carlo average of a Markov chain of length as follows:
Let be a Gaussian prior for ; initialise for , then repeat:
Draw and for define the proposal . 2. 2.
Set
[TABLE]
The algorithm is terminated at and requires evaluation of and thus of the scattering data for every and . For relevant in the simulations that follow, this can be done as described in Section 4.1.
The invariant measure of the Markov chain equals the posterior distribution , and under certain conditions that are compatible with our setting, [22] derived dimension-free spectral gaps which imply that the distribution of mixes rapidly towards . The approximation of by can thus be expected to compare to the one of the standard central limit theorem, with corresponding non-asymptotic error guarantees, see Section 4 in [22].
To perform numerical simulations, we discretise as in Section 4.1 and for each choose an independent Matérn prior (cf. Remark 3.4) with parameters , which on functions on the mesh (i.e., vectors in ) uses the covariance matrix for , with positive definite kernel
[TABLE]
with the modified Bessel function of the second kind. The constant controls the Sobolev regularity while controls the characteristic lengthscale of the samples.
We draw geodesics at random according to the uniform law for (some samples on of size are visualised Fig. 4), and then generate synthetic data as explained in Section 4.1 for the magnetic field displayed in Fig. 2, adding Gaussian noise to each matrix entry of .
We then implement the pCN algorithm to approximately compute the posterior mean from Theorem 3.2. The stepsize is adjusted so that after ‘burn-in’, the acceptance rate of proposals stabilises around . Once the chain is computed we visualise – examples of outcomes corresponding to increasing data set are given in Fig. 5, illustrating the improvement in ‘reconstructions’ as the number of measurement points increases.
5 Proofs
5.1 Geometric preliminaries
Let be a compact oriented two dimensional Riemannian manifold with smooth boundary . As before will denote the unit circle bundle which is a compact 3-manifold with boundary given by
[TABLE]
We let be the geodesic vector field, i.e. the infinitesimal generator of the geodesic flow of . Since is assumed oriented there is a circle action on the fibers of with infinitesimal generator called the vertical vector field. It is possible to complete the pair to a global frame of by considering the vector field . There are two additional structure equations given by and where is the Gaussian curvature of the surface. Using this frame we can define a Riemannian metric on by declaring to be an orthonormal basis and the volume form of this metric will be denoted by . The fact that are orthonormal together with the commutator formulas implies that the Lie derivative of along the three vector fields vanishes.
Given functions we consider the inner product
[TABLE]
Upon defining for , the following formula (known as Santaló’s formula) holds for any :
[TABLE]
where is the geodesic flow.
We now discuss the manifold and its geometry. One may define a natural frame on , given by
[TABLE]
( represents horizontal differentiation along the tangent direction). It is easily seen that and that these two vector fields are orthonormal for the metric on induced by the metric defined on . In particular is an orthonormal frame for and we may define with respect to that frame. We now prove a useful lemma that will simplify later calculations.
Lemma 5.1**.**
Let be a non-trapping surface with strictly convex boundary. Then the vector field can be completed into a global, pairwise commuting frame of . This frame is smooth on , continuous on and satisfies and .
Proof of Lemma 5.1.
For and , we define two vector fields on
[TABLE]
Since the map is smooth and injective for and , this defines global, smooth sections of , and so that pairwise commute. Via direct computation of the differential of the flow (see e.g. [28, Sec. 4.2]), one may obtain the following expressions on
[TABLE]
where satisfy
[TABLE]
and where for , one defines though the relation
[TABLE]
One further notices that the definition of extends by continuity to , with the appropriate restrictions claimed in the statement of the lemma. ∎
5.2 Forward estimates - proof of Theorem 2.2
In this section, we derive various continuity estimates for the forward map . Recall that if the boundary is strictly convex, by [39, Lemma 4.1.2 p113] there is a constant such that
[TABLE]
We start with the following basic estimates.
Lemma 5.2** (Work-horse lemma).**
Let be a non-trapping surface with strictly convex boundary and . Suppose and consider the unique continuous solution to on with . Then there exists a constant such that
[TABLE]
The constant can be chosen as , with the diameter of and the constant given in (20).
Proof.
It is easy to check that
[TABLE]
where is the unique solution to on with . Taking Frobenius norm, using -invariance and the fact that that is unitary, we get
[TABLE]
Upon bounding the right-hand side crudely by , this immediately implies (21). On to the estimates, applying Cauchy-Schwarz yields for all
[TABLE]
where is the maximal geodesic passing through . Now fix and integrate the inequality above along the geodesic flow to arrive at
[TABLE]
Multiplying both sides by , integrating w.r.t. and using Santaló’s formula yields (22).
For the estimate on , looking at (24) for and using (20), we arrive at
[TABLE]
Integrating w.r.t. and using Santaló’s formula (18) on the right hand side immediately gives (23). Lemma 5.2 is proved. ∎
We now prove the main result on forward estimates, Theorem 2.2. We shall follow the model proof of [39, Theorem 4.2.1] which shows that the standard X-ray transform maps to . We do this in two stages: first we explain in Sec. 5.2.1 the proof in the simpler case in which the matrix fields have support contained in the interior of and then we explain in Sec. 5.2.2 how to derive the general case.
5.2.1 Proof of Theorem 2.2 assuming and with support in the interior of
As a preliminary identity, given and two skew hermitian matrix fields, consider the two -valued solutions such that with boundary condition . It is immediate to find that the relation
[TABLE]
holds pointwise on , and that . Using that with estimate (21) yields
[TABLE]
Similarly, combining the observation with (23) yields (7), and we can also obtain, using (22),
[TABLE]
To prove the continuity estimate, consider the function , such that and for brevity set . The following identity is immediate:
[TABLE]
In addition, since and are compactly supported in , the functions , equal the identity matrix in a neighbourhood of and in particular, .
Using estimates (21)-(22)-(23) and -invariance of Frobenius norms gives:
[TABLE]
We also have with , so by (21), we get . Combining this fact with (25), we arrive at
[TABLE]
and a similar bound for . Obtaining a similar estimate for , we arrive at
[TABLE]
Similar arguments using sup norms everywhere yield
[TABLE]
To proceed to higher-order derivatives, if is a derivative of order , setting , we have , and
[TABLE]
where the right-hand-side involves derivatives of of order at most , and derivatives of of order at most . Combining the estimates of Lemma 5.2 and an induction on (whose formulation also involves control on for all , and where the commuting frame avoids the proliferation of terms due to non-trivial commutators) proves the theorem for higher-order derivatives. ∎
5.2.2 Proof of Theorem 2.2 and supported up to
Consider a compact non-trapping surface with strictly convex boundary and let be a matrix-valued field. We shall call an integrating factor for if is differentiable along the geodesic vector field and . Let denote the unique integrating factor with . Recall that . First note that the work of the previous section also proves for every that if and are matrix fields compactly supported inside of , we also have
[TABLE]
Let denote the scattering relation of the metric (i.e. the map that takes initial conditions of a geodesic at the moment of entry to final conditions at the moment of exit). If denotes any other integrating factor for , then it must have the form , where is the first integral (i.e. ) determined by . Thus and from this we deduce
[TABLE]
In particular, given two continuous matrix fields , , Equation (27) implies the identity on :
[TABLE]
To complete the proof of Theorem 2.2 for supported up to the boundary, we then need to construct integrating factors with good regularity on (i.e, at included) and which behave continuously in terms of and . To this end, we consider isometrically embedded in a closed manifold . The Seeley extension theorem asserts that for any there is a continuous extension map
[TABLE]
(It also works for .) We consider a slightly larger compact manifold with boundary engulfing so that stays non-trapping and with strictly convex boundary. We fix once and for all a smooth cut off function so that it has compact support in and it equals near . Thus given ,
[TABLE]
and since is continuous,
[TABLE]
Now by virtue of the work in Subsection 5.2.1 applied to on , we can deduce estimates of the form
[TABLE]
We then take as smooth integrating factors and . Combining (29) and 30 we derive
[TABLE]
Combining (29) and (26) applied to and , we obtain
[TABLE]
and similarly for , and for norms. Then the proof for Theorem 2.2 for supported up to the boundary consists in applying the product rule to (28) and using estimates (31) and (32).
5.3 Stability estimate - proof of Theorem 2.1
5.3.1 Setting, main results and proofs of Theorem 2.1 and Corollary 2.3
Before considering the non-linear inverse problem, we must establish a stability estimate for a linear inverse problem, that of reconstructing a function from its attenuated X-ray transform, where the attenuation is matrix-valued. Namely, given a smooth skew-hermitian matrix in , we define , where is the unique solution to the problem
[TABLE]
The injectivity of such a transform was proved in [35], and we now provide a stability estimate for it.
Theorem 5.3**.**
Let be a simple Riemannian surface with boundary and a smooth, skew-hermitian matrix field in . Then for any , we have the following stability estimate
[TABLE]
Remark 5.4* (Dependence of ).*
The constants only depend on the geometry of . The constant blows up like , where is the terminator constant of . This is one of the ways that this stability estimate ceases to hold as one approaches non-simplicity. The main other quantity appearing in is , the sup norm of the integrating factor defined below. The behavior of such a quantity, while finite on any simple surface, remains to be better understood.
On to the non-linear stability estimate, injectivity of the operator restricted to -valued fields was initially proved in [35], and Theorem 2.1 upgrades this result with a stability estimate. While the remaining sections will focus on the proof of Theorem 5.3, we now explain how this result implies Theorem 2.1. The main additional ingredient needed is a pseudo-linearization identity, relating scattering data to attenuated X-ray transforms:
Lemma 5.5** (Pseudo-linearization).**
Let be a non-trapping surface with strictly convex boundary. For any , the following relation holds
[TABLE]
where is an attenuated X-ray transform with matrix field , an endomorphism of with pointwise action
[TABLE]
Proof of Lemma 5.5.
With the fundamental solutions of with and (similarly for ), denote . A direct computation shows that
[TABLE]
and thus by the definition of the attenuated X-ray transform, . Since we also have by construction , identity (34) follows. ∎
Proof of Theorem 2.1.
Appealing to the pseudo-linearization (34), one may notice that if , are skew-hermitian, then the field is skew-hermitian when viewed as an endomorphism of . Moreover, since the entries of are linear in the entries of and , we directly have that
[TABLE]
with a universal constant. Then relation (34), together with Theorem 5.3 immediately implies
[TABLE]
This shows Theorem 2.1 when . Since all quantities involved above do not depend on derivatives of of order higher than , and are independent of , approximating by sequences in (and using Theorem 2.2) will yield the same stability estimate for matrix fields. ∎
We also cover the proof of Corollary 2.3, based on the previous result and the forward estimate Theorem 2.2.
Proof of Corollary 2.3.
It is enough to show that
[TABLE]
To show this, we write at the pointwise level:
[TABLE]
hence . To control first derivatives, take or , we have
[TABLE]
using triangle inequality and submultiplicativity. Squaring, taking the sup norm of and integrating on , we obtain
[TABLE]
Combining the estimates for and we arrive at
[TABLE]
Now using the forward estimate (8) with and (thus ), we deduce that
[TABLE]
This yields the estimate , and taking squareroots yields (35) (using that is uniformly bounded for ). ∎
5.3.2 Proof of Theorem 5.3 - Main outline
As in [35], the main method of proof involves an energy identity (or Pestov identity), based on integrations by parts on . To do this, let us recall that with the inner product defined in (17), and upon also denoting
[TABLE]
the following integrations by parts formulas holds for :
[TABLE]
We will also use extensively the harmonic decomposition on the fibers of . Namely, the space decomposes orthogonally as a direct sum
[TABLE]
where is the eigenspace of corresponding to the eigenvalue . A function has a Fourier series expansion
[TABLE]
where . Let . Of special interest are the operators
[TABLE]
with the property that for all . For more details on the operators and the Fourier expansion we refer to [21] where these tools were first introduced.
Definition 5.6**.**
A function is said to be holomorphic if for all . Similarly, is said to be antiholomorphic if for all .
To control the terms involving the matrix field, one must introduce an artificial connection as we will see below. This first requires that we derive a Pestov identity for X-ray transforms with connection and matrix222The matrix field is also referred to as a ’Higgs’ field in the literature. field . Namely, given a skew hermitian pair on the bundle and , we define , where is the unique solution to the problem
[TABLE]
While previous Pestov identities have been derived in [35], the present one accounts for nonzero boundary terms, and in particular reflects more precisely how the stability constant degrades as approaches non-simplicity. This is captured by the concept of terminator constant : given a simple surface , there exists a number such that for any , there exists a smooth function , solution to the Riccati type equation .
Theorem 5.7**.**
Let a simple surface with boundary, with terminator constant , and a skew-hermitian pair on the bundle . Then for any and , the following identity holds:
[TABLE]
In the identity above,
[TABLE]
and is a smooth function on which only depends on the surface. The quantity is the curvature of the connection , which upon a judicious choice of connection, can have a controlled sign. To achieve this, consider the scalar Hermitian connection , where is a smooth 1-form such that (the area form of the metric ). We choose a specific of the form for a real-valued function satisfying with Neumann condition at the boundary. The latter condition implies that for any real . Then we have
[TABLE]
with defined in (37), and .
By [35], we can construct a holomorphic scalar function satisfying . Without loss of generality, can be chosen even. The condition on reads , for which it is sufficient to use . With this choice of and , in what follows, we will denote and . With as above, we have . Moreover, (the complex-conjugate of ) is antiholomorphic and solves , so also .
Lastly, we will denote the projection onto positive and negative harmonics. Namely, . We have the following commutators formulas, for any :
[TABLE]
The following lemma will help us controlling by versions of which are conjugated by special integrating factors.
Lemma 5.8**.**
With the holomorphic function and antiholomorphic function and any , we have
[TABLE]
in particular we get the equality
[TABLE]
Proof.
We only prove , and the rest is similar. It is enough to notice that for any holomorphic function , the equality holds, as this amounts to saying that the negative harmonics of do not depend on the non-negative harmonics of . This is immediate since
[TABLE]
Then we compute immediately
[TABLE]
hence the result. ∎
Outline of proof of Theorem 5.3 At first we are going to assume that the solution to the transport problem , is . If is supported all the way to the boundary, this may not be the case, as may fail to be smooth at the glancing because is not smooth at . However, there is a standard way to fix this issue and we shall do this at the very end. For now we will proceed as if were smooth in .
The initial transport equation, projected onto the harmonic term of degree [math], reads
[TABLE]
so that, in particular,
[TABLE]
The crux is then to find how to bound the quantities on the right by the boundary values of . Using a Pestov identity with a special connection defined as above (and its holomorphic integrating factor ), we show how to control the first term using control over for . Similar work can be done, to control the second term using control over for .
We first derive in Sec. 5.3.3 the identity:
[TABLE]
Since only has strictly positive harmonic terms, the first term in the right-hand side of (42) only depends on . Upon defining , the identity (42) reads
[TABLE]
Denoting , we straightforwardly obtain the estimate
[TABLE]
and control on will be obtained after controlling each term in the last right hand side. We first control by , via the estimate
[TABLE]
We then control and by boundary terms via Pestov identity and setting up an appropriate threshold on . To do this, we consider the transport problem for , written as:
[TABLE]
We then use the Pestov identity (38) for , with and :
[TABLE]
Before choosing appropriately, we need additional work (tedious as in [35]) on the term . Taking into account boundary terms, and upon defining , we prove in Sec. 5.3.3 that
[TABLE]
with defined in (55). The last term in the sum will move to the right-hand side of (46), while the other two need to be controlled with a large . To achieve this, we prove in Sec. 5.3.3 the following:
Lemma 5.9**.**
There exists a universal constant such that for all ,
[TABLE]
In particular, for , identity (46) becomes
[TABLE]
We now explain how to bound the right-hand side in terms of . Recall that . The first claim is that . The first one is obvious because both operators are diagonal of the fiberwise Fourier decomposition . That is also diagonal on this decomposition follows from the fact that . With this in mind, we have, on :
[TABLE]
and since , and will be controlled by . The right hand side of (49) is thus bounded by , where the constant does not depend on .
Using this bound and throwing out the first and third terms of the left-hand side of (49), we obtain
[TABLE]
The second term in the left-hand side controls directly, and we can write
[TABLE]
with some constant independent of . Recalling that and combining estimates (41), (44), (45) and (50), we arrive at estimate (33), completing the proof of Theorem 5.3.
5.3.3 Remaining ingredients
Pestov identity with boundary term for ray transforms with skew-hermitian pairs
Let and a skew-hermitian pair, and define
[TABLE]
We have the following structure equations
[TABLE]
where , or when the connection is scalar, , where is the Gaussian curvature. In what follows, we will need to integrate by parts with boundary terms, and using (36), we obtain for :
[TABLE]
Proof of Theorem 5.7.
We first write a differential identity using the structure equations (51):
[TABLE]
where . We record this here as
[TABLE]
Now, considering smooth and supported up the boundary, we write
[TABLE]
We now arrange the four boundary terms using integration by parts in and the formulas
[TABLE]
First notice that
[TABLE]
We then obtain
[TABLE]
We now simplify, using that and ,
[TABLE]
The boundary terms then simplify into
[TABLE]
With this notation, the full Pestov identity takes the form
[TABLE]
To recover [35, Eq. (8)], we take the real part of the equality above, and notice that because ; then
[TABLE]
Since the last term is purely imaginary, the real parts of the other terms agree, and upon taking the real part of (53), we obtain
[TABLE]
(Note that the second boundary term is purely real so the is just ornamental)
We finally explain how the index form term can be rewritten as the sum of a non-negative term and a boundary term. With as in the statement, and the function solving , we now compute, for any
[TABLE]
We simplify
[TABLE]
We arrive at
[TABLE]
and we may rearrange this as
[TABLE]
Plugging this last relation into (54) with yields (38). ∎
Remaining estimates and lemmata
Proof of equality (42).
We write, using Lemma 5.8
[TABLE]
To rewrite the last term, from the equation , note the relation
[TABLE]
Then we have, for ,
[TABLE]
where we used the transport equation in the last line. For ,
[TABLE]
Plugging this back into the equation for , we get
[TABLE]
We now write
[TABLE]
and similarly
[TABLE]
Using the last two computations, we arrive at (42). ∎
Proof of estimate (45).
The transport equation for projected onto the harmonic term of degree reads:
[TABLE]
For our choice of connection, so the left side can be rewritten as
[TABLE]
hence we obtain
[TABLE]
We then rewrite the latter right-hand side in terms of . Notice that
[TABLE]
so
[TABLE]
and thus
[TABLE]
Upon deriving an estimate of the form
[TABLE]
we can write
[TABLE]
and (45) follows. ∎
Proof of (47).
We first need to write an integration by parts for defined in (37). Using integrations by parts (36) we first derive an integration by parts for : for any smooth on ,
[TABLE]
We now compute, using that
[TABLE]
where we define
[TABLE]
Similarly, for the skew-hermitian connection considered,
[TABLE]
Now, using the fact that
[TABLE]
we compute
[TABLE]
where . Upon defining
[TABLE]
we now prove by induction the following claim:
[TABLE]
The case is proved above, and the induction step follows from the calculation
[TABLE]
Putting this equality back into (57) proves the induction. Now since , we have that , and thus (47) follows. ∎
Proof of Lemma 5.9.
The term that ultimately controls everything is
[TABLE]
The infinite sum in (48) can then be controlled by
[TABLE]
with a universal constant. As for the last term of the left-hand side of (48), we write
[TABLE]
where is a universal constant. Lemma 5.9 follows upon taking . ∎
5.3.4 Conclusion: dealing with the glancing
Consider a function such that it coincides with in a neighbourhood of and such that and . Clearly for . Using , we extend to the interior of as for . We let and
[TABLE]
Note that is now defined on all and agrees with the vector field defined previously on . In fact and are tangent to every , where . The next lemma for is the key input to deal with the glancing, cf. [39, Lemma 4.1.3], [40, Lemma 3.2.3] and [8, Lemma 5.1].
Lemma 5.10**.**
The functions and are bounded on .
To substantiate the previous claim that the behaviour of is the same as that of we proceed as follows. We consider a smooth integrating factor such that . These always exist for any non-trapping manifold with strictly convex boundary. A simple calculation shows that we may write in terms of as
[TABLE]
where is the geodesic flow of . Thus directly from Lemma 5.10 we obtain:
Lemma 5.11**.**
The functions and are bounded on .
Next we note that all the previous work that we have done assuming smooth may be summarized as follows:
Theorem 5.12**.**
Let be a simple Riemannian surface with boundary and a smooth, skew-hermitian matrix field in . Then for any , we have the following stability estimate
[TABLE]
where is any smooth solution of .
Proof of Theorem 5.3 in full generality.
Let for small be the surface considered above. We let be the unique solution to the problem
[TABLE]
The function is smooth in and solves since does. Hence we may apply Theorem 5.12 in to obtain
[TABLE]
where we might as well use the constants for which bound those for . We now let ; we clearly have
[TABLE]
and using Lemma 5.11 we see that
[TABLE]
Since
[TABLE]
the theorem is proved.
∎
5.4 Consistency of the posterior mean: proof of Theorem 3.2
We assume , the general case requires only notational changes.
The overall strategy we pursue here, which has also been used in some form in [45, 31, 30, 32], is to show first that the Bayesian algorithm recovers the ‘regression function’ consistently in a natural statistical distance function, and to combine this with quantitative stability estimates for the inverse map in appropriate metrics. This exploits crucially that the estimated Bayesian regression outputs lie in the (non-linearly constrained) range of the forward map , so that the stability estimate applies to them. To make this approach work with ‘unbounded’ Gaussian priors is challenging, and our proofs proceed as follows: We first establish the posterior contraction Theorem 5.13 under general conditions, borrowing from Bayesian nonparametric theory (e.g., [17, Theorem 8.19] or [19, Theorem 7.3.3]), slightly strengthening the usual statement of such theorems to give explicit exponential bounds for the convergence rate to zero of certain posterior probabilities. Since our regression functions take values in , they are uniformly bounded and the usual Hellinger distance occurring in such contraction theorems is then Lipschitz-equivalent to the standard -distance (see Lemma 5.14). Then Lemma 5.16 uses results of [27] to show that the key small ball condition in Theorem 5.13 can be verified for the Gaussian priors from Condition 3.1 even after they have been shrunk towards zero, if the true matrix field belongs to the RKHS . Next, Lemma 5.17 exploits fine properties [4, 16] of infinite-dimensional Gaussian measures to show that such ‘shrunk’ priors charge ‘sufficiently regular’ matrix fields (effectively -balls) with probability close enough to one that the posterior distributions inherits these regularity properties. This is crucial to apply the ‘forward’ estimate Theorem 2.2 and the ‘stability’ estimate (9) in the proof of Theorem 5.19 – effectively the specific structure of our inverse problem enters only in this theorem and only through these two estimates. Finally, the exponential convergence to zero of the order obtained in Theorem 5.19 permits a ‘quantitative uniform integrability argument’ in Section 5.4.5 to deduce convergence of the whole posterior (Bochner-) mean towards the true matrix field .
Let us mention that in the recent contributions [1, 20] (written after the first version of this manuscript was completed), the general proof template developed here has already been used effectively in two different non-linear inverse problems (arising with elliptic PDEs), see also Remark 3.6.
5.4.1 A general contraction theorem
Consider a collection of probability density functions on some measurable space with respect to a dominating measure , specifically in our measurement model (3) we take
[TABLE]
where is equipped with its natural product Borel- algebra , where with equal to Lebesgue measure on and given in (5). By the Gaussianity of the ’s these probability densities are of the form
[TABLE]
Since the map is jointly Borel-measurable from to (using (8) and that point-evaluation is -continuous), the posterior distribution (11) exists by standard arguments ([17], p.7) and has the desired form. In the proof of the following theorem we show in particular that the marginal density is positive on events of -probability approaching one, so that (11) is well-defined also in the frequentist setting where . We also define the Hellinger distance on such densities by
[TABLE]
Denote by the minimal number of Hellinger-balls of radius required to cover a set of -densities on . We then have the following
Theorem 5.13**.**
Consider a prior for arising from a sequence of Borel probability measures on and let be the posterior distribution arising from i.i.d. observations . Let , let be a sequence such as , and define sets
[TABLE]
Suppose for some constant the prior satisfies for all large enough
[TABLE]
and that for some sequence of approximating sets for which
[TABLE]
we have the complexity bound
[TABLE]
for some fixed constant . Then for some large enough constant
[TABLE]
Proof.
Recall from (4) that we write . The proof proceeds as in the proof of [19, Theorems 7.3.1 and 7.3.3]: We first use [19, Lemma 7.3.2] and the hypothesis (61) to deduce that the events
[TABLE]
satisfy as . Moreover using (63) and [19, Theorem 7.1.4] with choices , any , and constant in , we deduce that for every there exists large enough such that we can find ‘tests’ (random indicator functions) for which
[TABLE]
Now let us write
[TABLE]
for the event whose posterior probability we want to bound. Then by (11) and as ,
[TABLE]
By Markov’s inequality, decomposing
[TABLE]
and using Fubini’s theorem as well as
[TABLE]
we further bound the last probability as
[TABLE]
where we have used (62) and (66) with and then large enough. ∎
The ‘information-theoretic distance’ arises naturally in such posterior contraction theorems, see [17]. The following lemma, which adapts a result due to Birgé [2] to the setting of -valued functions, shows that the Hellinger distance is Lipschitz equivalent to the standard -metric
[TABLE]
Lemma 5.14**.**
For , let be its non-Abelian -ray transform. Then there exist positive constants such that
[TABLE]
Proof.
Write
[TABLE]
for the ‘Hellinger affinity’. By (58) and using the standard formula for the moment generating function of -variables, the quantity equals
[TABLE]
By Jensen’s inequality the last integral is greater than or equal to and using standard inequalities for the right hand side of (68) follows. Next we notice that since , their matrix entries are all bounded by one and we hence have for some constant . We can thus proceed exactly as in the proof of [2, Proposition 1] (or see Lemma 22 in [20]) to also deduce the left hand side inequality in (68). ∎
5.4.2 Verification of the prior mass condition
We now verify condition (61) in the last theorem for an explicit constant and the Gaussian prior from Theorem 3.2. To do this we first show that one can reduce to checking small ball conditions for -norms on the level of the original matrix parameter .
Lemma 5.15**.**
For and define
[TABLE]
and let be as in Theorem 5.13. Then for some large enough we have and thus in particular, for every ,
[TABLE]
Proof.
From (3) with and (58) we have
[TABLE]
Therefore, since and is the unit volume measure on ,
[TABLE]
where we have used the forward estimate (7). Thus if the first inequality defining is verified for . To verify the second, note that all are bounded in -norm by some fixed constant . Thus
[TABLE]
for some constant , where we have also used that implies, for fixed,
[TABLE]
and again (7), so that the overall result follows from appropriate choice of ∎
We now turn to lower bound the small ball probabilities for the prior featuring in Theorem 3.2 where for the given we will choose
[TABLE]
Note that precisely equals the rescaling of the prior in (13). Let us recall the base RKHS from Condition 3.1.
Lemma 5.16**.**
Let be the prior for from Theorem 3.2 with , assume and choose as in (70). Let be as in Lemma 5.15. Then for every there exists a constant such that for every ,
[TABLE]
In particular, for as in (60) in Theorem 5.13, there exists a finite constant
[TABLE]
such that for every ,
[TABLE]
Proof.
Since , to prove the first inequality it suffices to lower bound, by independence of the ’s,
[TABLE]
The sets are convex and symmetric, hence by [19, Corollary 2.6.18] we have for every fixed,
[TABLE]
where we have used that
[TABLE]
in view of (13), (70), (and where we refer to [19, Exercise 2.6.5] or [17, Lemma I.16] for standard preservation properties of RKHS under linear transformations).
We next bound the centred probability which by (13), (70) equals
[TABLE]
By Condition 3.1 the RKHS of the Gaussian law of in is continuously imbedded into . The unit ball of this space satisfies the bound
[TABLE]
for its -covering numbers: Indeed, since the simple surface is diffeo-morphic to a disk, we can extend all functions in to elements of the Sobolev space on the -torus , with Sobolev-norm increased by at most a fixed multiplicative constant (Ch.4 in [42]). An appropriate bound for the -covering numbers of is then provided in [19, (4.184)], which in turn (since for all ) also bounds the -covering numbers of as required.
To proceed, we can now use (70) and [27, Theorem 1.2] (with the value of there equal to our ) to lower bound the last small ball probability by
[TABLE]
for constants and since . Combining what precedes proves the first inequality of the lemma with
[TABLE]
The second inequality (71) now follows from the first and Lemma 5.15. ∎
We note that the proof in fact shows that the constant depends only on upper bounds for .
5.4.3 Excess mass and complexity condition
Having determined the constant in (61) for the Gaussian prior in Theorem 3.2, we now turn to verifying the remaining conditions (62) and (63) in Theorem 5.13 for a suitable choice of that will provide sufficient regularity of the posterior distribution to combine it with our stability estimates for the map .
Lemma 5.17**.**
Let be the prior from Theorem 3.2 with , let be as in (70) and assume . For define subsets of as
[TABLE]
a) Then for every we can choose large enough such that
[TABLE]
b) Moreover for some we have
[TABLE]
Proof.
a) Recalling (13), (70), we can identify a prior draw with the vector field
[TABLE]
We denote by the product measure describing the law of the centred Gaussian random variable in the Banach space .
Write next where, with corresponding to ,
[TABLE]
so that it suffices to bound the prior probabilities of the complements of .
We first turn to . By Condition 3.1 the vector field defines a Gaussian Borel random variable in a separable linear subspace of . By the Hahn-Banach theorem its -norm can then be represented as a countable supremum
[TABLE]
of bounded linear real functionals defined on . We then apply a version of Fernique’s theorem [16], concretely [19, Theorem 2.1.20], to the centred Gaussian process to deduce that for some fixed constant ,
[TABLE]
and then also, for large enough and since ,
[TABLE]
for a fixed constant, which can be made less than for any provided is chosen large enough.
It remains to show that for large enough. Using the continuous imbedding with imbedding constant (cf. Condition 3.1), it suffices to lower bound
[TABLE]
where is the unit ball in and where we define
[TABLE]
By Borell’s [4] isoperimetric inequality (see [19, Theorem 2.6.12]) the last probability is bounded below by
[TABLE]
where is the cumulative distribution function of a random variable . By the same arguments as those leading to (73) above, we have
[TABLE]
and using the basic inequality (see [17, Lemma K.6]) and monotonicity of we can further lower bound (75) by
[TABLE]
Now given define
[TABLE]
which by the previous inequality for can be made less than or equal to whenever is large enough. Conclude that the penultimate display is lower bounded by
[TABLE]
completing the proof of Part a).
b) To prove Part b), note first that to construct a -covering of in -distance it suffices, by definition of , to construct such a covering for a -ball of radius , so that (72) and the definition of give (with )
[TABLE]
Lemma 5.14 and (7) imply that such a covering induces a -covering of in the Hellinger distance of log-cardinality at most . Since is a norm and hence homogeneous, we can increase the constant from to in (76) and obtain a -covering for . The desired inequality in Part b) follows. ∎
Remark 5.18*.*
We note that the introduction of the set and the use of Borell’s inequality in the previous Lemma could be avoided if one wishes to prove Theorem 3.2 only for any (in this case a minor adaptation of Theorem 5.13 and of (76) can be shown to give a slightly worse rate in (77) below). We give this argument however to obtain our sharper bound for in (81).
5.4.4 Final contraction theorem
We now put everything together to establish a posterior contraction theorem for and subsequently deduce Theorem 3.2 from it.
Theorem 5.19**.**
Under the hypotheses of Theorem 3.2, with and from (71), we have for all large enough that
[TABLE]
as . Moreover, if then we have for every integer such that and all large enough,
[TABLE]
Remark 5.20*.*
The constraint in the second limit in Theorem 5.19 is only required to allow space for an integer in the following proof, when combining the interpolation inequality (79) with Theorem 2.2 for . If a version of Theorem 2.2 were established for non-integer then and real would be permitted in Theorem 5.19 (and then also in Theorem 3.2).
Proof.
From Lemmata 5.16 and 5.17 with and Theorem 5.13 we deduce for large enough, and as
[TABLE]
Applying Lemma 5.14 gives the first limit (77) with .
To prove the second limit we will apply the stability estimate Theorem 2.1 in the form (9) with . By hypothesis we have ; as a consequence for all contained in the event in (77) with , the constants from (6) are uniformly bounded by a fixed constant that depends on and hence for those ’s
[TABLE]
To proceed we will need a standard interpolation result for Sobolev spaces on the manifold to the effect that
[TABLE]
for all and any . [For real-valued functions this can be proved using standard arguments from Ch.4 in [42] and these results extend to matrix-fields in a straightforward way.] Moreover we will use the basic inequality
[TABLE]
for all . Now Theorem 2.2 implies that for all ’s in the event in (77) the corresponding ’s are uniformly bounded by a fixed constant that depends on only. Likewise
[TABLE]
in view of Theorem 2.2 and since for by hypothesis. Hence for such ’s the combination of (78) and (79) with gives
[TABLE]
The second conclusion of Theorem 5.19 now follows from the preceding inequalities and (77).
5.4.5 Completion of the proof of Theorem 3.2
The last step is to show that the posterior contraction rate in the second limit of Theorem 5.19 carries over to the posterior mean . For any integer and every
[TABLE]
we have as
[TABLE]
Then by the inequalities of Jensen and Cauchy-Schwarz
[TABLE]
and it suffices to show that the second summand is stochastically as .
Arguing as in the proof of Theorem 5.13 and using Lemma 5.16 implies that the sets from (65) with from (71) satisfy as . Now Theorem 5.19, (11) and Markov’s inequality imply
[TABLE]
where we have also used Fubini’s theorem, (67), and that the Gaussian measure is supported in and hence integrates to a finite constant (see, e.g., [19, Exercise 2.1.5]). ∎
\ack
We would like to thank the referee for helpful remarks and suggestions. We are further very grateful to Bill Lionheart for having introduced us to polarimetric neutron tomography and its connection to the non-abelian X-ray transform. FM was supported by NSF grant DMS-1814104 and a UC Hellman Fellowship. RN was supported by the European Research Council under ERC grant No. 647812 (UQMSI). GPP thanks the University of California at Santa Cruz and the University of Washington for hospitality while this work was in progress. GPP was supported by the Leverhulme trust and EPSRC grant EP/R001898/1.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Abraham, K.; Nickl, R. On statistical Caldéron problems. Math. Stat. Learn. (2020), to appear
- 2[2] Birgé, L. Model selection in Gaussian regression with random design. Bernoulli 10 (2004), no. 6, 1039-1051.
- 3[3] Briol, F.X.; Oates, C.; Girolami, M.; Osborne, M.; Sejdinovic, D. Probabilistic integration: A role in statistical computation? Statist. Sci. 34 (2019), no. 1, 1-22.
- 4[4] Borell, C. The Brunn-Minkowski inequality in Gauss space. Invent. Math. 30 (1975), no. 2, 207-216.
- 5[5] Castillo, I.; Nickl, R. Nonparametric Bernstein-von Mises theorems in Gaussian white noise. Ann. Statist. 41 (2013), no. 4, 1999–2028.
- 6[6] Castillo, I.; Nickl, R. On the Bernstein-von Mises phenomenon for non-parametric Bayes procedures. Ann. Statist. 42 2014, no. 5, 1941–1969.
- 7[7] Cotter, S.L.; Roberts, G.O.; Stuart, A.M.; White, D. MCMC methods for functions: modifying old algorithms to make them faster. Statist. Sci. 28 (2013), no. 3, 424–446.
- 8[8] Dairbekov, N.S.; Paternain, G.P.; Stefanov, P; Uhlmann, G. The boundary rigidity problem in the presence of a magnetic field. Adv. Math. 216 (2007), no. 2, 535–609.
