The empirical process of residuals from an inverse regression
Tim Kutta, Nicolai Bissantz, Justin Chown, Holger Dette

Abstract
In this paper we investigate an indirect regression model characterized by the Radon transformation. This model is useful for the recovery of medical images obtained by computed tomography scans. The indirect regression function is estimated using a series estimator motivated by a spectral cut-off technique. Further, we investigate the empirical process of residuals from this regression, and show that it satisfies a functional central limit theorem.
Keywords: Indirect regression model, inverse problems, Radon transform, empirical process
AMS Subject Classification: 62G08, 62G30, 15A29
1. Introduction
Computed tomography (CT) is a noninvasive imaging technique and a key method for medical diagnosis. CT is based on measuring the intensity losses of X-rays sent through a body. From these measurements an attenuation profile can be recovered that provides an image of the body’s (unobservable) interior. Since X-rays travel along straight lines, the scanner rotates around the body to record a two-dimensional slice. Insight into three-dimensional structures is obtained by considering multiple slices. Our investigation is limited to a statistical analysis of data gathered from a single slice. For this purpose we introduce the inverse regression model
[TABLE]
where are independent and identically distributed random variables with . Here is a given index set, with each index corresponding to an X-ray path and the design point characterizing this path with associated response . Consequently, can be written using coordinates as the distance from the origin and as the angle of inclination. The body’s (true) attenuation profile along the slice is represented by , a function supported on the unit disc. is a linear operator acting on and denotes the normalized Radon transform, i.e. for and ,
[TABLE]
Details on the underlying physics and applications of CT can be found in Buzug (2008).
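To make the normalization of the Radon transform concrete, the following is a minimal numerical sketch. It assumes that the normalized Radon transform averages m over the chord at distance s from the origin with normal direction φ, that is Rm(s, φ) = (2a)^{-1} ∫_{-a}^{a} m(s cos φ - t sin φ, s sin φ + t cos φ) dt with a = √(1 - s²). The helper name `normalized_radon` is hypothetical and the quadrature (Gauss-Legendre) is our choice, not part of the paper.

```python
import numpy as np


def normalized_radon(m, s, phi, n_nodes=64):
    """Average of m(x, y) along the chord with distance s and normal angle phi.

    Assumed normalization (hypothetical helper):
        Rm(s, phi) = 1 / (2a) * int_{-a}^{a} m(s cos phi - t sin phi, s sin phi + t cos phi) dt,
    where a = sqrt(1 - s^2) is the half chord length and 0 <= s <= 1.
    """
    u, w = np.polynomial.legendre.leggauss(n_nodes)  # Gauss-Legendre nodes/weights on [-1, 1]
    a = np.sqrt(max(1.0 - s**2, 0.0))                # half length of the chord inside the disc
    t = a * u                                        # nodes mapped to [-a, a]
    x = s * np.cos(phi) - t * np.sin(phi)
    y = s * np.sin(phi) + t * np.cos(phi)
    # Substituting t = a*u turns the chord average into (1/2) * int_{-1}^{1} ... du,
    # which also handles the tangent line s = 1 (a = 0) without dividing by zero.
    return 0.5 * np.sum(w * m(x, y))


# Usage: for m(x, y) = x^2 + y^2 the chord average equals s^2 + (1 - s^2) / 3.
print(normalized_radon(lambda x, y: x**2 + y**2, 0.6, 0.4))  # ≈ 0.5733
```

Because the transform is an average, a constant image is mapped to the same constant, which is a convenient sanity check for any implementation.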
Image reconstruction in CT is a particular case of the broad class of linear inverse problems. An overview of the mathematical aspects of these problems and methods for solving them can be found in the monographs of Natterer (1986), Engl et al. (1996) and Helgason (2011). Other examples of linear inverse problems are the heat equation and convolution transforms (see Mair and Ruymgaart (1996), Saitoh (1997), and Cavalier (2008), among others). Additional statistical inverse problems include errors-in-variables models and the Berkson error model (see, for example, Bissantz et al. (2007), Carroll et al. (2007), Koul and Song (2008, 2009), Bertero et al. (2009), Kaipio and Somersalo (2010), Delaigle et al. (2014), and Kato and Sasaki (2017)). The Radon transform is usually discussed in the contexts of positron emission tomography (PET) and CT in medical imaging. In the case of PET, lines of sight are observed along which emissions have occurred. However, the positions of the emissions on these lines are unknown. Here the aim is to reconstruct the emission density (see Johnstone and Silverman (1990), Korostelev and Tsybakov (1993), and Cavalier (2000), among others). On the other hand, CT leads to the inverse regression (1.1) (see, for example, Cavalier (1999) and Kerkyacharian et al. (2010, 2012)).
We contribute to this discussion by deriving the rate of uniform, strong consistency for a nonparametric estimator of the unknown function based on the popular spectral cutoff method. Further, we derive a functional central limit theorem for the empirical process of the resulting model residuals , i.e. we investigate the estimator
[TABLE]
where the nonnegative weights sum to (see Section 3). Statistical applications of results of this type include validation of model assumptions. In the context of inverse regression models, to the best of our knowledge only one result is available: Bissantz et al. (2018), who study an inverse regression model characterized by a convolution transformation.
In direct regression problems, residual-based empirical processes arising from non- and semiparametric regression estimators have been considered by numerous authors (see Akritas and Van Keilegom (2001), Neumeyer (2009), Müller et al. (2012), Colling and Van Keilegom (2016), and Zhang et al. (2018), among others). Dette et al. (2007) consider tests for a parametric form of the variance function in a heteroscedastic nonparametric regression by comparing the empirical distribution function of standardized residuals calculated under a null model to that of an alternative model. Neumeyer and Van Keilegom (2010) use a similar approach to propose tests for verifying convenient forms of the regression function. Khmaladze and Koul (2009) introduce a popular distribution-free approach to goodness-of-fit problems for the errors of a nonparametric regression, based on a transformation of the empirical distribution function of the residuals that yields test statistics with convenient limit distributions. All of these approaches to validating model assumptions crucially rely on a technical asymptotic linearity property of the residual-based empirical distribution function. We show that the estimator (1.3) shares this property as well, so the results of this article can be used immediately to validate model assumptions in the inverse regression model (1.1), in the same spirit as the works mentioned above.
We have organized the remaining parts of the paper as follows. Model (1.1) is further discussed and we introduce the estimator in Section 2. Our main results are given in Section 3. All of the proofs of our results and additional supporting technical details may be found in the appendices.
2. Estimation in the indirect regression model
In this section we give more details regarding the Radon transform model (1.1) and introduce an estimator of the function .
2.1. The Radon transform
Following Johnstone and Silverman (1990) let
[TABLE]
denote the unit disc, which is the two-dimensional domain of the investigated attenuation profile and is called brain space for historical reasons. It is equipped with the uniform distribution, given in polar coordinates by
[TABLE]
This means that no region of the scanned area is given prior emphasis. The detector space is defined as
[TABLE]
with corresponding probability measure
[TABLE]
The domain of the transformed image is , a parametrization of all lines (X-ray paths) crossing the unit disc. It is usually referred to as detector space. is a probability measure on adapted to the length of the line segments inside the disc. For analytic simplicity we allow the angles in and to be exactly 0 and 2π. This is possible since the smoothness of and required below entails periodicity with respect to the angular coordinates.
The Radon transform in (1.2) defines a linear operator from to . Identifying corresponding equivalence classes, it can be shown that is one-to-one, compact and admits a singular value decomposition (SVD). The SVD of is vital for our subsequent investigations. To state it efficiently we introduce some definitions borrowed from Johnstone and Silverman (1990) and Born and Wolf (1970). Let
[TABLE]
be an index set and define for the function
[TABLE]
where
[TABLE]
is the so-called radial polynomial. Finally for we define
[TABLE]
where denotes the ths Chebyshev polynomial of the second kind. For convenience of notation we also define and for . Both collections of functions,
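Both families of polynomials are easy to evaluate numerically. The sketch below uses the standard factorial form of the radial polynomial (as found, for example, in Born and Wolf (1970)) and the three-term recurrence for the Chebyshev polynomials of the second kind; we assume these agree with the definitions above. The helper names `radial_polynomial` and `chebyshev_u` are hypothetical.

```python
from math import factorial

import numpy as np


def radial_polynomial(k, l, r):
    """Zernike radial polynomial Z_k^{|l|}(r); requires |l| <= k and k - |l| even."""
    m = abs(l)
    if m > k or (k - m) % 2:
        raise ValueError("need |l| <= k and k - |l| even")
    r = np.asarray(r, dtype=float)
    total = np.zeros_like(r)
    for j in range((k - m) // 2 + 1):
        coef = (-1) ** j * factorial(k - j) / (
            factorial(j) * factorial((k + m) // 2 - j) * factorial((k - m) // 2 - j)
        )
        total = total + coef * r ** (k - 2 * j)
    return total


def chebyshev_u(k, x):
    """Chebyshev polynomial of the second kind via U_{j+1} = 2x U_j - U_{j-1}."""
    x = np.asarray(x, dtype=float)
    u_prev, u = np.ones_like(x), 2.0 * x  # U_0 and U_1
    if k == 0:
        return u_prev
    for _ in range(k - 1):
        u_prev, u = u, 2.0 * x * u - u_prev
    return u
```

For example, this gives Z_2^0(r) = 2r² - 1, Z_4^0(r) = 6r⁴ - 6r² + 1 and U_k(1) = k + 1, and every radial polynomial takes the value 1 at r = 1.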
[TABLE]
form orthonormal bases of the spaces and respectively. With these notations the SVD of for some is given by
[TABLE]
In the literature the functions are commonly referred to as Zernike polynomials, which play an important role in the analysis of optical systems, for instance in the modelling of refraction errors, cf. Zernike (1934) and more recently Lakshminarayanan and Fleck (2011). We refer to Deans (1983) for more details on the cited SVD of the normalized Radon transform. Due to the injectivity of the operator, we can immediately access its inverse, defined pointwise for some , as
[TABLE]
The identities (2.7), (2.8) as well as -expansions in the respective spaces hold a priori only almost everywhere. However, if is sufficiently smooth, they even hold uniformly. In order to specify the required regularity we define
[TABLE]
the smoothness class. We assume throughout this paper that the regression function in model (1.1) is an element of (for some ). Controlling smoothness, and thereby the complexity of the class of regression functions, by related conditions is common in inverse problems. This is due to their natural correspondence to singular value decompositions of operators and their suitability for proving minimax optimal rates (see for example Mair and Ruymgaart (1996), Cavalier and Tsybakov (2002), Bissantz and Holzmann (2013) or Blanchard and Mücke (2018)).
Proposition 2.1**.**
Suppose that with , then the following four identities hold everywhere:
[TABLE]
Moreover the functions and are times continuously differentiable.
The equality of and its -expansion is vital when proving uniform bounds on the distance between and . In one-dimensional convolution-type problems this is usually dealt with by the Dirichlet conditions that directly apply to classes of smooth functions (see Nawab et al. (1996) pp. 197-198). It should also be noted that the series condition on the function in (2.9) implies regularity properties beyond mere smoothness. For instance, if it also entails periodicity of and its continuous derivatives in the angular component up to the order . This property follows from the periodicity of the basis functions in the angle and is an analogue of the periodicity of convergent Fourier series on bounded intervals. This fits naturally with the scanning regime, since any function transformed from Cartesian into polar coordinates is periodic with respect to the angle.
2.2. Design
As is common in computed tomography, we assume a parallel scanning procedure, corresponding to a grid of design points on the detector space. Adapting our results to fan-beam geometry, which underlies most modern scanners, is then mathematically straightforward.
We thus define a grid on the detector space , where for given each of the constituting rectangles has side length in -direction and in -direction. More formally, we define an index set
[TABLE]
and decompose the detector space in rectangular boxes of the form
[TABLE]
where . The design points are then defined as follows. The second coordinate of is given by
[TABLE]
and the first coordinate is determined as the solution of the equation
[TABLE]
Throughout this paper we consider the inverse regression model (1.1) with these design points. The non-uniform design in radial direction defined by (2.14) is motivated by a midpoint rule to numerically integrate over each box, with respect to the measure in (2.4). For asymptotic considerations, we assume that and that depends on as follows:
Assumption 2.2**.**
There exist constants , , such that for all .
Denoting the number of rows and columns in the grid of design points by and respectively is common in the literature and numerical programming. Notice that our Assumption 2.2 leaves room for the resolution-optimal choice (see Natterer and Wübbeling (2001), p. 74). Sometimes we will use the notation , actually meaning that according to Assumption 2.2 and thereby and diverge. Note also that the index set depends on the sample size in model (1.1). Thus formally we consider a triangular array of independent, identically distributed and centred random variables , but we do not reflect this dependence on in our notation.
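The grid construction can be sketched in a few lines. This is a minimal sketch under stated assumptions: the detector space is [0, 1] × [0, 2π), λ has density (2/π²)√(1 - s²) (proportional to the chord length and normalized to a probability measure), there are p radial cells of width 1/p and q angular cells of width 2π/q (p and q are hypothetical names), the angular coordinate is the midpoint of its cell, and the radial coordinate s_k solves ∫(s - s_k) λ(ds dφ) = 0 over its cell, i.e. it is the λ-weighted centroid. The centroid condition is our reading of the midpoint-rule motivation and of the statement in Section B.3 that the first-order Taylor terms vanish, not a quotation of (2.14).

```python
import numpy as np


def _radial_mass(a, b):
    """int_a^b sqrt(1 - s^2) ds for 0 <= a <= b <= 1."""
    g = lambda s: 0.5 * (s * np.sqrt(1.0 - s**2) + np.arcsin(s))
    return g(b) - g(a)


def _radial_moment(a, b):
    """int_a^b s * sqrt(1 - s^2) ds for 0 <= a <= b <= 1."""
    g = lambda s: -((1.0 - s**2) ** 1.5) / 3.0
    return g(b) - g(a)


def parallel_design(p, q):
    """Hypothetical parallel-beam design on [0, 1] x [0, 2*pi).

    p radial cells of width 1/p, q angular cells of width 2*pi/q.
    Assumes lambda(ds dphi) = (2 / pi^2) * sqrt(1 - s^2) ds dphi.
    Returns design coordinates S, PHI and the lambda-measure W of each cell, all of shape (p, q).
    """
    edges = np.linspace(0.0, 1.0, p + 1)
    lo, hi = edges[:-1], edges[1:]
    mass = _radial_mass(lo, hi)
    s_pts = _radial_moment(lo, hi) / mass             # lambda-weighted centroid in s
    phi_pts = (np.arange(q) + 0.5) * 2.0 * np.pi / q  # midpoints in phi (density is flat in phi)
    cell_w = (2.0 / np.pi**2) * (2.0 * np.pi / q) * mass
    S, PHI = np.meshgrid(s_pts, phi_pts, indexing="ij")
    W = np.repeat(cell_w[:, None], q, axis=1)
    return S, PHI, W
```

The radial design points drift towards the centre of each cell's mass, so they are non-uniform near s = 1 where the chord length, and hence the density of λ, vanishes. The cell measures W sum to one.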
2.3. The spectral cutoff estimator
Motivated by the representation (2.13) we now define the cutoff estimator for the function in model (1.1) by
[TABLE]
Here
[TABLE]
is an estimator of the inner product
[TABLE]
and denotes the Lebesgue measure of the cell . Comparing (2.13) to our estimator in (2.15), we observe that the inner products have been replaced by the estimates (2.16). Furthermore the series has been truncated at , which represents the application of a regularized inverse. In the literature it is common to refer to either or as bandwidth, since it is used to balance between bias and variance like the bandwidth in kernel density estimation (see Cavalier (2008)).
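The two ingredients of the estimator are weighted sums of the responses against the conjugated basis functions at the design points, and a division by the singular values for all basis functions up to the cutoff. A minimal sketch, assuming the basis functions have already been evaluated at the design points (`psi_at_design`) and that the user supplies the cell weights, the singular values and the degree k of each basis function; all names are hypothetical and the sketch does not fix the paper's exact weights.

```python
import numpy as np


def estimate_coefficients(Y, psi_at_design, cell_weights):
    """Weighted inner products theta_hat[j] = sum_i Y_i * conj(psi_j(x_i)) * w_i.

    Y: responses, shape (n,); psi_at_design: values psi_j(x_i), shape (J, n);
    cell_weights: weight of each design cell, shape (n,).
    """
    Yw = np.asarray(Y) * np.asarray(cell_weights)
    return np.conj(np.asarray(psi_at_design)) @ Yw


def spectral_cutoff(theta_hat, singular_values, degree, K):
    """Reconstructed coefficients: theta_hat / b for degree <= K, zero otherwise."""
    theta_hat = np.asarray(theta_hat)
    keep = np.asarray(degree) <= K
    return np.where(keep, theta_hat / np.asarray(singular_values), 0.0)
```

Increasing K keeps more coefficients, which reduces the truncation bias but amplifies noise through the growing factors 1/b, which is exactly the bias-variance role of the bandwidth described above.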
The choice of a bandwidth is a non-trivial problem. An optimal bandwidth with respect to some criterion such as the integrated mean squared error will depend on the unknown regression function . Several data driven selection criteria for the choice of have been proposed and examined in the literature. We refer to the monograph of Vogel (2002), where multiple techniques are gathered. More closely related to our case is the risk hull method by Cavalier and Golubev (2006) in the white noise model and the smooth bootstrap examined by Bissantz et al. (2018) in a different context.
Remark 1*.*
It should be noted that in practice a smooth damping of high frequencies usually performs better than the strict spectral cutoff. We can accommodate this by introducing a smooth version of the estimator in (2.15). For this purpose let denote a function with compact support and define
[TABLE]
as an alternative estimator of . Note that the estimate in (2.15) is obtained for . All results presented in this paper remain valid for the estimator (2.18). However, for the sake of brevity and a transparent presentation the subsequent discussion is restricted to the spectral cutoff estimator in (2.15).
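A minimal sketch of the smooth variant, assuming the damping function is applied as a factor filt(k/K) to each reconstructed coefficient of degree k (our assumption about the form of (2.18)). With the indicator of [0, 1] it reduces to the spectral cutoff; the cosine taper is a hypothetical smooth choice, not one prescribed in the text.

```python
import numpy as np


def filtered_coefficients(theta_hat, singular_values, degree, K, filt):
    """Damped inverse: filt(k / K) * theta_hat / b_k for each basis index.

    filt is a vectorised function with compact support (hypothetical interface).
    """
    ratio = np.asarray(degree, dtype=float) / K
    return filt(ratio) * np.asarray(theta_hat) / np.asarray(singular_values)


def indicator(x):
    """Indicator of [0, 1]: reproduces the sharp spectral cutoff."""
    return ((x >= 0.0) & (x <= 1.0)).astype(float)


def cosine_taper(x):
    """Hypothetical smooth damping: 1 at x = 0, decreasing to 0 at x = 1, zero beyond."""
    return np.where((x >= 0.0) & (x <= 1.0), 0.5 * (1.0 + np.cos(np.pi * x)), 0.0)
```

The taper down-weights coefficients gradually as k approaches K instead of discarding them abruptly, which is the kind of smooth damping of high frequencies the remark refers to.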
3. The empirical process of residuals
In this section we investigate the asymptotic properties of the empirical residual process
[TABLE]
where denotes the residual distribution function and
[TABLE]
the th residual obtained from the estimate . The weights are defined in Section 2.3. We begin by showing a uniform convergence result for . For this purpose we derive uniform approximation rates for bias and variance and subsequently balance the two to obtain optimal results. The proofs of the following results are technical and are therefore deferred to the Appendix.
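The estimator of the residual distribution function is a weighted empirical distribution function of the residuals. A minimal sketch, assuming the residuals have already been computed (responses minus the Radon transform of the estimate at the design points) and that the weights are nonnegative and sum to one, as stated for (1.3); the function name is hypothetical.

```python
import numpy as np


def residual_edf(residuals, weights, t):
    """Weighted empirical distribution function t -> sum_i w_i * 1{eps_hat_i <= t}.

    residuals and weights have shape (n,); t may be a scalar or an array of points.
    Returns one value per evaluation point.
    """
    residuals = np.asarray(residuals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    indicators = residuals[None, :] <= t[:, None]  # shape (len(t), n)
    return indicators.astype(float) @ weights
```

Evaluating this function on a grid and comparing it with a hypothesised error distribution is the starting point of the goodness-of-fit procedures cited in the introduction.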
Lemma 3.1**.**
Suppose that Assumption 2.2 holds and that for some . Then the estimator in (2.15) satisfies
[TABLE]
where .
Next we derive a uniform bound for the random error of the estimator (2.15) in model (1.1).
Lemma 3.2**.**
Suppose that Assumption 2.2 holds and that for some . Additionally let the sequence satisfy . Then the estimator in (2.15) satisfies
[TABLE]
Balancing the two upper bounds for the deterministic and random part yields an optimal choice of the bandwidth. More precisely for the choice
[TABLE]
balances the upper bound from Lemma 3.2 with the leading term of the bias from Lemma 3.1. Combining these results yields the first part of the following theorem.
Theorem 3.3**.**
Let Assumption 2.2 hold, suppose that for some and that for some . Additionally let be chosen as in (3.2). Then
[TABLE]
and for all
[TABLE]
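The bandwidth choice (3.2) follows the usual bias-variance trade-off. As a generic illustration with placeholder exponents α, β > 0 and a placeholder null sequence r_n (these are not the specific rates of Lemmas 3.1 and 3.2), balancing the two bounds works as follows:

```latex
\begin{aligned}
\text{bias} \lesssim K^{-\alpha}, \quad \text{stochastic error} \lesssim K^{\beta} r_n
  \;&\Longrightarrow\; K^{-\alpha} \asymp K^{\beta} r_n
  \;\Longleftrightarrow\; K \asymp r_n^{-1/(\alpha+\beta)}, \\
\text{so that} \quad K^{-\alpha} + K^{\beta} r_n &\asymp r_n^{\alpha/(\alpha+\beta)} .
\end{aligned}
```

Choosing K larger than this order lets the stochastic term dominate, while a smaller K lets the bias dominate; in both cases the resulting uniform rate is slower.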
By the same techniques uniform bounds can be deduced for the derivatives of our estimators.
Corollary 3.4**.**
Let the assumptions of Theorem 3.3 hold, let be of order and suppose for some . Additionally let , such that . Then
[TABLE]
In order to prove the weak convergence of the process we consider the bracketing metric entropy of the subclass
[TABLE]
for some . Theorem 3.3 implies that for all the difference eventually lies in . As we know from Proposition 2.1 the condition
[TABLE]
entails that a function is smooth to a degree determined by . This implies that a finite-dimensional representation can be used as an adequate approximation of , in our case a truncated -expansion. We employ these considerations to derive the following result about the complexity of the class , which is of independent interest and is proven in Appendix B (see Section B.4).
Proposition 3.5**.**
Let , then for any and sufficiently small
[TABLE]
Here denotes the minimal number of -brackets with respect to needed to cover the smoothness class .
For the next step recall the definition of the estimated residuals in (3.1), as well as the estimate for the residual distribution function in (1.3). In order to prove a uniform CLT for we disentangle the dependencies of the terms in in the next result.
Theorem 3.6**.**
Assume that for some , for some , that admits a Hölder continuous density with exponent and that Assumption 2.2 holds. If the bandwidth satisfies (3.2), then
[TABLE]
Corollary 3.7**.**
Under the assumptions of Theorem 3.6, the process
[TABLE]
converges weakly to a mean zero Gaussian process with covariance function
[TABLE]
Acknowledgements This work has been supported in part by the Collaborative Research Center “Statistical modeling of nonlinear dynamic processes” (SFB 823, Project C1) of the German Research Foundation (DFG) and the Bundesministerium für Bildung und Forschung through the project “MED4D: Dynamic medical imaging: Modeling and analysis of medical data for improved diagnosis, supervision and drug development”.
Appendix A Proofs and technical details
Throughout our calculations will denote a positive constant, which may differ from line to line. The dependence of on other parameters will be highlighted in the specific context.
A.1. Proof of Lemma 3.1
We begin with an auxiliary result which is needed for Lemma 3.1 and provides an approximation rate for the expectation of the estimates (2.16) for . It is proven in Appendix B (see Section B.3).
Proposition A.1**.**
Suppose that for and that Assumption 2.2 holds. Then for all it follows that
[TABLE]
where is some constant depending on and (the constant from Assumption 2.2).
We are now in a position to derive the decay rate of the bias postulated in Lemma 3.1. The decay rate naturally splits up into two parts. One accounts for the average approximation error of Radon coefficients with index smaller than and the other for the error due to frequency limitation of the estimator.
The singular value decomposition of the normalized Radon transform in (2.12) and the definition of our estimator (in (2.15)) yield
[TABLE]
where the terms and are given by
[TABLE]
For the term it follows that
[TABLE]
where we have used that Proposition B.2 in Appendix B implies the estimate
[TABLE]
in the first and the approximation result from Proposition A.1 in the second inequality. Similarly we have
[TABLE]
In the last step we have used that complies with the smoothness condition of (see (2.9)) and thus the series converges. ∎
A.2. Proof of Lemma 3.2
We first rewrite employing (2.16) and (A.2)
[TABLE]
We proceed by deriving an upper bound for the maximum. For this purpose we introduce a truncation parameter and define the truncated error
[TABLE]
We will now show that all of the errors with eventually equal their truncated versions almost surely. Via Markov’s inequality we conclude that
[TABLE]
and therewith it follows that
[TABLE]
Recalling that and that there exists some such that by Assumption 2.2, we derive
[TABLE]
Summability is entailed by . The Borel-Cantelli Lemma implies that almost surely eventually all measurement errors and their truncated versions are equal. Thus we can confine ourselves to the maximum
[TABLE]
where and are defined by
[TABLE]
Using the inequality
[TABLE]
which is a consequence of Proposition B.2, it follows that
[TABLE]
where we exploit the decay rate in the last estimate. For the proof of this fact we recall the notation (A.3) and note that the condition implies
[TABLE]
For the term we note that for a fixed constant
[TABLE]
Due to truncation is bounded by and its variance by . Furthermore the weights are uniformly of order , since
[TABLE]
Consequently the Bernstein inequality yields for the right side of (A.2) the upper bound
[TABLE]
which is summable for sufficiently large . The Borel-Cantelli Lemma therefore implies that
[TABLE]
Combining these estimates, we see that the left side of (A.4) is almost surely of order . Consequently the right side of (A.2) is of order almost surely, which proves the assertion. ∎
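For reference, the proof above applies Bernstein's inequality to a sum of independent, bounded and centred summands (weights times truncated errors times basis values). A standard form of the inequality is stated below; the symbols X_i, M and σ² are generic placeholders rather than the paper's notation.

```latex
Let $X_1,\dots,X_N$ be independent with $\mathbb{E}X_i = 0$, $|X_i| \le M$ almost surely
and $\sum_{i=1}^{N} \mathbb{E}X_i^2 \le \sigma^2$. Then, for every $t > 0$,
\[
  \mathbb{P}\Big(\Big|\sum_{i=1}^{N} X_i\Big| \ge t\Big)
  \le 2\exp\!\Big(-\frac{t^2}{2\sigma^2 + \tfrac{2}{3}Mt}\Big).
\]
```

In the proof, M is controlled by the truncation level and the uniform order of the weights, and σ² by the bounded variance of the truncated errors; choosing the constant in the threshold large enough makes the resulting bounds summable, which is what the Borel-Cantelli step requires.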
A.3. Proof of Theorem 3.3
Combining Lemma 3.1 and Lemma 3.2 yields the first part of Theorem 3.3, when the truncation parameter is chosen as in (3.2). For the proof of the second property we note the identity
[TABLE]
which gives for the left hand side of (3.4)
[TABLE]
The terms , and are defined as follows:
[TABLE]
By Proposition A.1 we obtain the upper bound
[TABLE]
For the second sum on the right-hand side of (A.3) we use the estimate
[TABLE]
In the last equality we have used the following bound established in the proof of Lemma 3.2:
[TABLE]
The third term in (A.3) can be bounded by
[TABLE]
Due to the smoothness condition in (2.9) the double sum is finite. Since it follows that the series converges to 0 for . Consequently
[TABLE]
and the definition of in (3.2) yield the desired result. ∎
A.4. Proof of Theorem 3.6
Proposition 3.5 is used to verify an equicontinuity argument, which is the central building block in the proof of Theorem 3.6. For this purpose we define the -bracketing number as follows:
Definition A.2
Let be stochastic processes, indexed in , and . The -bracketing number of , denoted by , is the minimal number of sets in a partition of such that for each
[TABLE]
Lemma A.3**.**
Define
[TABLE]
Then, under the assumptions of Theorem 3.6 it follows that .
Proof of Lemma A.3.
Using the definition of the estimated residuals in (3.1) we have
[TABLE]
As we have seen in Theorem 3.3, the random function is eventually included in the smoothness class for every . Since by assumption, we can also choose a . Since is a complicated object, depending on all residuals, we replace it by general functions in and prove a uniform result over . We thus define the stochastic processes
[TABLE]
indexed in the space , equipped with the semi metric
[TABLE]
Notice that for to be a semimetric the error density must have support , which is assumed at this point for the sake of simplicity. Furthermore recall the uniform order of the product . To prove equicontinuity we have to show that for every sequence and every
[TABLE]
If (A.11) holds, then the assertion of Lemma A.3 can be shown as follows: Firstly note that we can derive a lower bound for the probability on the left hand side of (A.11) by
[TABLE]
By the second part of Theorem 3.3 we know that
[TABLE]
for . Furthermore we notice that for all continuous functions , which follows immediately from the definition of the Radon transform in (1.2). Combining this with the upper bound
[TABLE]
from the first part of Theorem 3.3 yields
[TABLE]
such that for a sequence , say e.g.
[TABLE]
Combining these considerations with the right side of (A.4) yields that uniformly in , proving the lemma provided that (A.11) holds.
Property (A.11) is a consequence of Lemma A.19 in Neumeyer (2006), which requires four regularity properties of the process under consideration. The rest of the proof consists in verifying these properties.
- (1)
For all we have to show:
[TABLE]
This is easy to see, since (recall that ) and so the sum is equal to 0 for all larger than some .
- (2)
For every sequence
[TABLE]
Consider the expectation for some fixed but arbitrary which can be bounded uniformly as follows:
[TABLE]
All three terms inside the square brackets are uniformly of order . This can be shown as follows: An application of the mean value theorem demonstrates that the first term is a null sequence:
[TABLE]
The middle term is bounded by by definition of our semimetric in (A.10), when we consider that and therefore . Consequently it is , as is the last term by assumption.
- (3)
Denoting the -bracketing number, as given in Definition A.2, by , the next condition to check is that for every sequence :
[TABLE]
For the construction of an adequate partition of satisfying (A.8), consider the -brackets of , where such that (note that by assumption). The images of these brackets under are simply , due to monotonicity of the integral, and they are still -brackets, since reduces -distance. As a consequence we obtain -brackets of the whole class .
Additionally choose with , such that the intervals form a partition of the real line (for infinite values we take the intervals to be half open), and such that each interval has probability mass . Then the sets
[TABLE]
form a partition of . Their number is of order , where we might have to slightly shrink such that still and hold. Now we have to show that (A.8) holds, that is in the present case for an arbitrary
[TABLE]
In the subsequent calculation we define the expressions and by taking the respective limits. The left side of the above inequality is bounded by
[TABLE]
Replacing by yields the desired result, without changing the rate of the upper bound of the -bracketing number. Thus the integral in (A.14) converges since .
- (4)
Finally we have to prove that is totally bounded. By definition is a maximum semimetric defined on the product space . Hence it suffices to show that each of the spaces , is totally bounded, where we define for and
[TABLE]
We start with and demonstrate that for every we can find a finite number of such that for every there exists a such that
[TABLE]
Let and be a closed interval with probability mass larger than . Take an equidistant grid with maximal width of points for across (including the boundary points) and now let, for an arbitrary say be one of the closest points to of this grid. If we choose a boundary point of and the result is immediate. If we get by the mean value theorem:
[TABLE]
For we recall that by our above observations for every the bracketing number of with respect to the norm is finite and thus in particular we have total boundedness.
Having established these regularity properties, equicontinuity follows by Lemma A.19 in Neumeyer (2006), which completes the proof of Lemma A.3.
∎
Besides Lemma A.3 we require some additional approximation results for a proof of Theorem 3.6.
Proposition A.4**.**
Under the assumptions of Theorem 3.6 we have
[TABLE]
[TABLE]
[TABLE]
Proof of Proposition A.4.
Recalling the definitions of the estimated residuals and the weights , we begin by rewriting the left side of (A.17)
[TABLE]
According to the mean value theorem the absolute of this term is bounded by
[TABLE]
where is some suitable point between and . Since the density is bounded it suffices to show that
[TABLE]
An application of Cauchy-Schwarz yields
[TABLE]
By Assumption 2.2 . Moreover, by Corollary 3.4 the gradient converges uniformly to 0. Thus we get the desired result. The estimate (A.18) follows by similar arguments, while (A.19) is based on two observations. Firstly, since we can rewrite the integral
[TABLE]
Secondly as the errors are centered
[TABLE]
Combining these results yields the representation for the left side of (A.19). By Proposition A.1 this difference is of order .
∎
Equipped with our observations in Lemma A.3 and Proposition A.4, we can easily deduce Theorem 3.6:
Proof of Theorem 3.6. We apply the triangle inequality to arrive at the following decomposition
[TABLE]
Each of the terms on the right side is of order , the first one by Lemma A.3 and the other ones by Proposition A.4. ∎
A.5. Proof of Corollary 3.7
This is a consequence of Theorem 3.6, as we can represent the process as a sum of independent stochastic processes and a negligible term:
[TABLE]
The sum on the right side converges to a Gaussian process, by application of a functional CLT for triangular arrays found in Neumeyer (2006).
Appendix B Auxiliary results
B.1. Uniform bounds
We begin by stating some frequently used properties of the radial polynomials, which are taken from Born and Wolf (1970) and Janssen (2014).
Proposition B.1**.**
- (1)
For all
[TABLE]
- (2)
For all
[TABLE]
- (3)
For all the derivative of the corresponding radial polynomial has the following structure:
[TABLE]
Next we provide upper bounds on the -norm of the derivatives of the Chebyshev and radial polynomials. The bounds on the radial polynomials follow from the above proposition and the bounds for the Chebyshev polynomials from identities in Mason and Handscomb (2002).
Proposition B.2**.**
Let and , then
[TABLE]
and
[TABLE]
Proof of Proposition B.2.
In order to show the first statement, we apply the identities (B.1) and (3) from Proposition B.1 and use an induction argument. The initial step is given by (B.1) and the induction hypothesis is
[TABLE]
By virtue of (3) we have
[TABLE]
where we have used the induction hypothesis to bound the derivatives of .
The case of the Chebyshev polynomials is similar. In order to prove the second statement in Proposition B.2 we cite a few well-known facts about Chebyshev polynomials from Mason and Handscomb (2002):
- (1)
For all is uniformly bounded by .
- (2)
Let denote the Chebyshev polynomial of the first kind, which satisfies the differential equation
[TABLE]
For all is uniformly bounded by .
- (3)
For all the representation
[TABLE]
holds, where indicates that we only sum over such terms where is even.
The proof now follows by an induction, analogous to that of the first part. ∎
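The Chebyshev facts used in this proof are easy to check numerically. The sketch below uses NumPy's `Chebyshev` class together with the standard identity expressing U_k through first-kind polynomials (U_k = 2 Σ T_j over j ≤ k with k - j even, minus T_0 when k is even); we do not claim this is verbatim the representation in item (3) above. It checks the trigonometric form of U_k, the bound |U_k| ≤ k + 1 on [-1, 1] and the relation T_k' = k U_{k-1}. The helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def u_via_t(k):
    """U_k in the first-kind basis: 2 * (sum of T_j, j <= k, k - j even), minus T_0 if k is even."""
    coef = np.zeros(k + 1)
    coef[k % 2::2] = 2.0
    if k % 2 == 0:
        coef[0] -= 1.0
    return Chebyshev(coef)


def check_chebyshev_facts(k, n_grid=2001):
    """Numerically check three standard facts for degree k >= 0.

    (a) U_k(cos th) = sin((k + 1) th) / sin th,
    (b) the maximum of |U_k| over [-1, 1] equals k + 1,
    (c) T_k' = k * U_{k-1} (for k >= 1).
    """
    x = np.linspace(-1.0, 1.0, n_grid)
    th = np.linspace(0.05, np.pi - 0.05, 50)
    u_k = u_via_t(k)
    trig_ok = np.allclose(u_k(np.cos(th)), np.sin((k + 1) * th) / np.sin(th))
    sup_ok = np.isclose(np.max(np.abs(u_k(x))), k + 1)
    deriv_ok = True if k == 0 else np.allclose(Chebyshev.basis(k).deriv()(x), k * u_via_t(k - 1)(x))
    return bool(trig_ok), bool(sup_ok), bool(deriv_ok)
```

Combining (b) and (c) gives |T_k'| ≤ k² on [-1, 1], the kind of polynomial growth in the degree that enters the derivative bounds of Proposition B.2.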
B.2. Proof of Proposition 2.1
We employ these bounds to sketch a proof of Proposition 2.1. The techniques are borrowed from the theory of Fourier series. It is well known that a continuous function on a compact interval with absolutely summable Fourier coefficients is identical to its Fourier series . This is most easily proven by observing that and are identical in mean and that by uniform convergence is also continuous. We proceed analogously for the proof of the identities (2.10) to (2.13). The differentiability is an immediate consequence of this argument. To avoid redundancy we confine our investigation to equation (2.10).
Firstly we define the function on the right side of (2.10) by . Obviously
[TABLE]
As is absolutely continuous with respect to the Lebesgue measure, the set has Lebesgue measure 0 and thus (2.10) follows if we can establish the continuity of (recall that is continuous by assumption). Continuity of is implied by the uniform convergence of the sequence of continuous functions
[TABLE]
to for . To see this we consider the difference
[TABLE]
where we used (A.2) in the last step. Plugging the identity (2.13) (recall that we already know it in an -sense from equation (2.8)) into the inner products yields
[TABLE]
By the series condition in (2.9), the right and thus the left side converge to 0, which proves continuity of . Consequently, it follows from (B.4) that .
To establish differentiability of and we use their -representations (2.10) and (2.11). Differentiability and summation may be interchanged by uniformity arguments, using the bounds from Proposition B.2. Continuity of the derivatives is then derived as in the above argumentation.
B.3. Proof of Proposition A.1
By the definition of in (2.16) and of the weights, we obtain:
[TABLE]
By Proposition 2.1 the function is twice continuously differentiable. Recalling the definition of , we observe that the real part
[TABLE]
and the imaginary part
[TABLE]
are infinitely differentiable. By Proposition B.2, it is now easy to see that all second-order derivatives of these functions are uniformly bounded by . Using a Taylor expansion, we obtain for any
[TABLE]
Here denote intermediate points depending on , and , which lie inside because of its convexity. The first two integrals vanish because of our choice of design points. Moreover, and are bounded by by Assumption 1. The second-order derivatives of are bounded (because they are continuous), and those of are bounded by , as noted above. Thus the term is of order . Treating the integrals in the sum over the imaginary parts in the same fashion yields the result. ∎
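The mechanism of this proof (first-order Taylor terms integrate to zero over each cell, so only the second-order remainder survives) is that of the midpoint rule. The Python/NumPy sketch below assumes, for illustration only, design points at the centres of an n × n grid of equal square cells on the unit square and a hypothetical smooth test function; the paper's actual design and domain are not reproduced. Halving the cell width reduces the error by a factor of about four, i.e. the error is O(n^{-2}).

```python
import numpy as np


def f(x, y):
    """Smooth test function (hypothetical; not the paper's regression function)."""
    return np.exp(x) * np.cos(2.0 * y)


EXACT = (np.e - 1.0) * np.sin(2.0) / 2.0   # integral of f over [0, 1]^2


def midpoint_rule(n):
    """Average of f over the centres of an n x n grid of square cells of side 1/n."""
    c = (np.arange(n) + 0.5) / n
    X, Y = np.meshgrid(c, c, indexing="ij")
    return f(X, Y).mean()


errors = {n: abs(midpoint_rule(n) - EXACT) for n in (8, 16, 32, 64)}
for n, e in errors.items():
    print(f"n={n:2d}  error={e:.3e}  n^2 * error={n**2 * e:.4f}")
```

The constant n^2 · error stabilises, which is the signature of a remainder controlled by bounded second derivatives.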
B.4. Proof of Proposition 3.5
We begin by rewriting the series condition (3.6) as
[TABLE]
where we define for notational convenience. The point of this reformulation is that all conditions are now expressed directly in terms of instead of its Radon transform.
Our proof rests on an observation from the monograph of van der Vaart and Wellner (1996). If we can find suitable functions with finite -norm such that the class is contained in the union of the -balls of radius around them, i.e.
[TABLE]
then the -bracketing number of for is bounded above by . The corresponding brackets are simply given by for all . We therefore confine ourselves to showing that the covering number of for some arbitrary but fixed is bounded above by , where .
The rest of the proof consists of constructing such a class of functions, covering by -balls and verifying that their number is bounded as desired. We begin by relating closeness of Radon coefficients to closeness in -norm.
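The observation borrowed from van der Vaart and Wellner (1996) is elementary. In generic notation chosen here for illustration (a class $\mathcal{F}$, centres $f_1,\dots,f_m$ and a probability measure $P$; the $L^2(P)$ case is shown), it reads:

```latex
\[
\mathcal{F}\subset\bigcup_{i=1}^{m}\bigl\{f:\ \|f-f_i\|_\infty\le\varepsilon\bigr\}
\;\Longrightarrow\;
\forall f\in\mathcal{F}\ \exists i:\ f_i-\varepsilon\le f\le f_i+\varepsilon,
\qquad
\bigl\|(f_i+\varepsilon)-(f_i-\varepsilon)\bigr\|_{L^2(P)}=2\varepsilon ,
\]
\[
\text{hence}\qquad
N_{[\,]}\bigl(2\varepsilon,\mathcal{F},L^2(P)\bigr)\;\le\;N\bigl(\varepsilon,\mathcal{F},\|\cdot\|_\infty\bigr).
\]
```

The bracket width equals $2\varepsilon$ because $P$ is a probability measure, so the constant function $2\varepsilon$ has $L^2(P)$-norm $2\varepsilon$.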
Invoking Proposition 2.1, we observe that every function is identical to its -expansion
[TABLE]
Because of (B.9) and , we obtain for each
[TABLE]
We now investigate the distance between two functions , in whose Radon coefficients are similar in the sense that
[TABLE]
for some . For sufficiently large , depending only on , the maximal distance between and can be bounded as follows:
[TABLE]
In the second inequality we used (A.2), and in the last step we used to guarantee the convergence of the series. Note that the estimate (B.11) already implies
[TABLE]
for all ; that is, substantially different coefficients can only occur for small .
Now consider the coefficients with . To construct the desired functions for a covering of as in (B.10), we decompose the domain of possible Radon coefficients as follows. For each , the estimate (B.11) implies that
[TABLE]
We place grid points in this cube such that every point of the cube lies within distance of some grid point, and we denote the set of grid points for each cube by . It then follows that for each function in we can find a vector of coefficients
[TABLE]
(where denotes the product taken only over those indices for which is even) such that the corresponding function
[TABLE]
satisfies (B.12) and hence lies within distance of . Here the coefficients for odd are simply set to zero. The covering number is therefore bounded by the total number of such coefficient vectors, which can be calculated as follows:
[TABLE]
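The counting step can be made concrete. In the Python sketch below, the half-widths a_k of the coordinate intervals and the mesh delta are hypothetical placeholders (the paper's values are not recoverable from this excerpt). An interval [-a, a] needs at most ceil(2a/delta) + 1 equispaced grid points for every coordinate to lie within delta/2 of one of them, and the logarithm of the number of grid vectors is the sum of the per-coordinate logarithms.

```python
import math


def points_per_axis(half_width, delta):
    """Equispaced grid points on [-a, a] so that adjacent points are at most delta apart."""
    return math.ceil(2.0 * half_width / delta) + 1


def log_grid_size(half_widths, delta):
    """Logarithm of the number of grid vectors in the product of intervals [-a_k, a_k]."""
    return sum(math.log(points_per_axis(a, delta)) for a in half_widths)


def nearest_grid_point(value, half_width, delta):
    """Round a coordinate in [-a, a] to the closest grid point (within delta / 2)."""
    n = points_per_axis(half_width, delta)
    if n == 1:                                   # degenerate interval [0, 0]
        return 0.0
    step = 2.0 * half_width / (n - 1)            # actual spacing, at most delta
    j = round((value + half_width) / step)
    return -half_width + j * step


# Hypothetical half-widths a_k = k^{-2}, k = 1, ..., 20, and mesh delta = 0.01.
half_widths = [k ** -2.0 for k in range(1, 21)]
delta = 0.01
print(f"log(number of grid vectors) = {log_grid_size(half_widths, delta):.2f}")
```

Working with the logarithm mirrors the bracketing-entropy bound, which is a statement about the log of the covering number.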
To achieve the desired rate, we repeat the above argument for a shrunken version of , say , which is still larger than , i.e. with still larger than . For sufficiently small , it follows that
[TABLE]
By our auxiliary considerations, the bracketing number is thus bounded as desired. ∎
References
- Akritas, M. G. and I. van Keilegom (2001). Non-parametric estimation of the residual distribution. Scandinavian Journal of Statistics 28, 549–567.
- Bertero, M., P. Boccacci, G. Desiderà, and G. Vicidomini (2009). Image deblurring with Poisson data: From cells to galaxies. Inverse Problems 25 (12), 123006, 26.
- Bickel, P. J. and M. Rosenblatt (1973). On some global measures of the deviations of density function estimates. Annals of Statistics 1, 1071–1095.
- Bissantz, N., J. Chown, and H. Dette (2018). Regularization parameter selection in indirect regression by residual based bootstrap. To appear in Statist. Sinica.
- Bissantz, N., T. Hohage, A. Munk, and F. Ruymgaart (2007). Convergence rates of general regularization methods for statistical inverse problems. SIAM Journal on Numerical Analysis 45, 2610–2636.
- Bissantz, N. and H. Holzmann (2013). Asymptotics for spectral regularization estimators in statistical inverse problems. Computational Statistics 28, 435–453.
- Blanchard, G. and N. Mücke (2018). Optimal rates for regularization of statistical inverse learning problems. Foundations of Computational Mathematics 18 (4), 971–1013.
- Born, M. and E. Wolf (1970). Principles of Optics. Oxford: Pergamon Press.
