Random sections of ellipsoids and the power of random information
Aicke Hinrichs, David Krieg, Erich Novak, Joscha Prochno, and Mario Ullrich

TL;DR
This paper investigates the behavior of the circumradius of intersections of ellipsoids with random subspaces, revealing how it relates to the singular values of the embedding and its implications for random information in $L_2$-approximation.
Contribution
It provides a detailed analysis of the random radius in ellipsoid intersections, connecting it to the worst-case error of algorithms based on random information, and distinguishes behavior based on the decay of singular values.
Findings
Random radius is comparable to the next singular value under certain conditions.
For $\sigma \notin \ell_2$, random information is ineffective.
Expected radius decreases at rate $o(1/\sqrt{n})$ when $\sigma \in \ell_2$.
Institut für Analysis, Johannes Kepler Universität Linz, Altenbergerstrasse 69, 4040 Linz, Austria
Institut für Mathematik, Friedrich Schiller Universität Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany
Institut für Mathematik & Wissenschaftliches Rechnen, Karl-Franzens-Universität Graz, Heinrichstrasse 36, 8010 Graz, Austria
Abstract.
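We study the circumradius of the intersection of an $m$-dimensional ellipsoid $\mathcal{E}$ with semi-axes $\sigma_1 \geq \dots \geq \sigma_m$ with random subspaces of codimension $n$, where $n$ can be much smaller than $m$. We find that, under certain assumptions on $\sigma$, this random radius $\mathcal{R}_n = \mathcal{R}_n(\sigma)$ is of the same order as the minimal such radius $\sigma_{n+1}$ with high probability. In other situations $\mathcal{R}_n$ is close to the maximum $\sigma_1$. The random variable $\mathcal{R}_n$ naturally corresponds to the worst-case error of the best algorithm based on random information for $L_2$-approximation of functions from a compactly embedded Hilbert space $H$ with unit ball $\mathcal{E}$. In particular, $\sigma_k$ is the $k$th largest singular value of the embedding $H \hookrightarrow L_2$. In this formulation, one can also consider the case $m = \infty$, and we prove that random information behaves very differently depending on whether $\sigma \in \ell_2$ or not. For $\sigma \notin \ell_2$ we get $\mathbb{E}[\mathcal{R}_n] = \sigma_1$ and random information is completely useless. For $\sigma \in \ell_2$ the expected radius tends to zero at least at rate $o(1/\sqrt{n})$ as $n \to \infty$. In the important case
$$\sigma_k \asymp k^{-\alpha} \ln^{-\beta}(k+1),$$
where $\alpha > 0$ and $\beta \in \mathbb{R}$ (which corresponds to various Sobolev embeddings), we prove
$$\mathbb{E}[\mathcal{R}_n(\sigma)] \asymp \begin{cases} \sigma_1 & \text{if } \alpha < 1/2 \text{ or } \beta \leq \alpha = 1/2, \\ \sigma_n \sqrt{\ln(n+1)} & \text{if } \beta > \alpha = 1/2, \\ \sigma_{n+1} & \text{if } \alpha > 1/2. \end{cases}$$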
In the proofs we use a comparison result for Gaussian processes à la Gordon, exponential estimates for sums of chi-squared random variables, and estimates for the extreme singular values of (structured) Gaussian random matrices. The upper bound is constructive. It is proven for the worst case error of a least squares estimator.
Key words and phrases:
Random intersection, random information, approximation, high dimensional convexity, Gaussian random matrix, comparison principles for Gaussian processes, least squares
2010 Mathematics Subject Classification:
Primary: 42B35, 52A23, 65Y20 Secondary: 65D15, 60B20
1. Introduction
We are interested in the circumradius of the intersection of a centered ellipsoid $\mathcal{E}$ in $\mathbb{R}^m$ with a random subspace of codimension $n$, where $n$ can be much smaller than $m$. While the maximal radius is the length of the largest semi-axis $\sigma_1$, the minimal radius is the length of the $(n+1)$-st largest semi-axis $\sigma_{n+1}$. But how large is the radius of a typical intersection? Is it comparable to the minimal or the maximal radius, or does it behave completely differently? We prove that the radius $\mathcal{R}_n$ of a random intersection satisfies
[TABLE]
with overwhelming probability, where the constant is absolute. For many sequences of semi-axes the right-hand side is of the same order as $\sigma_{n+1}$. This means that a typical intersection has radius comparable to the smallest one. One example is semi-axes of polynomial decay with exponent greater than $1/2$.
If the sequence decays too slowly, this is no longer true and we find that a typical intersection often has radius comparable to the largest one. Indeed, if the ellipsoid is ‘fat’ in the sense that the semi-axes satisfy , then we show that
[TABLE]
with overwhelming probability, where the constant is absolute. An example is semi-axes of sufficiently slow polynomial decay. Altogether, we obtain
[TABLE]
where $\asymp$ denotes equivalence up to positive constants not depending on $n$ and $m$.
The study of diameters of sections of symmetric convex bodies with a lower-dimensional subspace has been initiated by Giannopoulos and Milman [8, 10] and further advanced in the subsequent works of Litvak and Tomczak-Jaegermann [24], Giannopoulos, Milman, and Tsolomitis [9], or Litvak, Pajor, and Tomczak-Jaegermann [23]. However, as has already been pointed out in [8, 10], one cannot expect these bounds to be sharp for the whole class of symmetric convex bodies as is indicated by ellipsoids with highly incomparable semi-axes for which the diameter of sections of proportional dimension does not concentrate around some value [10, Example 2.2]. Moreover, the focus in these papers was on subspaces of proportional codimension, whereas we are mainly interested in subspaces with small codimension such as or or even .
Our motivation comes from the theory of information-based complexity (IBC). In IBC we often want to approximate the solution of a linear problem based on pieces of information about the unknown problem instance. We refer to [28, 29, 30] for a detailed exposition. It is usually assumed that some kind of oracle is available which grants us this information at our request. We call this oracle times to get well-chosen pieces of information, trying to obtain optimal information about the problem instance. Often, however, this model is too idealistic. There might be no such oracle at our disposal and the information comes in randomly. We simply have to work with the information at hand. This is in fact a standard assumption in learning theory and uncertainty quantification, see [36]. It may also happen that an oracle is available but we simply do not know what to ask in order to obtain optimal information. In such a case, it seems natural to ask random questions. Both scenarios suggest the analysis of random information and the question how it compares to optimal information. For a survey of some classical results as well as new results see [13]. Here we study the case of -approximation of vectors or functions from a Hilbert space.
More precisely, we consider the problem of recovering from the data which is obtained from an information mapping and measure the error in the Euclidean norm. The power of the information mapping is given by its radius, which is the worst case error of the best recovery algorithm based on , that is,
[TABLE]
For problems of this type, it is known that the worst data is the zero data, resulting in
[TABLE]
where is the kernel of , see [5, 28, 40]. Thus, if is a standard Gaussian matrix, we indeed arrive at the same problem as above. The radius of a random intersection is the worst case error of the best algorithm based on Gaussian random information, whereas the radius of the minimal intersection is the worst case error of the best algorithm based on optimal information. So the geometric questions above translate as follows: How good is random information? Is it comparable to the optimal information or is it much worse? The answers are the same. For instance, for polynomial decay , we have
[TABLE]
As a matter of fact, the results for the radius of random information even hold when , where our geometric interpretation fails. Namely, for any , we obtain that and random information is completely useless. For the expected radius of random information tends to zero with the same polynomial rate as the radius of optimal information. The proof of this upper bound is constructive. We present a least squares estimator based on random information that is almost as good as the optimal algorithm based on optimal information.
Remark 1.
Using isomorphisms, our results can easily be transferred to any compact embedding of a Hilbert space $H$ into a separable $L_2$-space. That is, we may also consider the problem of approximating an unknown function from the unit ball of $H$ in the $L_2$-norm. In this case, optimal information is given by the generalized Fourier coefficients
[TABLE]
where is the -normalized eigenfunction belonging to the th largest eigenvalue of the operator . The radius of optimal information is the square-root of the st largest eigenvalue. Random information, on the other hand, is given by
[TABLE]
where the are independent standard Gaussian variables. Equivalently, if , we have
[TABLE]
where the are iid Gaussian fields on whose correlation operator is defined by . The results are the same. In particular, random information is (almost) as good as optimal information as long as . An important case, which is often needed in approximation theory and complexity studies, are Sobolev embeddings, i.e., is a Sobolev space of functions that are defined on a bounded domain in . It is well known that then the singular values behave as , where and depend on the smoothness and the dimension and the condition means that the functions in are continuous, see also [4].
Remark 2.
The phenomenon that the results depend strongly on whether $\sigma$ is square summable is known from a related problem that was studied earlier in several papers. There, $\mathcal{E}$ is the unit ball of a reproducing kernel Hilbert space $H$. That is, $H$ consists of functions on a common domain, and function evaluation at each point of that domain is a continuous functional on $H$. Again, the optimal linear information for the $L_2$-approximation problem is given by the generalized Fourier coefficients and has radius $\sigma_{n+1}$. This information might be hard to get, and hence one might allow only function evaluations, i.e., information of the form
[TABLE]
The goal is to relate the power of function evaluations to the power of all continuous linear functionals. Ideally one would like to prove that their power is roughly the same. Unfortunately, in general this is not true. In the case $\sigma \notin \ell_2$ the convergence of optimal algorithms that may only use function values can be arbitrarily slow [14]. The situation is much better if we assume that $\sigma \in \ell_2$. It was already shown in [42] and [17] that function values cannot be much worse than general linear information in this case. We refer to [30, Chapter 26] for a presentation of these results. Indeed, based on the proof technique of the present paper, it was recently proven in [16] that the polynomial order of convergence for function values and general linear information is the same for all $\sigma \in \ell_2$.
The rest of the paper is organized as follows. In Section 2 we discuss the relation between the geometric problem and the IBC problem in more detail. We give general upper bounds (Theorems 3 and 4) and lower bounds (Theorem 5) for the radius of random information in terms of the sequence $\sigma$ which hold with high probability. We derive the $\ell_2$-dichotomy discussed above (Corollary 6) and apply the general theorems to sequences of polynomial decay (Corollary 7) and exponential decay (Corollary 9). The proofs are contained in Section 3. We add a final section about alternative approaches. We show an upper bound via the lower $M^*$-estimate and an elementary lower bound. These bounds are slightly weaker, but give a better insight into the geometric aspect of the problem.
2. Problem and results
We consider the ellipsoid
[TABLE]
with semi-axes of lengths . (Footnote 1: For convenience, we use the convention that for all and .) This is the unit ball of a Hilbert space, which we denote by . We study the problem of recovering an unknown vector from pieces of information, where we want to guarantee a small error in . The information about is given by the outcome of linear functionals. (Footnote 2: Note that we also consider unbounded functionals, meaning that .) The mapping is called the information mapping. A recovery algorithm is a mapping of the form , where maps back to . The worst case error of the algorithm is given by
[TABLE]
The quality of the information mapping is measured by its radius, which is the worst case error of the best recovery algorithm based on the information , i.e.,
[TABLE]
Note that this is a linear problem over Hilbert spaces as described in [28, Section 4.2.3]. In particular, we have the relation
[TABLE]
with equality for all bounded information mappings . We refer to [5, 28, 40]. It is easy to see that optimal information is given by the mapping
[TABLE]
which satisfies
[TABLE]
We want to compare this to the radius of random information, which is given by a random matrix with independent standard Gaussian entries. That is, we study the random variable
[TABLE]
Clearly, we always have . There are two alternative interpretations of the quantity if the sequence is finite in the sense that for all .
Variant 1. The quantity is the circumradius of the -dimensional ellipsoid that is obtained by slicing the -dimensional ellipsoid with a subspace that is uniformly distributed on the Grassmannian manifold of -codimensional subspaces in (equipped with the Haar probability measure). That is,
[TABLE]
This easily follows from the fact that the kernel of the matrix (when restricted to ) is uniformly distributed on the Grassmannian.
Variant 2. Since the radius of information is invariant under a (component-wise) scaling of the information mapping, we also have
[TABLE]
where is obtained from by erasing all but the first columns and then normalizing the rows in . That is, in the finite-dimensional case, we may just as well study the quality of the information given by coordinates in random directions that are independent and uniformly distributed on the sphere .
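The geometric interpretation in Variant 1 can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes a finite ellipsoid with strictly positive semi-axes `sigma` in $\mathbb{R}^m$ and a Gaussian $n \times m$ matrix `G`, and computes the circumradius of the section $\{x \in \mathcal{E} : Gx = 0\}$. The function name `random_section_radius`, the sizes `m` and `n`, and the two decay exponents are illustrative choices.

```python
import numpy as np

def random_section_radius(sigma, n, rng):
    """Circumradius of {x in E : G x = 0} for a Gaussian n x m matrix G.

    E is the ellipsoid with semi-axes sigma (all assumed positive here).
    """
    sigma = np.asarray(sigma, dtype=float)
    G = rng.standard_normal((n, sigma.size))
    # Write x = D y with D = diag(sigma): x lies in E iff ||y||_2 <= 1,
    # and G x = 0 iff (G D) y = 0.
    GD = G * sigma  # scales column j by sigma_j
    # Rows n, ..., m-1 of Vt form an orthonormal basis of ker(G D),
    # because G D has rank n almost surely.
    _, _, Vt = np.linalg.svd(GD, full_matrices=True)
    kernel_basis = Vt[n:].T  # shape (m, m - n)
    # The radius is the spectral norm of D restricted to that kernel.
    return np.linalg.norm(sigma[:, None] * kernel_basis, 2)

rng = np.random.default_rng(0)
m, n = 400, 20
j = np.arange(1, m + 1)
for alpha in (0.25, 1.0):
    sigma = j ** (-alpha)
    r = random_section_radius(sigma, n, rng)
    print(f"alpha={alpha}: sigma_(n+1)={sigma[n]:.4f}, radius={r:.4f}, sigma_1={sigma[0]:.4f}")
```

By construction the computed radius always lies between the $(n+1)$-st and the largest semi-axis; where it falls in that interval is what the results of this section quantify.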
We want to give upper and lower bounds for which hold with high probability. Clearly, upper bounds are stronger if they are proven for the error of a concrete algorithm . Here, we consider a least squares estimator. We define as the restriction of to , where is of order and will be specified later. Note that we identify with the space of all such that for all . We then take where is the Moore-Penrose inverse of . This algorithm satisfies, almost surely, that for all . Moreover, one may write
[TABLE]
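The least squares estimator described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's implementation: the dimensions `m`, `n`, `k`, the choice `k <= n`, and the semi-axes `sigma` are hypothetical, and `least_squares` is a name introduced here. The sketch applies the Moore-Penrose inverse of the first `k` columns of the Gaussian matrix to the data, checks exact recovery on the first `k` coordinates, and evaluates the worst case error over the ellipsoid as a spectral norm.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 200, 40, 10                    # hypothetical sizes with k <= n
sigma = np.arange(1, m + 1) ** (-1.0)    # hypothetical semi-axes
G = rng.standard_normal((n, m))          # Gaussian information y = G x
G_k = G[:, :k]                           # restriction of G to the first k coordinates
G_k_pinv = np.linalg.pinv(G_k)           # Moore-Penrose inverse, shape (k, n)

def least_squares(y):
    """Recover a vector in R^m from data y = G x; only the first k entries are fitted."""
    x_hat = np.zeros(m)
    x_hat[:k] = G_k_pinv @ y
    return x_hat

# Exact recovery for vectors supported on the first k coordinates.
x = np.zeros(m)
x[:k] = rng.standard_normal(k)
recovery_gap = np.linalg.norm(least_squares(G @ x) - x)

# Worst case error over the ellipsoid: with x = D y and ||y||_2 <= 1 it equals
# the spectral norm of (I - E G_k^+ G) D, where E embeds R^k into R^m.
E = np.eye(m)[:, :k]
error_map = (np.eye(m) - E @ G_k_pinv @ G) * sigma  # right-multiplication by D
worst_case_error = np.linalg.norm(error_map, 2)
print(recovery_gap, worst_case_error, sigma[k])
```

Since the estimator only returns vectors supported on the first `k` coordinates, its worst case error can never drop below the $(k+1)$-st semi-axis; the interesting question, addressed by the theorems below, is how much larger it is.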
Let us now present the results. We note that it is not an essential assumption that the vector of semi-axes is non-increasing or even that the semi-axes are aligned with the standard basis of the Euclidean space. It simply eases the notation.
Theorem 3.
Let be non-increasing and let . Then the following estimate holds with probability at least ,
[TABLE]
This estimate turns out to be useful for sequences of polynomial decay. For sequences of exponential decay, we add a second upper bound. It is better suited for such sequences since the starting index of the sum in the upper bound is replaced by .
Theorem 4.
Let be non-increasing. Then, for all and we have
[TABLE]
On the other hand, we obtain the following lower bound for . This lower bound is even satisfied for the smaller quantity
[TABLE]
which corresponds to the difficulty of the easier problem of recovering just the first coordinate of from Gaussian information.
Theorem 5.
Let be non-increasing, and with
[TABLE]
Then it holds with probability at least that
[TABLE]
As a consequence of these theorems, we obtain that random information is useful if and only if $\sigma$ is square summable.
Corollary 6.
If $\sigma \notin \ell_2$, then $\mathcal{R}_n(\sigma) = \sigma_1$ holds almost surely for all $n \in \mathbb{N}$. On the other hand, if $\sigma \in \ell_2$, then
[TABLE]
Before we present the proofs of our main results, let us provide some of the results on the expected radius that follow from our main results for special sequences. For sequences and we write if there is a constant such that for all . We write in the case that both and . We start with the case of polynomial decay.
Corollary 7.
Let be non-increasing with for some and (where for ). Then
[TABLE]
Similar results can be derived for the finite-dimensional case (i.e., for $m < \infty$) under the condition that $m$ is large enough in comparison to $n$. Details can be found in the thesis [15, Corollaries 4.31–4.33]. This means that random information is just as good as optimal information if the singular values decay with a polynomial rate greater than $1/2$. The size of a typical intersection ellipsoid is comparable to the size of the smallest intersection. On the other hand, if the singular values decay too slowly, random information is useless. A typical intersection ellipsoid is almost as large as the largest. There is also an intermediate case where random information is worse than optimal information, but only slightly.
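To see why a decay rate above $1/2$ makes random information as good as optimal information, one can evaluate a tail-sum bound numerically. The sketch below assumes that the upper bound has the form $n^{-1/2} \big( \sum_{j \geq n/4} \sigma_j^2 \big)^{1/2}$ up to a constant; this form is suggested by the $o(1/\sqrt{n})$ rate and by the remark that Theorem 4 changes the starting index of the sum, but the exact starting index and constants are not recoverable from the text here and are assumptions. The helper `tail_bound` and the truncation `cutoff` are hypothetical.

```python
import math

def tail_bound(alpha, n, cutoff=200_000):
    """n^{-1/2} * (sum_{j >= n/4} j^{-2 alpha})^{1/2}, with the sum truncated at `cutoff`."""
    start = max(1, n // 4)
    tail = sum(j ** (-2.0 * alpha) for j in range(start, cutoff + 1))
    return math.sqrt(tail / n)

alpha = 1.0  # polynomial decay sigma_j = j^{-alpha} with alpha > 1/2
ratios = []
for n in (16, 64, 256, 1024):
    sigma_next = (n + 1) ** (-alpha)  # radius of optimal information
    ratios.append(tail_bound(alpha, n) / sigma_next)
print(ratios)
```

For $\alpha > 1/2$ the tail sum behaves like $n^{1-2\alpha}$, so the ratio to $\sigma_{n+1}$ stays bounded as $n$ grows. For $\alpha \leq 1/2$ the untruncated sum diverges, which matches the regime where random information is useless.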
Remark 8.
The case with can be extended to for any slowly varying function . Also in this case, we get .
We also discuss sequences of exponential decay. We have seen that holds for sequences with sufficiently fast polynomial decay. It remains open whether the same holds for sequences of exponential decay. With Theorem 4 we obtain the following.
Corollary 9.
Assume that for some . Then
[TABLE]
Despite the gap, this result is stronger than the result for polynomial decay if considered from the complexity point of view. Corollary 7 states that there is a constant such that pieces of random information are at least as good as pieces of optimal information. Corollary 9 states that there is a constant such that pieces of random information are at least as good as pieces of optimal information.
3. The Proofs
In the proofs we will use the following tools:
exponential estimates for sums of chi-squared random variables,
Gordon’s min-max theorem for Gaussian processes,
estimates for the extreme singular values of (structured) Gaussian matrices.
Before we enter the proofs, we recall and extend some of our notation. Let be a non-increasing sequence of non-negative numbers. We consider the Hilbert space
[TABLE]
using the convention that for and , with inner product
[TABLE]
The unit ball of is denoted by . The matrix for has independent standard Gaussian entries. We want to study the distribution of the random variable
[TABLE]
Of course, the equation requires that the series converges. For index sets and , we consider the (structured) Gaussian -matrices
[TABLE]
Note that and , where denotes the set of integers from 1 to . We consider
[TABLE]
as a closed subspace of the Hilbert space and denote its unit ball by . The projection of onto is denoted by .
A crucial role in our proofs is played by estimates for the extreme singular values of random matrices. We recall some basic facts about singular values. Let be a real -matrix, where we allow that or provided that describes a compact operator from to (with Euclidean norm). For every , the th singular value of this matrix can be defined as the square-root of the th largest eigenvalue of the symmetric matrix , which describes a positive operator on . Note that if we have . Our interest lies in the extreme singular values of . The largest singular value of is given by
[TABLE]
This number is also called the spectral norm of . The smallest singular value is given by
[TABLE]
Clearly, we have whenever . If , it also makes sense to talk about the th singular value of . This number equals the radius of the largest Euclidean ball that is contained in the image of the unit ball of under , that is
[TABLE]
where denotes the unit ball in -dimensional Euclidean space. These extreme singular values are also defined for noncompact operators , where is restricted to its domain if necessary. We now turn to the proofs of our results.
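The facts about singular values recalled above are easy to check numerically. The following sketch uses a small Gaussian matrix `A` of hypothetical size $8 \times 5$; it compares the singular values with the square roots of the eigenvalues of $A^\top A$ and verifies that $\|Ax\|_2$ on the unit sphere is sandwiched between the smallest and the largest singular value.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 8, 5                                  # hypothetical matrix size
A = rng.standard_normal((n, k))
s = np.linalg.svd(A, compute_uv=False)       # s_1 >= ... >= s_k
eig = np.linalg.eigvalsh(A.T @ A)[::-1]      # eigenvalues of A^T A, descending
s_from_eig = np.sqrt(np.clip(eig, 0.0, None))

# ||A x||_2 over random unit vectors x lies between s_k and s_1.
unit = rng.standard_normal((k, 1000))
unit /= np.linalg.norm(unit, axis=0)
norms = np.linalg.norm(A @ unit, axis=0)
print(s, norms.min(), norms.max())
```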
3.1. The Upper Bound
We start with an almost sure upper bound for the worst case error of the least squares algorithm from (1). The upper bound is given in terms of the extreme singular values of the corresponding (structured) Gaussian matrices. The spectral statistics of random matrices, in particular the behavior of the least and largest singular value, attracted considerable attention over the years and we refer the reader to, e.g., [1, 3, 6, 22, 33, 34, 38, 41] and the references cited therein.
Proposition 10.
Let be non-increasing and let . If has full rank, then
[TABLE]
Proof.
We first note that if has full rank. Let with . We recall that . This yields
[TABLE]
The norm of is the inverse of the th largest (and therefore the smallest) singular value of the matrix . The norm of is the largest singular value of the matrix
[TABLE]
To see this, note that on , where the mapping with is an isomorphism. This yields the stated inequality. ∎
The task now is to bound the -th singular value of the Gaussian matrix from below and the largest singular value of the structured Gaussian matrix from above. We start with the largest singular value of the latter. Let us remark that the question for the order of the expected value of the largest singular value of a structured Gaussian random matrix has recently been settled by Latała, Van Handel, and Youssef [19] (see also [3, 7, 12, 18, 41] for earlier work in this direction). The result we shall use here is due to Bandeira and Van Handel [3].
Lemma 11.
Let be non-increasing. For every and , we have
[TABLE]
Proof.
Without loss of generality, we may assume that . Let us first consider the finite matrix
[TABLE]
and set
[TABLE]
where and denote their infinite dimensional variants. It is proven in [3, Corollary 3.11] that, for every (and ), we have
[TABLE]
By setting , it follows that
[TABLE]
Turning to the infinite dimensional case, we note that we have if and only if there is some such that . This yields
[TABLE]
since is increasing in and . ∎
Together with Proposition 10 this yields that the estimate
[TABLE]
holds with probability at least for all and . It remains to bound the -th singular value of the Gaussian matrix from below. It is known from [35, Theorem 1.1] that this number typically is of order for all and . To exploit our upper bound to its full extent, the number may be chosen such that the right-hand side of (2) becomes minimal. We realize that the term increases with , whereas all remaining terms decrease with . However, the inverse singular number achieves its minimal order already for with some . If does not decay extremely fast, this does not lead to a loss regarding the other terms of (2). For instance, we may choose and use the following special case of [6, Theorem II.13].
Lemma 12.
Let and . Then
[TABLE]
Proof.
It is shown in [6, Theorem II.13] that, for all and , we have
[TABLE]
The statement follows by putting and . ∎
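The purpose of Lemma 12 is to keep the smallest singular value of a tall Gaussian matrix away from zero. The simulation below is a sketch under assumptions: it uses the classical Gaussian estimate $\mathbb{P}\big(s_{\min} \leq \sqrt{n} - \sqrt{k} - t\big) \leq e^{-t^2/2}$ for an $n \times k$ standard Gaussian matrix with $k \leq n$ as the reference, and whether this is the exact form of the display cited from [6] cannot be confirmed from the text here. The sizes `n`, `k` and the number of `trials` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, trials = 400, 100, 20          # hypothetical sizes with k <= n
s_min = np.empty(trials)
s_max = np.empty(trials)
for i in range(trials):
    s = np.linalg.svd(rng.standard_normal((n, k)), compute_uv=False)
    s_max[i], s_min[i] = s[0], s[-1]

lower = np.sqrt(n) - np.sqrt(k)      # reference value for the smallest singular value
upper = np.sqrt(n) + np.sqrt(k)      # reference value for the largest singular value
print(f"smallest: min over trials {s_min.min():.2f}, reference {lower:.2f}")
print(f"largest:  max over trials {s_max.max():.2f}, reference {upper:.2f}")
```

In such runs the extreme singular values typically stay close to the two reference values, which is why the inverse of the smallest singular value contributes a factor of order $1/\sqrt{n}$ when $k$ is a fixed fraction of $n$.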
If decays very fast, might not be the best choice. The term in estimate (2) may be much smaller for than for . It is better to choose . In this case, the inverse singular number is of order . We state a result of [37, Theorem 1.2].
Lemma 13.
Let and . Then
[TABLE]
This leads to the proof of Theorems 3 and 4 as stated in Section 2.
Proof of Theorems 3 and 4.
To prove the first statement, let . We combine Lemma 12 and Lemma 11 for with Proposition 10 and obtain that
[TABLE]
with probability at least . The statement follows if we take into account that
[TABLE]
To prove the second statement, we set . We combine Lemma 13 and Lemma 11 with Proposition 10 and obtain that
[TABLE]
with probability at least . The rough estimates and and yield the statement. ∎
3.2. The Lower Bound
We want to give lower bounds on the radius of information
[TABLE]
which corresponds to the difficulty of recovering an unknown element from the information in . In fact, our lower bounds already hold for the smaller quantity
[TABLE]
which corresponds to the difficulty of recovering just the th coordinate of . Again, we start with an almost sure estimate.
Proposition 14.
Let be non-increasing. For all with we have almost surely
[TABLE]
Proof.
We may assume that the operator is onto and that is nonzero since these events occur with probability 1. Observe that
[TABLE]
where is the unit ball of . In particular, this implies
[TABLE]
Let be the -th standard unit vector in . Then we have
[TABLE]
Since the image of under contains a Euclidean ball of radius , we find an element of such that
[TABLE]
For , we obtain and
[TABLE]
Then the vector satisfies and as well as
[TABLE]
The statement is obtained by
[TABLE]
∎
It remains to bound the th singular value of and the norm of the Gaussian vector with high probability. For both estimates, we use the following concentration result for chi-square random variables going back to Laurent and Massart [20, Lemma 1]. Alternatively, one could use the concentration of Gaussian random vectors in Banach spaces (see, e.g., [21, Proposition 2.18]).
Lemma 15.
For , let be independent centered Gaussian variables with variance . Then, for any , we have
[TABLE]
Proof.
The lemma [20, Lemma 1] states that, for all , we have
[TABLE]
The formulation of Lemma 15 follows if we put
[TABLE]
The desired probability estimate then follows by using . ∎
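The concentration behind Lemma 15 can be illustrated by simulation. The sketch below assumes the Laurent-Massart inequality in its usual form: for $Z = \sum_i a_i (Y_i^2 - 1)$ with independent standard Gaussians $Y_i$ and weights $a_i \geq 0$, one has $\mathbb{P}\big(Z \geq 2\|a\|_2 \sqrt{x} + 2\|a\|_\infty x\big) \leq e^{-x}$ and $\mathbb{P}\big(Z \leq -2\|a\|_2 \sqrt{x}\big) \leq e^{-x}$. That this matches the elided display is an assumption, and the variances $a_i = 1/i$, the level $x = 2$, and the number of trials are hypothetical choices.

```python
import math
import random

random.seed(3)
variances = [1.0 / i for i in range(1, 51)]   # hypothetical variances a_i
l2 = math.sqrt(sum(a * a for a in variances))
linf = max(variances)
x = 2.0
upper_thr = 2 * l2 * math.sqrt(x) + 2 * linf * x
lower_thr = 2 * l2 * math.sqrt(x)

trials = 2000
upper_hits = 0
lower_hits = 0
for _ in range(trials):
    # Z = sum_i a_i (Y_i^2 - 1) is the centered sum of squares of Gaussians
    # with variances a_i.
    z = sum(a * (random.gauss(0.0, 1.0) ** 2 - 1.0) for a in variances)
    upper_hits += z >= upper_thr
    lower_hits += z <= -lower_thr

upper_freq = upper_hits / trials
lower_freq = lower_hits / trials
print(upper_freq, lower_freq, math.exp(-x))
```

The empirical frequencies are compared with $e^{-x}$; the inequality is not tight for a fixed level, so the observed frequencies are usually far below the bound.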
In particular, the norm of the Gaussian vector concentrates around . To bound the th singular value of we shall use Gordon’s min-max theorem. Let us state Gordon’s theorem [11, Lemma 3.1] in a form which can be found in [39].
Lemma 16 (Gordon’s min-max theorem).
Let and let , be compact sets. Assume that is a continuous mapping. Let , , and be independent random objects with independent standard Gaussian entries. Moreover, define
[TABLE]
Then, for all , we have
[TABLE]
This yields the following lower bound on the smallest singular value of structured Gaussian matrices. Note that this is a generalization of Lemma 12.
Lemma 17.
Let be a random matrix whose entries are centered Gaussian variables with variance for all and . Then, for all , we have
[TABLE]
Proof.
Note that the statement is trivial if . We may assume that the are positive since an additional row of zeros does neither change nor the norms of the vector . We have the identity where is a random matrix with independent standard Gaussian entries and is the diagonal matrix
[TABLE]
We want to apply Gordon’s theorem for the matrix and , where is the sphere in and is the image of the sphere in under . Then we have
[TABLE]
On the other hand, if and are standard Gaussian vectors, the choice of yields
[TABLE]
Lemma 16 implies for all that
[TABLE]
To obtain the statement of our lemma, we set . By Lemma 15, we have
[TABLE]
and
[TABLE]
Now the statement is obtained from a union bound. ∎
We need the statement of Lemma 17 for matrices with infinitely many rows, which is obtained from a simple limit argument.
Lemma 18.
Formula (3) also holds for provided that .
Proof.
Again, we may assume that is strictly positive. For let be the sub-matrix consisting of the first rows of and let be the sub-vector consisting of the first entries of . We use the notation
[TABLE]
where and correspond to the case . For any with we can choose such that and . Note that we have and thus
[TABLE]
Letting tend to zero yields the statement. ∎
We arrive at our main lower bound.
Lemma 19.
Let be non-increasing and let be such that . Define
[TABLE]
Then, for all , we have
[TABLE]
Proof.
First note that, in the setting of Proposition 14, the matrix and the vector are independent. Lemma 15 and Lemma 18 yield
[TABLE]
with probability at least . Note that we have
[TABLE]
since erasing rows can only shrink the smallest singular value. In this case, we have
[TABLE]
Now the statement is obtained from Proposition 14. ∎
This also proves Theorem 5 as stated in the previous section.
Proof of Theorem 5.
We simply apply Lemma 19 and choose to obtain the desired lower bound for . Since the lower bound is independent of , we actually get the same lower bound for , where is obtained from by replacing the first coordinates with . To see that the lower bound also holds for (as opposed to ), we only need to realize that , where clearly has the same distribution as . ∎
3.3. Corollaries
In order to optimize the lower bound of Theorem 5, we may choose such that the right-hand side of our lower bound becomes maximal. If the Euclidean norm of is large, we simply choose . Taking into account that is decreasing in , we immediately arrive at the following result.
Lemma 20.
Let be a non-increasing sequence of non-negative numbers and let
[TABLE]
Then for all with probability at least .
This leads to a proof of Corollary 6 which states that random information is useful if and only if $\sigma$ is square summable.
Proof of Corollary 6.
We first consider the case that . Since , Theorem 3 yields
[TABLE]
The statement is now implied by the fact that .
For the case that , let . For let be the sequence obtained from by replacing the th element with zero for all . For any , we can choose such that
[TABLE]
since . The first part of this corollary yields that
[TABLE]
Since this holds for any , we get that the event happens with probability 1 for any . This yields the statement since the event is the intersection of countably many such events. ∎
We now apply our general estimates for to specific sequences to prove the statements of Corollaries 7 and 9.
Proof of Corollary 7.
Part 1. The upper bound in the first equivalence is trivial since almost surely. The lower bound follows immediately from Corollary 6.
Part 2. To prove the second equivalence of Corollary 7, it is enough to consider the sequence
[TABLE]
with . Note that we have for any that
[TABLE]
where the implied constants depend only on . Now it follows from Theorem 3 and from Theorem 5 for with some that
[TABLE]
with probability at least , where the implied constants depend only on . The statement for the expected value follows from .
Part 3. In the third equivalence of Corollary 7, the lower bound is trivial and even holds almost surely. To prove the upper bound, it is enough to consider the sequence
[TABLE]
where and . Theorem 3 yields for large that
[TABLE]
with probability at least and implied constants only depending on and . This yields the statement since almost surely. ∎
Proof of Corollary 9.
The lower bound follows from the trivial estimate . To prove the upper bound, we consider the case for all . The general case follows from the monotonicity and homogeneity of with respect to . We use Theorem 4. We choose such that . Note that there is some such that
[TABLE]
for all . Theorem 4 yields for all that
[TABLE]
This yields that
[TABLE]
as it was to be proven. ∎
4. Alternative approaches
In this section we present alternative ways to estimate the radius of random information from above and below. We choose to do this because these approaches give a better insight into the geometric aspect of the problem. The results, however, are slightly weaker than those obtained in Section 3. The upper bound is weaker since it is not constructive and the lower bound is weaker since it requires a little more than . For these geometric approaches, we restrict to the case of finite sequences with for all . We write when we consider the ellipsoid as a subset of .
4.1. Upper bound via the lower $M^*$-estimate
We present an alternative proof of our main upper bound. As already explained in the introduction, the radius of information can also be expressed as the radius of the ellipsoid that is obtained by slicing the -dimensional ellipsoid with a random subspace of codimension . To estimate the radius from above, we use a result of Gordon from [11] on estimates of the Euclidean norm against a norm induced by a symmetric convex body on large subsets of Grassmannians. Note that the first result in this direction had been established by V.D. Milman in [26].
We start with some notation and background information. Let $K \subset \mathbb{R}^m$ be an origin symmetric convex body, i.e., a compact and convex set with non-empty interior such that $x \in K$ implies $-x \in K$. We define the quantity
$$M^*(K) := \int_{S^{m-1}} h_K(x)\,\mathrm{d}\mu(x),$$
where $S^{m-1}$ is the unit Euclidean sphere in $\mathbb{R}^m$, integration is with respect to the normalized surface measure $\mu$ on $S^{m-1}$, and $h_K$ is the support function of $K$ given by
$$h_K(x) := \sup_{y \in K} \langle x, y \rangle.$$
Obviously, the support function is just the dual norm to the norm induced by $K$, i.e., if $K^\circ$ is the so-called polar body of $K$, then $h_K(x) = \|x\|_{K^\circ}$. Since for $x \in S^{m-1}$ the support function quantifies the distance from the origin to the supporting hyperplane orthogonal to $x$, the quantity $M^*(K)$ is simply (half) the mean width of the body $K$.
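As a rough numerical illustration of the spherical average of the support function defined above, the following sketch estimates it by Monte Carlo for an ellipsoid. It assumes the ellipsoid $\{y : \sum_j y_j^2/\sigma_j^2 \le 1\}$, whose support function has the closed form $h(x) = (\sum_j \sigma_j^2 x_j^2)^{1/2}$. The function names, the sample size and the choice $\sigma_j = 1/j$ are hypothetical and serve only as an example.

```python
import numpy as np

def support_ellipsoid(x, sigma):
    """Support function of the ellipsoid with semi-axes sigma, evaluated row-wise."""
    return np.sqrt(np.sum((sigma * x) ** 2, axis=-1))

def spherical_mean_support(sigma, samples=20000, seed=0):
    """Monte Carlo estimate of the average of the support function over the unit sphere."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((samples, sigma.size))
    x = g / np.linalg.norm(g, axis=1, keepdims=True)  # normalized Gaussians are uniform on the sphere
    return support_ellipsoid(x, sigma).mean()

sigma = 1.0 / np.arange(1, 51)  # hypothetical semi-axes sigma_j = 1/j, m = 50
estimate = spherical_mean_support(sigma)
# Jensen's inequality bounds the average by the root mean square of the semi-axes.
jensen_bound = np.sqrt(np.sum(sigma ** 2) / sigma.size)
print(estimate, jensen_bound)
```

For the Euclidean unit ball (all semi-axes equal to one) the support function is identically one on the sphere, which gives a quick sanity check of the sampler.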
Remark 21.
In the theory of asymptotic geometric analysis, the quantities $M^*(K)$ together with
$$M(K) := \int_{S^{m-1}} \|x\|_K\,\mathrm{d}\mu(x)$$
play an important rôle since the work of V.D. Milman on a quantitative version of Dvoretzky’s theorem on almost Euclidean subspaces of a Banach space. Using Jensen’s inequality together with polar integration and Urysohn’s inequality, it is not hard to see that
$$M(K)^{-1} \le \mathrm{vrad}(K) \le M^*(K),$$
where $\mathrm{vrad}(K) := \big(\mathrm{vol}_m(K)/\mathrm{vol}_m(B_2^m)\big)^{1/m}$ is the volume radius of $K$ (here $\mathrm{vol}_m$ stands for the $m$-dimensional Lebesgue measure and $B_2^m$ for the Euclidean unit ball). For isotropic convex bodies in $\mathbb{R}^m$ (i.e., convex bodies of volume $1$ with centroid at the origin satisfying the isotropic condition – we refer to [2] for details), this immediately yields
$$M^*(K) \ge c\sqrt{m}$$
for some absolute constant $c \in (0,\infty)$. The question about upper bounds for $M^*(K)$ with $K$ in isotropic position has been essentially settled by E. Milman in [25, Theorem 1.1] who proved that
$$M^*(K) \le C\sqrt{m}\,\log^2(m)\,L_K$$
with absolute constant $C \in (0,\infty)$. In fact, the $\sqrt{m}$-term is optimal and also the logarithmic part (up to the power). The optimality of the $L_K$-term is intimately related to the famous hyperplane conjecture. For a detailed exposition, we refer the reader to [2] and the references cited therein.
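The comparison of the two spherical averages with the volume radius, in its standard form $M(K)^{-1} \le \mathrm{vrad}(K) \le M^*(K)$, can be checked numerically for an ellipsoid. The sketch below is only an illustration under stated assumptions: the semi-axes $\sigma_j = j^{-1/2}$ and the sample size are hypothetical. It uses that the norm induced by the ellipsoid is $(\sum_j x_j^2/\sigma_j^2)^{1/2}$ and that its volume radius is the geometric mean of the semi-axes.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0 / np.sqrt(np.arange(1, 31))  # hypothetical semi-axes, m = 30
g = rng.standard_normal((50000, sigma.size))
x = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform points on the sphere

# Spherical average of the support function of the ellipsoid.
m_star = np.sqrt(((sigma * x) ** 2).sum(axis=1)).mean()
# Spherical average of the norm whose unit ball is the ellipsoid.
m_norm = np.sqrt(((x / sigma) ** 2).sum(axis=1)).mean()
# Volume radius of an ellipsoid: geometric mean of its semi-axes.
vrad = np.exp(np.log(sigma).mean())

print(1.0 / m_norm, vrad, m_star)
```

The three printed numbers should appear in non-decreasing order, up to Monte Carlo error.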
We continue with the so-called lower $M^*$-estimate. The first estimate of this type was proved by V.D. Milman in [26], see also [27]. We use the asymptotically optimal version obtained by Pajor and Tomczak-Jaegermann in [32] with improved constants from [11]. For the precise formulation, we refer to [2, Theorem 7.3.5].
Proposition 22.
For , define
[TABLE]
Let be the unit ball of a norm on . For any and there exists a subset in the Grassmannian of -codimensional linear subspaces of with Haar measure at least
[TABLE]
such that for any and all we have
[TABLE]
We should observe here that the distribution of the kernels of the Gaussian matrices is the uniform distribution, i.e., the Haar measure, on the Grassmann manifold . This follows immediately from the rotational invariance of both measures on . Hence, the probability estimate in Proposition 22 is exactly with respect to the probability on the kernels we use elsewhere.
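To make the object under study concrete, the following sketch computes the circumradius of the intersection of an ellipsoid with the kernel of a standard Gaussian matrix. It is a minimal illustration, not the construction used in the proofs: the function name, the semi-axes $\sigma_j = 1/j$ and the sizes $m = 200$, $n = 50$ are hypothetical, and all semi-axes are assumed to be positive.

```python
import numpy as np

def random_section_radius(sigma, n, rng):
    """Circumradius of {x in ellipsoid : G x = 0} for an n x m standard Gaussian matrix G."""
    m = sigma.size
    G = rng.standard_normal((n, m))
    # With D = diag(sigma), the map y -> D y sends the unit ball onto the ellipsoid,
    # so the constraint G x = 0 becomes (G D) y = 0.
    A = G * sigma  # column scaling, equal to G @ np.diag(sigma)
    _, _, vt = np.linalg.svd(A, full_matrices=True)
    kernel = vt[n:].T  # orthonormal basis of ker(G D); the rank is n almost surely
    # The radius is the largest singular value of D restricted to this kernel.
    return np.linalg.norm(sigma[:, None] * kernel, 2)

rng = np.random.default_rng(1)
sigma = 1.0 / np.arange(1, 201)  # hypothetical semi-axes sigma_j = 1/j, m = 200
n = 50
radii = np.array([random_section_radius(sigma, n, rng) for _ in range(20)])
print(radii.min(), radii.max(), sigma[n], sigma[0])
```

Every sampled radius lies between the $(n+1)$-st largest semi-axis (the minimal radius over all subspaces of codimension $n$) and the largest semi-axis, which gives a simple consistency check.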
We want to apply the lower $M^*$-estimate to obtain an upper bound on the radius of information. To this end, note that the ellipsoid satisfies
[TABLE]
This implies , and therefore,
[TABLE]
A direct application of Proposition 22 with leads to the upper bound
[TABLE]
with an absolute constant . This estimate is not very good if the semi-axes decay quickly. A better estimate can be obtained by switching from to its intersection with a Euclidean ball of small radius, a renorming argument going back to Pajor and Tomczak-Jaegermann [31].
Proposition 23.
There exists an absolute constant such that for any non-increasing finite sequence and all , we have
[TABLE]
with probability at least .
Proof.
Let be the intersection of the ellipsoid with a centered Euclidean ball of radius . For all , and , the Cauchy-Schwarz inequality yields
[TABLE]
and thus the same upper bound holds for . We obtain that
[TABLE]
Proposition 22 tells us that for any there exists a subset of with measure at least such that
[TABLE]
for any and an absolute constant . Choosing such that and yields
[TABLE]
This clearly implies that also
[TABLE]
For simplicity, we choose and obtain the stated inequality. ∎
4.2. Elementary lower bound
In this section we prove the following lower bound.
Proposition 24.
For any and there is a constant such that the following holds. If for some and then holds for all with probability at least .
To obtain this lower bound, we first consider the problem of just recovering the first coordinate of in the unit ball of . The corresponding radius of information is given by
[TABLE]
Lemma 25.
For , we have
[TABLE]
In particular, for any , we have
[TABLE]
with probability at least .
Proof.
Let . To prove (4), we observe that we want to compute the expectation of the random variable
[TABLE]
where is uniformly distributed on and is fixed. Applying an orthogonal transformation of the coordinate system, we may also fix the subspace
[TABLE]
and assume that is uniformly distributed on the sphere. This does not change the distribution of . By the Cauchy-Schwarz inequality, the maximum is attained for
[TABLE]
where denotes the orthogonal projection on . We obtain
[TABLE]
We observe that for all , since these terms are equal and sum up to 1. This shows (4). Estimate (5) is a direct consequence of (4) taking into account that . ∎
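The symmetry argument in the proof can be checked by simulation. For a uniformly distributed point $u$ on the unit sphere of $\mathbb{R}^m$, each $\mathbb{E}[u_j^2]$ equals $1/m$, so the squared norm of the orthogonal projection of a fixed unit vector onto a uniformly distributed subspace of codimension $n$ has expectation $(m-n)/m$. The sketch below assumes that the subspace is the kernel of an $n \times m$ standard Gaussian matrix; the function name and the values of $m$, $n$ and the number of trials are hypothetical.

```python
import numpy as np

def first_coordinate_radius(m, n, rng):
    """Norm of the orthogonal projection of e_1 onto the kernel of an n x m Gaussian matrix."""
    G = rng.standard_normal((n, m))
    q, _ = np.linalg.qr(G.T)  # columns of q: orthonormal basis of the row space of G
    e1 = np.zeros(m)
    e1[0] = 1.0
    # The kernel of G is the orthogonal complement of its row space.
    proj_kernel = e1 - q @ (q.T @ e1)
    return np.linalg.norm(proj_kernel)

rng = np.random.default_rng(2)
m, n, trials = 40, 10, 4000
squares = np.array([first_coordinate_radius(m, n, rng) ** 2 for _ in range(trials)])
empirical, exact = squares.mean(), (m - n) / m
print(empirical, exact)
```

The norm computed here is the largest first coordinate among all points of the Euclidean unit ball that lie in the kernel, since that maximum is attained at the normalized projection of $e_1$.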
Proof of Proposition 24.
If we choose satisfying (5), by definition of and compactness of the unit sphere of , we find with and satisfying
[TABLE]
Thus is already rather close to which implies that the other coordinates cannot be too big. Indeed, we find
[TABLE]
Rescaling yields
[TABLE]
and finishes the proof. ∎
Acknowledgements.
We thank several colleagues for valuable remarks and comments. Part of the work was done during a special semester at the Erwin Schrödinger International Institute for Mathematics and Physics (ESI) in Vienna. We thank ESI for the hospitality. A. Hinrichs and D. Krieg are supported by the Austrian Science Fund (FWF) Project F5513-N26, which is part of the Special Research Program “Quasi-Monte Carlo Methods: Theory and Applications”. J. Prochno is supported by the Austrian Science Fund (FWF) Project P32405 “Asymptotic Geometric Analysis and Applications” as well as a visiting professorship from Ruhr University Bochum and its Research School PLUS.
References
- [1] R. Adamczak, O. Guédon, A. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Smallest singular value of random matrices with independent columns. C. R. Math. Acad. Sci. Paris, 346(15-16):853–856, 2008.
- [2] S. Artstein-Avidan, A. Giannopoulos, and V. D. Milman. Asymptotic geometric analysis. Part I, volume 202 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2015.
- [3] A. S. Bandeira and R. Van Handel. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Ann. Probab., 44(4):2479–2506, 2016.
- [4] B. Carl and I. Stephani. Entropy, compactness and the approximation of operators, volume 98 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 1990.
- [5] J. Creutzig and P. Wojtaszczyk. Linear vs. nonlinear algorithms for linear problems. J. Complexity, 20(6):807–820, 2004.
- [6] K. R. Davidson and S. Szarek. Local operator theory, random matrices and Banach spaces. In Handbook of the geometry of Banach spaces, Vol. I, pages 317–366. North-Holland, Amsterdam, 2001.
- [7] A. Edelman. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl., 9(4):543–560, 1988.
- [8] A. Giannopoulos and V. D. Milman. Mean width and diameter of proportional sections of a symmetric convex body. J. Reine Angew. Math., 497:113–139, 1998.
