Entropy numbers of finite dimensional mixed-norm balls and function space embeddings with small mixed smoothness

Sebastian Mayer; Tino Ullrich

arXiv:1904.04619·math.FA·March 2, 2020


TL;DR

This paper derives precise bounds for entropy numbers of finite-dimensional mixed-norm balls and applies these results to determine optimal asymptotic rates for embeddings of certain function spaces with small mixed smoothness, resolving an open problem.

Contribution

It provides the first matching bounds for entropy numbers of mixed-norm embeddings and establishes optimal dimension-free rates for Besov and Triebel-Lizorkin space embeddings with small mixed smoothness.

Findings

01 Matching bounds for entropy numbers of mixed-norm embeddings

02 Optimal asymptotic rates for function space embeddings

03 Resolution of an open problem in the literature

Abstract

We study the embedding $id : ℓ_{p}^{b} (ℓ_{q}^{d}) \to ℓ_{r}^{b} (ℓ_{u}^{d})$ and prove matching bounds for the entropy numbers $e_{k} (id)$ provided that $0 < p < r \leq \infty$ and $0 < q \leq u \leq \infty$ . Based on this finding, we establish optimal dimension-free asymptotic rates for the entropy numbers of embeddings of Besov and Triebel-Lizorkin spaces of small dominating mixed smoothness which settles an open question in the literature. Both results rely on a novel covering construction recently found by Edmunds and Netrusov.

Equations

$$e_{k}(K,Y) := \inf\Big\{\varepsilon>0 \,:\, \exists\, y_{1},\dots,y_{2^{k-1}} \text{ such that } K\subset\bigcup_{\ell=1}^{2^{k-1}}\big(y_{\ell}+\varepsilon B_{Y}\big)\Big\},\quad k\in\mathds{N}.$$

$$e_{k}(T\colon X\to Y) := e_{k}(T(B_{X}),Y).$$

$$\mathrm{Id}\colon S^{r_{0}}_{p_{0},q_{0}}A(\Omega)\to S^{r_{1}}_{p_{1},q_{1}}A^{\dagger}(\Omega),\quad A,A^{\dagger}\in\{B,F\},$$

$$\mathrm{Id}\colon S^{r}_{p_{0},q_{0}}A(\Omega)\to L_{p_{1}}(\Omega),\quad A\in\{B,F\},$$

$$e_{m}(\mathrm{Id})\simeq_{n} m^{-(r_{0}-r_{1})}(\log m)^{(n-1)\eta},$$

$$1/p_{0}-1/p_{1}<r_{0}-r_{1}\leq 1/q_{0}-1/q_{1}.$$

$$e_{m}(\mathrm{Id})\simeq_{n} m^{-(r_{0}-r_{1})},\quad m\in\mathds{N},$$

$$\mathrm{id}\colon \ell_{p}^{b}(\ell_{q}^{d})\to\ell_{r}^{b}(\ell_{u}^{d}),$$

$$1/p-1/r>1/q-1/u\geq 0.$$

$$e_{k}(\mathrm{id})\simeq\begin{cases}1 &: 1\leq k\leq\log(bd),\\ \Big(\frac{\log(ed/k)}{k}\Big)^{1/q-1/u} &: \log(bd)\leq k\leq d,\\ \Big(\frac{d}{k}\Big)^{1/p-1/r}d^{-(1/q-1/u)} &: d\leq k\leq bd,\\ b^{-(1/p-1/r)}d^{-(1/q-1/u)}2^{-\frac{k-1}{bd}} &: k\geq bd.\end{cases}$$

$$e_{k}(\mathrm{id}\colon \ell_{p}^{b}\to\ell_{r}^{b})\simeq\begin{cases}1 &: 1\leq k\leq\log(b),\\ \Big(\frac{\log(eb/k)}{k}\Big)^{1/p-1/r} &: \log(b)\leq k\leq b,\\ 2^{-\frac{k-1}{b}}b^{-(1/p-1/r)} &: k\geq b,\end{cases}$$

$$\|x+y\|_{X}\leq\alpha_{X}\big(\|x\|_{X}+\|y\|_{X}\big)\quad\text{for all } x,y\in X.$$

$$\|x+y\|_{X}^{p}\leq\|x\|_{X}^{p}+\|y\|_{X}^{p}\quad\text{for all } x,y\in X.$$

$$\|x\|_{p,q} := \bigg(\sum_{i=1}^{b}\Big(\sum_{j=1}^{d}|x_{ij}|^{q}\Big)^{p/q}\bigg)^{1/p},\qquad 0<p<\infty,\; 0<q<\infty,$$

$$K\subset\bigcup_{i=1}^{n}\big(x_{i}+\varepsilon B_{Y}\big).$$

$$M_{2\varepsilon}(K,Y)\leq N_{\varepsilon}(K,Y)\leq M_{\varepsilon}(K,Y).$$

$$H_{\varepsilon}(K,Y)=\log_{2}N_{\varepsilon}(K,Y),$$

$$e_{k}(K,Y) := \inf\{\varepsilon>0 \,:\, H_{\varepsilon}(K,Y)\leq k-1\}.$$

$$e_{k}(T\colon X\to Y)=e_{k}(T(B_{X}),Y),\quad k\in\mathds{N}.$$

$$\|T\|=e_{1}(T)\geq e_{2}(T)\geq e_{3}(T)\geq\dots\geq 0.$$

$$e_{k_{1}+k_{2}-1}(T_{1}+T_{2})^{\vartheta}\leq e_{k_{1}}(T_{1})^{\vartheta}+e_{k_{2}}(T_{2})^{\vartheta}.$$

$$e_{k_{1}+k_{2}-1}(R\circ S)\lesssim e_{k_{1}}(R)\,e_{k_{2}}(S).$$

$$e_{k}(R\circ S)\leq e_{k}(R)\,\|S\|.$$

$$e_{k}(\mathrm{id}\colon \ell_{p}^{b}\to\ell_{q}^{b})\simeq\begin{cases}1 &: 1\leq k\leq\log(b),\\ \Big(\frac{\log(1+b/k)}{k}\Big)^{1/p-1/q} &: \log(b)\leq k\leq b,\\ 2^{-k/b}\,b^{1/q-1/p} &: k\geq b.\end{cases}$$

$$D(m,k)=\max_{\ell\in\mathds{N},\, m\leq\ell\leq k}\;(\ell/k)^{1/p-1/r}\,e_{\ell}(\mathrm{id}\colon X\to Y),$$

$$A(k,b)=\max\Big\{\|\mathrm{id}\colon X\to Y\|\Big(\frac{\log(eb/k)}{k}\Big)^{1/p-1/r},\; D(1,k)\Big\}.$$

$$e_{k}(\mathrm{id}\colon \ell_{p}^{b}(X)\to\ell_{r}^{b}(Y))\simeq A(k,b).$$

$$D(c_{1}k/b,k)\lesssim e_{k}(\mathrm{id}\colon \ell_{p}^{b}(X)\to\ell_{r}^{b}(Y))\lesssim D(c_{2}k/b,k).$$

$$D(m,k)=e_{m}(\mathrm{id}\colon X\to Y),\qquad A(k,b)=\|\mathrm{id}\colon X\to Y\|.$$

$$\frac{1}{2}\,e_{k+1}(\mathrm{id}\colon X\to Y)\leq e_{kb}(\mathrm{id}\colon \ell_{\infty}^{b}(X)\to\ell_{\infty}^{b}(Y))\leq e_{k}(\mathrm{id}\colon X\to Y),\quad k\in\mathds{N}.$$


Full text

Entropy numbers of finite dimensional mixed-norm balls and function space embeddings with small mixed smoothness

Sebastian Mayer (a) and Tino Ullrich (b, corresponding author: [email protected])

a Fraunhofer SCAI, Schloss Birlinghoven, Sankt Augustin, Germany

b TU Chemnitz, Fakultät für Mathematik, 09107 Chemnitz

Abstract

We study the embedding $\mathrm{id}\mathrel{\mathop{\mathchar 58\relax}}\ell_{p}^{b}(\ell_{q}^{d})\to\ell_{r}^{b}(\ell_{u}^{d})$ and prove matching bounds for the entropy numbers $e_{k}(\mathrm{id})$ provided that $0<p<r\leq\infty$ and $0<q\leq u\leq\infty$ . Based on this finding, we establish optimal dimension-free asymptotic rates for the entropy numbers of embeddings of Besov and Triebel-Lizorkin spaces of small dominating mixed smoothness, which gives a complete answer to Open Problem 6.4 in [8]. Both results rely on a novel covering construction recently found by Edmunds and Netrusov [10].

1 Introduction

Entropy numbers quantify the degree of compactness of a set, i.e., how well the set can be approximated by a finite set. Given a compact set $K$ in a quasi-Banach space $Y$ , the $k$ -th entropy number $e_{k}(K,Y)$ is defined to be the smallest radius $\varepsilon>0$ such that $K$ can be covered with $2^{k-1}$ copies of the ball $\varepsilon B_{Y}$ , i.e.,

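$$e_{k}(K,Y) := \inf\Big\{\varepsilon>0 \,:\, \exists\, y_{1},\dots,y_{2^{k-1}} \text{ such that } K\subset\bigcup_{\ell=1}^{2^{k-1}}\big(y_{\ell}+\varepsilon B_{Y}\big)\Big\},\quad k\in\mathds{N}.$$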

The concept of entropy numbers can be easily extended to operators. Given a compact operator $T\mathrel{\mathop{\mathchar 58\relax}}X\to Y$ , where $X$ and $Y$ are quasi-Banach spaces, the $k$ -th entropy number of the operator $T$ is defined to be

$$e_{k}(T\colon X\to Y) := e_{k}(T(B_{X}),Y).$$

If the spaces $X,Y$ are clear from the context, we will abbreviate $e_{k}(T\mathrel{\mathop{\mathchar 58\relax}}X\to Y)$ by $e_{k}(T)$ .

Entropy numbers (or the inverse concept of metric entropy) belong to the fundamental concepts of approximation theory. They appear in various approximation problems, e.g., in the estimation of the decay of operator eigenvalues [4, 11, 20], in the estimation of learning rates for machine learning problems [38, 42], or in bounding $s$ -numbers like approximation, Gelfand, or Kolmogorov numbers from below [4, 16]. We note that Gelfand numbers find application in the recent field of compressive sensing [6, 13, 16] and Information Based Complexity in general. Entropy numbers are also closely connected to small ball problems in probability theory [21, 24]. For further applications and basic properties, we refer to the monographs [5, 28], and the recent survey [8, Chapter 6].

The subject of this paper is to improve estimates for entropy numbers of embeddings between function spaces of dominating mixed smoothness

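$$\mathrm{Id}\colon S^{r_{0}}_{p_{0},q_{0}}A(\Omega)\to S^{r_{1}}_{p_{1},q_{1}}A^{\dagger}(\Omega),\quad A,A^{\dagger}\in\{B,F\}, \tag{1}$$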

where $\Omega\subset\mathds{R}^{n}$ is a bounded domain, $0<p_{0},p_{1},q_{0},q_{1}\leq\infty$ , and $r_{0}-r_{1}>(1/p_{0}-1/p_{1})_{+}$ . The case $A=B$ stands for the scale of Besov spaces of dominating mixed smoothness, while $A=F$ refers to the scale of Triebel-Lizorkin spaces, which includes classical $L_{p}$ and Sobolev spaces of mixed smoothness. That is why (1) also includes the classical embeddings

$$\mathrm{Id}\colon S^{r}_{p_{0},q_{0}}A(\Omega)\to L_{p_{1}}(\Omega),\quad A\in\{B,F\},$$

if $r>1/p_{0}-1/p_{1}$. Function space embeddings of this type play a crucial role in hyperbolic cross approximation [8]. Entropy numbers of such embeddings have been the subject of intense study, see [41], [8, Chapt. 6] and the recent papers by A.S. Romanyuk [33, 34, 35] and V.N. Temlyakov [39]. Note that there are a number of deep open problems connected to the case $p_{1}=\infty$, which extend to probability and discrepancy theory [8, 2.6, 6.4].

Typically, one observes asymptotic decays of the form

$$e_{m}(\mathrm{Id})\simeq_{n} m^{-(r_{0}-r_{1})}(\log m)^{(n-1)\eta},$$

where $\eta>0$ . This behavior is also well-known for $s$ -numbers of these embeddings like approximation, Gelfand, or Kolmogorov numbers, see [8] and the references therein. Although the main rate is the same as in the univariate case, the dimension still appears in the logarithmic term. We show that the logarithmic term completely disappears in regimes of small smoothness

$$1/p_{0}-1/p_{1}<r_{0}-r_{1}\leq 1/q_{0}-1/q_{1}.$$

That is, we establish sharp purely polynomial asymptotic bounds of the form

$$e_{m}(\mathrm{Id})\simeq_{n} m^{-(r_{0}-r_{1})},\quad m\in\mathds{N}, \tag{3}$$

which depend on the underlying dimension $n$ only through the constant. This settles several open questions stated in the literature [8, 41], see Section 5, and makes the framework highly relevant for high-dimensional approximation.

A key ingredient in the proof of (3) is a counterpart of Schütt’s theorem for the entropy numbers of the embedding

$$\mathrm{id}\colon \ell_{p}^{b}(\ell_{q}^{d})\to\ell_{r}^{b}(\ell_{u}^{d}),$$

where $0<p<r\leq\infty$ and $0<q\leq u\leq\infty$ . We prove matching bounds for all parameter constellations. A particularly relevant case for the purpose of this paper is the situation where $b\leq d$ and

$$1/p-1/r>1/q-1/u\geq 0.$$

Here, we have the surprising behavior

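$$e_{k}(\mathrm{id})\simeq\begin{cases}1 &: 1\leq k\leq\log(bd),\\ \Big(\frac{\log(ed/k)}{k}\Big)^{1/q-1/u} &: \log(bd)\leq k\leq d,\\ \Big(\frac{d}{k}\Big)^{1/p-1/r}d^{-(1/q-1/u)} &: d\leq k\leq bd,\\ b^{-(1/p-1/r)}d^{-(1/q-1/u)}2^{-\frac{k-1}{bd}} &: k\geq bd.\end{cases} \tag{4}$$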

Note that this relation is not a trivial extension of the classical Schütt result [37], which reads as

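$$e_{k}(\mathrm{id}\colon \ell_{p}^{b}\to\ell_{r}^{b})\simeq\begin{cases}1 &: 1\leq k\leq\log(b),\\ \Big(\frac{\log(eb/k)}{k}\Big)^{1/p-1/r} &: \log(b)\leq k\leq b,\\ 2^{-\frac{k-1}{b}}b^{-(1/p-1/r)} &: k\geq b,\end{cases}$$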

for the norm- $1$ -embedding $\mathrm{id}\mathrel{\mathop{\mathchar 58\relax}}\ell^{b}_{p}\to\ell^{b}_{r}$ , where $0<p\leq r\leq\infty$ . In fact, using trivial embeddings would give an additional $\log$ -term in the third case of (4). The absence of this $\log$ -term makes (4) interesting and useful as we will see below.
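To see how the four regimes of (4) fit together, the following Python sketch evaluates its right-hand side with all constants ignored. The function name `mixed_rate` and its signature are assumptions made for illustration; the sketch only tabulates the stated rate under the conditions $b\leq d$ and $1/p-1/r>1/q-1/u\geq 0$ and is not part of the paper.

```python
import math

def mixed_rate(k, b, d, p, q, r, u):
    """Right-hand side of (4) with constants ignored (illustrative helper).

    Assumes b <= d and 1/p - 1/r > 1/q - 1/u >= 0.
    Pass math.inf for an infinite exponent (1 / math.inf evaluates to 0.0).
    """
    outer = 1.0 / p - 1.0 / r  # gap of the outer spaces, 1/p - 1/r
    inner = 1.0 / q - 1.0 / u  # gap of the inner spaces, 1/q - 1/u
    if k <= math.log(b * d):
        return 1.0
    if k <= d:
        return (math.log(math.e * d / k) / k) ** inner
    if k <= b * d:
        # third regime: purely polynomial in k, no logarithmic factor
        return (d / k) ** outer * d ** (-inner)
    return b ** (-outer) * d ** (-inner) * 2.0 ** (-(k - 1) / (b * d))
```

At the breakpoints $k=d$ and $k=bd$ neighbouring regimes agree up to a constant factor, which is a quick sanity check on the exponents.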

For $1\leq k\leq\log(db)$ and $k\geq bd$ , it requires only trivial and standard volumetric arguments to establish matching bounds for the entropy numbers $e_{k}(id\mathrel{\mathop{\mathchar 58\relax}}\ell_{p}^{b}(\ell_{q}^{d})\to\ell_{r}^{b}(\ell_{u}^{d}))$ . The middle range $\log(bd)~{}\leq~{}k~{}\leq~{}bd$ is much more involved. In general, it is far from straightforward to generalize the proof ideas from $d=1$ (Schütt) to $d>1$ . Fortunately, the crucial work has already been done in a recent work by Edmunds and Netrusov [10]. They prove a general abstract version of Schütt’s theorem for operators between vector-valued sequence spaces. It remains for us to turn these general, abstract bounds into explicit estimates for the entropy numbers $e_{k}(\mathrm{id}\mathrel{\mathop{\mathchar 58\relax}}\ell_{p}^{b}(\ell_{q}^{d})\to\ell_{r}^{b}(\ell_{u}^{d}))$ . Unfortunately, the paper [10] is written very concisely, which makes it difficult to follow the arguments at several points. Hence, we decided to provide some additional, explanatory material. We hope that Section 3 helps a broader readership to appreciate the powerful ideas in [10], in particular, a novel covering construction based on dyadic grids.

Outline.

The paper is organized as follows. In Section 2, we recapitulate basic definitions and results including entropy numbers and Schütt’s theorem. Afterwards, in Section 3, we discuss the generalization of Schütt’s theorem by [10]. In Section 4, we show consequences of this result, including matching bounds for the entropy numbers $e_{k}(\mathrm{id}\mathrel{\mathop{\mathchar 58\relax}}\ell_{p}^{b}(\ell_{q}^{d})\to\ell_{r}^{b}(\ell_{u}^{d}))$ . Finally, we improve upper bounds for the entropy numbers of Besov and Triebel-Lizorkin embeddings in regimes of small smoothness in Section 5.

Notation.

As usual $\mathds{N}$ denotes the natural numbers, $\mathds{N}_{0}\mathrel{\mathop{\mathchar 58\relax}}=\mathds{N}\cup\{0\}$ , $\mathds{Z}$ denotes the integers, $\mathds{R}$ the real numbers, $\mathds{R}_{+}$ the positive real numbers, and ℂ the complex numbers. For $a\in\mathds{R}$ we denote $a_{+}\mathrel{\mathop{\mathchar 58\relax}}=\max\{a,0\}$ . We write $\log$ for the natural logarithm. $\mathds{R}^{m\times n}$ denotes the set of all $m\times n$ -matrices with real entries and $\mathds{R}^{n}$ denotes the Euclidean space. Vectors are usually denoted with $x,y\in\mathds{R}^{n}$ . For $0<p\leq\infty$ and $x\in\mathds{R}^{n}$ , we use the quasi-norm $\|x\|_{p}\mathrel{\mathop{\mathchar 58\relax}}=(\sum_{i=1}^{n}|x_{i}|^{p})^{1/p}$ with the usual modification in the case $p=\infty$ . If $X$ is a (quasi-)normed space, then $B_{X}$ denotes its unit ball and the (quasi-)norm of an element $x$ in $X$ is denoted by $\|x\|_{X}$ . If $X$ is a Banach space, then we denote its dual by $X^{\ast}$ . We will frequently use the quasi-norm constant, i.e., the smallest constant $\alpha_{X}$ satisfying

$$\|x+y\|_{X}\leq\alpha_{X}\big(\|x\|_{X}+\|y\|_{X}\big)\quad\text{for all } x,y\in X.$$

For a given $0<p\leq 1$ we say that $\|\cdot\|_{X}$ is a $p$ -norm if

$$\|x+y\|_{X}^{p}\leq\|x\|_{X}^{p}+\|y\|_{X}^{p}\quad\text{for all } x,y\in X.$$

As is well known, any quasi-normed space can be equipped with an equivalent $p$ -norm (for a certain $0<p\leq 1$ , see [2, 32]). If $T\mathrel{\mathop{\mathchar 58\relax}}X\to Y$ is a continuous operator we write $T\in\mathcal{L}(X,Y)$ and $\|T\|$ for its operator (quasi-)norm. The notation $X\hookrightarrow Y$ indicates that the identity operator $\mathrm{Id}\mathrel{\mathop{\mathchar 58\relax}}X\to Y$ is continuous. For two non-negative sequences $(a_{n})_{n=1}^{\infty},(b_{n})_{n=1}^{\infty}\subset\mathds{R}$ we write $a_{n}\lesssim b_{n}$ if there exists a constant $c>0$ such that $a_{n}\leq c\,b_{n}$ for all $n$ . We will write $a_{n}\simeq b_{n}$ if $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$ . If $\alpha$ is a set of parameters, then we write $a_{n}\lesssim_{\alpha}b_{n}$ if there exists a constant $c_{\alpha}>0$ depending only on $\alpha$ such that $a_{n}\leq c_{\alpha}\,b_{n}$ for all $n$ .

Let $b,d\in\mathds{N}$ . For $0<p,q\leq\infty$ , the $bd$ -dimensional mixed space $\ell_{p}^{b}(\ell_{q}^{d})$ is defined as the space of all matrices $x\in\mathds{R}^{b\times d}$ equipped with the mixed (quasi-)norm

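$$\|x\|_{p,q} := \bigg(\sum_{i=1}^{b}\Big(\sum_{j=1}^{d}|x_{ij}|^{q}\Big)^{p/q}\bigg)^{1/p},\qquad 0<p<\infty,\; 0<q<\infty,$$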

with the usual modification that the corresponding sum is replaced by a maximum in the case that either $p=\infty$ or $q=\infty$ . We always refer to the $\ell_{p}$ -space supported on $[b]\mathrel{\mathop{\mathchar 58\relax}}=\{1,\ldots,b\}$ as the outer space and to the $\ell_{q}$ -space supported on $[d]$ as the inner space. For any $S\subset[b]\times[d]$ and $x\in\mathds{R}^{b\times d}$ we define $x_{S}$ as the matrix $(x_{S})_{ij}=x_{ij}$ for $(i,j)\in S$ , $(x_{S})_{ij}=0$ for $(i,j)\in S^{c}$ .
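The definition of the mixed (quasi-)norm can be made concrete with a small Python sketch: the inner $\ell_{q}$ (quasi-)norm is taken along each row of the $b\times d$ matrix, the outer $\ell_{p}$ (quasi-)norm over the resulting row norms, and a maximum replaces the sum when the exponent is infinite. The function name `mixed_norm` and the list-of-rows representation are assumptions for illustration.

```python
import math

def mixed_norm(x, p, q):
    """Mixed (quasi-)norm of a b x d matrix x, given as a list of b rows.

    Inner l_q (quasi-)norm along each row, outer l_p (quasi-)norm over
    the row norms; math.inf selects the maximum instead of the sum.
    """
    def lp(values, s):
        values = [abs(v) for v in values]
        if s == math.inf:
            return max(values)
        return sum(v ** s for v in values) ** (1.0 / s)

    return lp([lp(row, q) for row in x], p)
```

For example, `mixed_norm([[3, 4], [0, 0]], 1, 2)` measures a matrix with a single non-zero row of Euclidean length 5.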

2 Entropy numbers and Schütt’s theorem

Let us recall basic notions and properties concerning entropy numbers. Let $K$ be a subset of a quasi-Banach space $Y$ . Given $\varepsilon>0$ , an $\varepsilon$ -covering is a set of points $x_{1},\dots,x_{n}\in K$ such that

$$K\subset\bigcup_{i=1}^{n}\big(x_{i}+\varepsilon B_{Y}\big).$$

An $\varepsilon$ -packing is a set of points $x_{1},\dots,x_{m}\in K$ such that $\|x_{i}-x_{j}\|_{Y}>\varepsilon$ for pairwise different $i,j$ . The covering number $N_{\varepsilon}(K,Y)$ is the smallest $n$ such that there exists an $\varepsilon$ -covering of $K$ , while the packing number $M_{\varepsilon}(K,Y)$ is the largest $m$ such that there exists an $\varepsilon$ -packing of $K$ . It is easy to see that

$$M_{2\varepsilon}(K,Y)\leq N_{\varepsilon}(K,Y)\leq M_{\varepsilon}(K,Y).$$

The metric entropy is defined to be

$$H_{\varepsilon}(K,Y)=\log_{2}N_{\varepsilon}(K,Y),$$

see Remark 4 for the relation of metric entropy to other notions of entropy.
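The bound of the covering number by the packing number rests on a simple observation: an $\varepsilon$-packing that cannot be extended is automatically an $\varepsilon$-covering, since any uncovered point could be added to it. The Python sketch below illustrates this on a finite point set in $\ell_{\infty}$. The greedy selection, the grid and all function names are assumptions chosen for illustration.

```python
import itertools

def dist_inf(x, y):
    """l_infinity distance between two points given as tuples."""
    return max(abs(a - b) for a, b in zip(x, y))

def greedy_packing(points, eps):
    """Greedily pick points with pairwise distance > eps.

    The result cannot be extended by any further point of `points`,
    so it is a maximal (with respect to inclusion) eps-packing.
    """
    chosen = []
    for pt in points:
        if all(dist_inf(pt, c) > eps for c in chosen):
            chosen.append(pt)
    return chosen

# K: an 11 x 11 grid of points in [-1, 1]^2
grid = [i / 5.0 - 1.0 for i in range(11)]
K = list(itertools.product(grid, grid))
packing = greedy_packing(K, 0.5)

# every point of K lies within eps of a packing point, so the packing covers K
covers = all(any(dist_inf(pt, c) <= 0.5 for c in packing) for pt in K)
```

A greedy packing need not have the largest possible cardinality, so it only witnesses the covering property, not the value of $M_{\varepsilon}(K,Y)$ itself.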

The $k$ -th entropy number $e_{k}(K,Y)$ can be redefined as

$$e_{k}(K,Y) := \inf\{\varepsilon>0 \,:\, H_{\varepsilon}(K,Y)\leq k-1\}.$$

It is easy to see that the sequence of entropy numbers is non-increasing, i.e., $e_{1}\geq e_{2}\geq\dots\geq 0$. Moreover, the set $K$ is compact in $Y$ if and only if $\lim_{k\to\infty}e_{k}(K,Y)=0$.
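As a toy instance of the relation between covering numbers and entropy numbers, consider $K=[-1,1]$ in $Y=\mathds{R}$. Covering an interval of length 2 by intervals of radius $\varepsilon$ requires $\lceil 1/\varepsilon\rceil$ of them, so $e_{k}(K,Y)=2^{-(k-1)}$. This example and the function names below are illustrative assumptions and do not appear in the paper.

```python
import math

def covering_number_interval(eps):
    """N_eps([-1, 1], R): number of intervals of radius eps needed to cover [-1, 1]."""
    return math.ceil(1.0 / eps)

def entropy_number_interval(k):
    """e_k([-1, 1], R): smallest eps such that 2^(k-1) intervals of radius eps suffice."""
    return 2.0 ** (-(k - 1))
```

Checking that `covering_number_interval(entropy_number_interval(k))` does not exceed `2 ** (k - 1)`, while any slightly smaller radius needs more balls, confirms the closed form.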

Let $T$ denote an operator mapping between two quasi-Banach spaces $X$ and $Y$ . Recall from the introduction that the operator’s entropy numbers are given by

$$e_{k}(T\colon X\to Y)=e_{k}(T(B_{X}),Y),\quad k\in\mathds{N}.$$

Clearly, we have

$$\|T\|=e_{1}(T)\geq e_{2}(T)\geq e_{3}(T)\geq\dots\geq 0.$$

If $T_{1},T_{2}$ are both operators from $X$ to $Y$ , and $Y$ is a $\vartheta$ -normed space, then the entropy numbers of the sum can be estimated as follows

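$$e_{k_{1}+k_{2}-1}(T_{1}+T_{2})^{\vartheta}\leq e_{k_{1}}(T_{1})^{\vartheta}+e_{k_{2}}(T_{2})^{\vartheta}.$$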

Moreover, if $S\in\mathcal{L}(X,Y)$ and $R\in\mathcal{L}(Y,Z)$ then

$$e_{k_{1}+k_{2}-1}(R\circ S)\lesssim e_{k_{1}}(R)\,e_{k_{2}}(S).$$

In particular, this gives

$$e_{k}(R\circ S)\leq e_{k}(R)\,\|S\|.$$

For further general properties of entropy numbers and basic estimates, we refer the reader to the monographs [5, 25, 29]. For remarks on the history of entropy number research, see [5, 42].

In the concrete situation where $X=\ell_{p}^{b}$ and $Y=\ell_{q}^{b}$ for $0<p\leq q\leq\infty$ , the entropy numbers of the embedding $\mathrm{id}\mathrel{\mathop{\mathchar 58\relax}}\ell_{p}^{b}\to\ell_{q}^{b}$ are completely understood in terms of their decay in $k$ and $b$ . This central result is often referred to as Schütt’s theorem. For its history and references, see Remark 3. We only state the interesting case $0<p<q\leq\infty$ here.

Theorem 1 (Schütt’s theorem).

For $0<p\leq q\leq\infty$ and $k,b\in\mathds{N}$ , we have

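$$e_{k}(\mathrm{id}\colon \ell_{p}^{b}\to\ell_{q}^{b})\simeq\begin{cases}1 &: 1\leq k\leq\log(b),\\ \Big(\frac{\log(1+b/k)}{k}\Big)^{1/p-1/q} &: \log(b)\leq k\leq b,\\ 2^{-k/b}\,b^{1/q-1/p} &: k\geq b.\end{cases}$$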

The constants in the estimates depend neither on $k$ nor on $b$ .
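The three regimes of Theorem 1 can be tabulated with a few lines of Python. The sketch below returns the rate with all constants set to one. The function name `schuett_rate` is an assumption for illustration, and the values are only meaningful up to constants independent of $k$ and $b$.

```python
import math

def schuett_rate(k, b, p, q):
    """Rate from Schuett's theorem for id: l_p^b -> l_q^b, constants ignored.

    Assumes 0 < p <= q <= infinity; pass math.inf for q = infinity.
    """
    gap = 1.0 / p - 1.0 / q  # 1/p - 1/q >= 0
    if k <= math.log(b):
        return 1.0
    if k <= b:
        return (math.log(1.0 + b / k) / k) ** gap
    return 2.0 ** (-k / b) * b ** (-gap)
```

For instance, with $p=1$, $q=2$ and $b=100$, the value at $k=200$ is $2^{-2}\cdot 100^{-1/2}=0.025$.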

Remark 2.

Note that $e_{k}(\mathrm{id}\mathrel{\mathop{\mathchar 58\relax}}\ell_{\infty}^{b}\to\ell_{\infty}^{b})=1$ as long as $k\leq b$ because $\|x-y\|_{\infty}=2$ for different $x,y\in\{-1,1\}^{b}$ .
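The argument behind Remark 2 is a packing argument: the $2^{b}$ sign vectors have mutual $\ell_{\infty}$-distance 2, so a ball of radius $\varepsilon<1$ contains at most one of them, and $2^{k-1}<2^{b}$ such balls cannot cover the unit ball. The brute-force Python check below confirms the distance claim for a small $b$. The function name is an assumption for illustration.

```python
import itertools

def min_sup_distance(b):
    """Smallest l_infinity distance between two distinct vectors in {-1, 1}^b."""
    vertices = list(itertools.product((-1, 1), repeat=b))
    return min(
        max(abs(s - t) for s, t in zip(x, y))
        for x, y in itertools.combinations(vertices, 2)
    )
```

Two distinct sign vectors differ in at least one coordinate, where the difference is exactly 2, so the function returns 2 for every $b\geq 1$.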

Remark 3.

In 1984, Schütt [37] gave a proof for the general case of symmetric Banach spaces, which implies Theorem 1 if $1\leq p\leq q\leq\infty$ . In the range $1\leq k\leq b$ , the upper bound was first proved for all $0<p\leq q\leq\infty$ by Edmunds and Triebel [11] in 1996 by covering the unit ball using suitable sparse vectors. Edmunds and Netrusov [9, Thm. 2] generalized this covering construction in 1998 to arbitrary quasi-Banach spaces. In the same paper, Edmunds and Netrusov also proved matching lower bounds for general quasi-Banach spaces [9, Thm. 2]. Kühn [22] also proved the lower bound for $e_{k}(\mathrm{id}\mathrel{\mathop{\mathchar 58\relax}}\ell_{p}^{b}\to\ell_{q}^{b})$ with $0<p\leq q\leq\infty$ in 2001. Both [9, Thm. 2] and [22] rely on the very same idea to pack the unit ball with sparse vectors and use the fundamental combinatorial fact discussed in Remark 12 (ii) below. In 2000, Guédon and Litvak [15, Thm. 6] provided an alternative proof of Theorem 1 that relies completely on interpolation arguments and improved the constants in the upper bound.

Remark 4.

The concept of metric entropy for compact sets has been introduced independently by Kolmogorov [18] and Pontrjagin and Schnirelmann [31]. It should not be confused with the metric entropy of a dynamical system, which also has been introduced by Kolmogorov [19]. The latter entropy is also called Kolmogorov-Sinai entropy or measure-theoretic entropy. However, these two notions of metric entropy are related [1]. There is also a deep connection between Kolmogorov-Sinai entropy and the notions of information entropy and thermodynamic entropy [3].

3 Edmunds-Netrusov revisited

In addition to Schütt’s theorem, the main tool that we employ in this work is a powerful result by Edmunds and Netrusov [10]. They prove a generalization of Schütt’s theorem for vector-valued sequence spaces. Let us restate the part of their result that is relevant for us.

Theorem 5 (Theorems 3.1 and 3.2 in [10]).

Let $b\in\mathds{N}$ such that $b\geq 2$ , $0<p\leq r\leq\infty$ and let $X$ and $Y$ be $\gamma$ -normed quasi-Banach spaces. For $k,m\in\mathds{N}$ such that $m\leq k$ , let

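$$D(m,k)=\max_{\ell\in\mathds{N},\, m\leq\ell\leq k}\;(\ell/k)^{1/p-1/r}\,e_{\ell}(\mathrm{id}\colon X\to Y),$$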

and

$$A(k,b)=\max\Big\{\|\mathrm{id}\colon X\to Y\|\Big(\frac{\log(eb/k)}{k}\Big)^{1/p-1/r},\; D(1,k)\Big\}.$$

For $k\geq\log_{2}(b)$ , we have the following.

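(i) If $k\leq b$ , then

$$e_{k}(\mathrm{id}\colon \ell_{p}^{b}(X)\to\ell_{r}^{b}(Y))\simeq A(k,b).$$

(ii) If $k\geq b$ , then there are absolute constants $c_{1},c_{2}$ such that

$$D(c_{1}k/b,k)\lesssim e_{k}(\mathrm{id}\colon \ell_{p}^{b}(X)\to\ell_{r}^{b}(Y))\lesssim D(c_{2}k/b,k).$$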
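The quantities $D(m,k)$ and $A(k,b)$ are easier to read once evaluated for a concrete inner embedding. The Python sketch below takes the inner entropy numbers as a callable and, as an assumed toy case, uses $X=Y=\mathds{R}$ with $e_{\ell}=2^{-(\ell-1)}$. The function names, the integer lower index `m` and the toy example are illustrative assumptions, not part of [10].

```python
import math

def D(m, k, p, r, e_inner):
    """D(m, k): max over integers m <= l <= k of (l / k)^(1/p - 1/r) * e_l(id: X -> Y)."""
    gap = 1.0 / p - 1.0 / r
    return max((l / k) ** gap * e_inner(l) for l in range(m, k + 1))

def A(k, b, p, r, e_inner, norm_id=1.0):
    """A(k, b): max of the Schuett-type term and D(1, k)."""
    gap = 1.0 / p - 1.0 / r
    schuett_term = norm_id * (math.log(math.e * b / k) / k) ** gap
    return max(schuett_term, D(1, k, p, r, e_inner))

def e_real_line(l):
    """Toy inner entropy numbers e_l(id: R -> R) = 2^(-(l - 1))."""
    return 2.0 ** (-(l - 1))
```

In this toy case with $p=1$ and $r=\infty$, one finds $D(1,k)=1/k$ for $k\geq 2$, so $A(k,b)$ reduces to $\log(eb/k)/k$ for $k\leq b$, which is the middle regime of Schütt's theorem.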

Theorem 5 states abstract lower and upper bounds that are “matching” in the sense that both have the same functional form. At first glance, this functional form is not obvious to expect and not easy to interpret. In addition, we found it difficult to follow the arguments in [10] at several points due to its succinct style of presentation. We thus believe that it is of value to review their key arguments and to provide some additional material that makes Theorem 5 more comprehensible. This is the subject of the remainder of this section. The reader who is only interested in applications of Theorem 5 may proceed directly to Section 4.

Remark 6.

Theorems 3.1 and 3.2 in [10] are only stated for $0<p<r\leq\infty$ . However, these theorems also hold true for $p=r$ . First note that in the latter case, we have

$$D(m,k)=e_{m}(\mathrm{id}\colon X\to Y),\qquad A(k,b)=\|\mathrm{id}\colon X\to Y\|.$$

Now for $k\geq b$ , Theorem 5 has been proved in [27, Thm. 4.3]. For $k\leq b$ , the lower bound in Theorem 5 is a consequence of [27, Thm. 4.3] in combination with arguments analogous to Remark 12; the upper bound is trivial.

3.1 A special case to begin with

If $p=r=\infty$ it is clear that one simply has to take $b$ -fold Cartesian products of the optimal covering and packing of $B_{X}$ in $Y$ to obtain the bounds

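$$\frac{1}{2}\,e_{k+1}(\mathrm{id}\colon X\to Y)\leq e_{kb}(\mathrm{id}\colon \ell_{\infty}^{b}(X)\to\ell_{\infty}^{b}(Y))\leq e_{k}(\mathrm{id}\colon X\to Y),\quad k\in\mathds{N}.$$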

In any other case, simple Cartesian products will not be good enough.

The special case of equal inner spaces $X=Y$ also allows for a rather straightforward solution if the dimension of the inner space is finite. For an easier understanding of the contribution in [10], see Theorem 5 above, we find it instructive to give a direct proof of this special case and point out its limitations. Indeed, a straightforward generalization of the well-known Edmunds-Triebel covering construction [11] based on volume arguments will do the job to establish the optimal upper bound. Recall that the essence of this covering construction is a result from best $s$ -term approximation, sometimes referred to as Stechkin’s inequality, see [8, Sect. 7.4], which yields a $s^{-1/p+1/r}$ -covering of $B_{\ell_{p}^{b}}$ in $\ell_{r}^{b}$ using only $s$ -sparse vectors. We simply have to extend this approach to row-sparse matrices. To improve readability, we will omit some technical details in the following proof.
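The best $s$-term approximation bound behind this covering states that, for $0<p<r\leq\infty$, removing the $s$ largest entries of a vector $x$ leaves a remainder of $\ell_{r}$-(quasi-)norm at most $s^{-(1/p-1/r)}\|x\|_{p}$. The Python sketch below checks this numerically on a fixed vector. The function names and the test vector are assumptions for illustration.

```python
import math

def lp_norm(x, p):
    """l_p (quasi-)norm of a sequence of numbers; math.inf gives the maximum."""
    if p == math.inf:
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def best_s_term_error(x, s, r):
    """l_r (quasi-)norm of x after removing its s largest entries in absolute value."""
    tail = sorted((abs(v) for v in x), reverse=True)[s:]
    return lp_norm(tail, r) if tail else 0.0

# harmonic test vector x_i = 1 / i, i = 1, ..., 50
x = [1.0 / (i + 1) for i in range(50)]
```

Comparing `best_s_term_error(x, s, r)` with `s ** (-(1 / p - 1 / r)) * lp_norm(x, p)` for several `s` shows the claimed decay in the sparsity level.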

Proposition 7.

Let $0<p\leq r\leq\infty$ and $X$ be $\mathds{R}^{d}$ (quasi-)normed with $\|\cdot\|_{X}$ . Let further $b,d\in\mathds{N}$ and $d>5$ . Then, for $1\leq k\leq bd$ ,

[TABLE]

Proof.

The first case is trivial. The last case follows from volumetric arguments using the recent findings in [17, Sect. 3.2]. By these we know that

[TABLE]

and for $\operatorname{vol}(B_{\ell_{r}^{b}(X)})^{1/(bd)}$ accordingly. For $k>bd$ we use the standard volume argument to obtain

[TABLE]

For the second case let $s\in[b]$ . Clearly, we have that

[TABLE]

where $B_{I}\mathrel{\mathop{\mathchar 58\relax}}=\{x\in B_{\ell_{p}^{b}(\ell_{q}^{d})}\mathrel{\mathop{\mathchar 58\relax}}\|x_{i\cdot}\|_{X}\geq\|x_{k\cdot}\|_{X}\text{ for }i\in I,k\in[b]\setminus I\}$ . When we replace the $s$ rows with the largest $\|\cdot\|_{X}$ -(quasi-)norm by zero rows in $x\in B_{I}$ , then the resulting matrix has an $\ell_{r}^{b}(X)$ -(quasi-)norm of at most $s^{-(1/p-1/r)}$ , which follows from a well-known relation for best $s$ -term approximation in $\ell_{r}$ . Hence, if we wish to cover the set $B_{I}$ by balls of radius $\varepsilon\simeq s^{-(1/p-1/r)}$ , it suffices to take care of the $s$ largest rows of the matrices in $B_{I}$ . That is, we take a suitable covering of $B_{\ell_{p}^{s}(X)}$ in $\ell_{r}^{s}(X)$ and append $b-s$ zero rows to every matrix of the covering. A volumetric argument similar to the one in (8), (9) tells us that

[TABLE]

so that we obtain a covering of $B_{I}$ with cardinality $2^{c_{p,q}sd}$ .

Combining the coverings for all possible index sets $I$ yields an $\varepsilon$ -covering $U$ of $B_{\ell_{p}^{b}(X)}$ in $B_{\ell_{r}^{b}(X)}$ , where $\varepsilon\simeq s^{-1/p+1/r}$ , with cardinality

[TABLE]

Now, given $k\in[bd]$ , we choose

[TABLE]

such that

[TABLE]

is assured. Consequently, we obtain the upper bound

[TABLE]

∎

Remark 8.

One way to obtain the matching lower bound in the case $X=Y$ is to generalize the proof idea underlying Schütt’s theorem (Theorem 1) in the case that $\log(b)\leq k\leq b$ . However, the standard combinatorial lemma is not sufficient here. A suitable packing to do this generalization has already been considered in [6, Prop. 5.3]. See also Remark 12 below.

3.2 The covering construction by Edmunds and Netrusov

The generalized Edmunds-Triebel covering is optimal for finite dimensional $X=Y$ , see Proposition 7 in the previous section. In the general situation, where $X$ is compactly embedded into $Y$ , it seems that the volumetric arguments underlying (11) are too coarse to obtain sharp estimates (at least in the finite dimensional situation). The main contribution of [10] is a covering construction which resolves this shortcoming by not using volumetric arguments at all. In particular, $X$ and $Y$ do not have to be finite dimensional. We give a detailed recapitulation of their idea in this section. For some comments concerning the lower bound in Theorem 5, see Remark 12 at the end of this section.

The covering in [10] works in the very general situation where we are given quasi-Banach spaces $X_{1},\dots,X_{b}$ and $Y_{1},\dots,Y_{b}$ , see Proposition 10 below. The basic idea is to cover the unit ball $B_{\ell_{p}(\{X_{i}\}_{i=1}^{b})}$ by $N$ cuboids

[TABLE]

where $v^{1},\dots,v^{N}\in\mathds{R}_{+}^{b}$ and $N$ is exponential in $b$ (think of each cuboid as an anisotropically rescaled version of $B_{\ell_{\infty}(\{X_{i}\}_{i=1}^{b})}$ ). The crux is to find suitable vectors $v^{i}$ such that an optimal covering can be reached by covering the cuboid $U(v^{i})$ using a product of optimal coverings of $B_{X_{1}}$ ,…, $B_{X_{b}}$ . Edmunds and Netrusov [10] had the idea to consider vectors that form a dyadic grid derived from the simplex

[TABLE]

The dyadic grid is constructed with the help of the following mapping. Let

[TABLE]

and for $x\in[0,1]^{b}$ , put

[TABLE]

This mapping $\upsilon$ leads to a finite grid with the following properties.

Lemma 9 (Simplification of Lemma 2.2 in [10]).

For $b\in\mathds{N}$ , let $\Gamma(b)=\upsilon(S(b))$ . The set $\Gamma(b)$ has the following properties.

(i) For all $u\in S(b)$ , there is $v\in\Gamma(b)$ such that $u_{i}\leq v_{i}$ for all $i\in[b]$ .

(ii) For all $v\in\Gamma(b)$ , we have $\|v\|_{1}\leq 2$ .

(iii) For all $v\in\Gamma(b)$ , we have $bv_{i}\in\mathds{N}$ for each $i\in[b]$ .

(iv) We have $\sharp\Gamma(b)\leq 2^{3b}$ .

Proof.

Given $x\in S(b)$ , let $v=\upsilon(x)$ . We clearly have $\sum_{i=1}^{b}v_{i}\leq 2$ and $bv_{i}\in\mathds{N}$ for all indices $i=1,\dots,b$ . Further

[TABLE]

which is a crucial property to estimate the cardinality of the set $\Gamma(b)$ . Let

[TABLE]

Clearly, $\sharp B(v,k)\leq\sharp C(v,k)\leq\min\{b,b2^{1-k}\}$ . Varying over all elements in the simplex, $B(v,0)$ can be any of the $2^{b}$ subsets of $[b]$ . Fixing $B(v,0)$ , there are at most $2^{b}$ possibilities for $B(v,1)$ . Fixing $B(v,0)$ up to $B(v,k-1)$ , there are at most $2^{b2^{1-k}}$ possibilities for $B(v,k)$ . Hence, in total the set $\Gamma(b)$ may contain at most

[TABLE]

many elements. ∎

The dyadic grid from Lemma 9 allows us to establish the following upper bound on entropy numbers.

Proposition 10 (Reformulation of Lemma 2.3 in [10]).

Let $X_{1},\dots,X_{b}$ and $Y_{1},\dots,Y_{b}$ be quasi-Banach spaces, let $0<p\leq r\leq\infty$ , and let $k\in\mathds{N}$ such that $k\geq 8b$ . Then, we have

[TABLE]

Proof.

Consider the transformed grid

[TABLE]

By Lemma 9 (i), we have

[TABLE]

where $U(v)$ is the cuboid defined in (12).

Let $v\in\Gamma(b,p)$ be given by $v=(v_{1}^{1/p},\dots,v_{b}^{1/p})$ . For each

[TABLE]

let $\mathcal{C}_{i}$ be an $e_{m_{i}}(v_{i}^{1/p}B_{X_{i}},Y_{i})$ -covering. Then, for every $x\in U(v)$ , there is $y\in\ell_{r}^{b}(Y)$ such that $y_{i\cdot}\in\mathcal{C}_{i}$ and

[TABLE]

By construction of the set $\Gamma(b)$ , we have $\left(\sum_{i=1}^{b}v_{i}\right)^{1/r}\leq 2^{1/r}$ and

[TABLE]

Finally, note that the product $\mathcal{C}_{1}\times\cdots\times\mathcal{C}_{b}$ has cardinality

[TABLE]

which, in combination with $\sharp\Gamma(b,p)\leq 2^{3b}$ , implies the desired result. ∎

Proposition 10 is not the complete final answer. For $k\leq b$ , we have to modify the proof of Proposition 7. We sketch the proof and refer to the proof of [10, Thm 3.1] for technical details.

Proposition 11.

Let $\log_{2}(b)\leq k\leq b$ . Then, we have

[TABLE]

where $A(k,b)$ is defined in Theorem 5.

Proof sketch.

Let $s\in[k]$ . It is clear that, analogously to (10), we have

[TABLE]

Similarly to Proposition 7, we can use a covering for $B_{\ell_{p}^{s}(X)}$ to construct a covering for $B_{I}$ . Consider now $\varepsilon=e_{k}(B_{\ell_{p}^{s}(X)},\ell_{r}^{s}(Y))$ and let $\Gamma_{0}$ be a minimal $\varepsilon$ -covering of $B_{\ell_{p}^{s}(X)}$ in $\ell_{r}^{s}(Y)$ . Let $\Gamma_{I}=\Gamma_{0}\times\{0\}^{[b]\setminus I}$ . Then, for every $x\in B_{I}$ , there is $y\in\Gamma_{I}$ such that

[TABLE]

where the second term on the right-hand side follows from the best $s$ -term approximation result already used in Proposition 7. Consequently, we have

[TABLE]

In contrast to Proposition 7, volumetric arguments would now give a suboptimal estimate for the entropy numbers $e_{k}(B_{\ell_{p}^{s}(X)},\ell_{r}^{s}(Y))$ . In this general situation, it requires Proposition 10 with $X_{1}=\dots=X_{b}=X$ and $Y_{1}=\dots=Y_{b}=Y$ to get the proper estimate. Concretely, since $s\leq k$ , we have

[TABLE]

which leads in combination with Proposition 10 and (14) to an upper bound of the form

[TABLE]

The usual arguments show that it is optimal to choose $s$ of the order $k/\log(eb/k)$ . ∎
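As a rough numerical illustration of the choice of $s$ , the helper below evaluates $k/\log(eb/k)$ and rounds it to an admissible integer in $[k]$ . The function name, the rounding, and the use of the natural logarithm are assumptions made for this sketch; the proof only fixes the order of magnitude.

```python
import math


def sparsity_level(k, b):
    """Integer s of order k / log(e*b/k), clipped to 1 <= s <= k.

    Assumes log2(b) <= k <= b as in Proposition 11; natural log is
    an assumption of this sketch.
    """
    if not (math.log2(b) <= k <= b):
        raise ValueError("expected log2(b) <= k <= b")
    s = int(k / math.log(math.e * b / k))
    return max(1, min(s, k))


print(sparsity_level(16, 1024))
```

Since $k\leq b$ gives $\log(eb/k)\geq 1$ , the value never exceeds $k$ , which is consistent with the requirement $s\in[k]$ .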

Remark 12.

We close this section with some remarks concerning the lower bound in Theorem 5. Its proof relies on two surprisingly simple observations, see [10] for details.

(i) Let $M$ be a maximal $\varepsilon$ -packing of $B_{X}$ in $Y$ . Using the Gilbert-Varshamov bound, which is well-known in coding theory [14, 40], we know that $(2s)^{-1/p}M^{2s}\subset B_{\ell_{p}^{b}(X)}$ contains $N$ elements of mutual distance $s^{1/r-1/p}\varepsilon$ , where $N\simeq\mathrm{card}(M)^{s}$ . This leads to the lower bound

[TABLE]

see [27, p. 68] and [10, Lem. 2.6] for a more general formulation. Given $k\in\mathds{N}$ , we have to make a good choice for the dimension $s$ to maximize the lower bound. Choose $s=k/m$ for some $m\in[k]$ to obtain

[TABLE]

If $k\leq b$ , we conclude

[TABLE]

If $k\geq b$ , then $m\geq k/b$ guarantees $s=k/m\leq b$ and thus

[TABLE]

(ii) Choose a vector $x\in B_{X}$ such that

[TABLE]

We construct a packing by building row-sparse matrices, where the nonzero rows contain copies of $x$ and the row support sets are chosen according to the following combinatorial fact that is well-known in various disciplines of mathematics, see, e.g., [13, Lemma 10.12], [22], [12] or [30, Prop. 2.21, p. 219]. Given $s,n\in\mathds{N}$ such that $0<s<n/2$ , there exist subsets $I_{1},\ldots,I_{N}$ of $[n]$ , where

[TABLE]

such that each subset $I_{i}$ has cardinality $2s$ and

[TABLE]

This leads to the lower bound

[TABLE]

In view of the packing construction that we have mentioned in Remark 8 it is somewhat surprising that it is not necessary to combine the combinatorics of the two observations in order to obtain the optimal abstract bound in Theorem 5. An explanation is given in [27, Rem. 4.13, p. 69].
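The combinatorial fact used in (ii) can be explored on small examples. The sketch below greedily collects subsets of cardinality $2s$ whose pairwise intersections are small. The threshold "fewer than $s$ common elements" is an assumption of this illustration (the exact condition is the one in the display above), and the greedy search makes no claim about reaching the cardinality $N$ guaranteed by the cited results.

```python
from itertools import combinations


def greedy_family(n, s):
    """Greedily pick 2s-element subsets of {0, ..., n-1} such that any
    two chosen subsets share fewer than s elements (assumed threshold)."""
    if not 0 < s < n / 2:
        raise ValueError("expected 0 < s < n/2")
    family = []
    for cand in combinations(range(n), 2 * s):
        c = set(cand)
        if all(len(c & other) < s for other in family):
            family.append(c)
    return family


fam = greedy_family(10, 2)
print(len(fam), fam[:3])
```

Each selected set plays the role of a row support $I_{i}$ ; small pairwise overlap is what keeps the resulting row-sparse matrices well separated.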

4 Consequences of the Edmunds-Netrusov result

We discuss some consequences of Theorem 5. Let us begin with considering the entropy numbers

[TABLE]

We have the following matching bounds.

Theorem 13.

Let $0<p\leq r\leq\infty$ and $0<q\leq u\leq\infty$ . Then, we have

[TABLE]

For $\log(bd)\leq k\leq bd$ , we have the following case distinctions.

(i)

Let $1/p-1/r>1/q-1/u\geq 0$ .

(i.a)

In the special case $q=u$ , we have

[TABLE]

(i.b)

If $q<u$ and $b\leq d$ , then

[TABLE]

(i.c)

If $q<u$ and $d\leq b$ , then

[TABLE]

(ii)

Let $1/q-1/u\geq 1/p-1/r\geq 0$ . Then, we have

[TABLE]

Proof.

For $1\leq k\leq\log(bd)$ and $k\geq bd$ , it requires only standard volumetric arguments, see [27, Appendix A] for details. Let us also refer to [7, Lemma 3], where this case has already been considered. Let $D(m,k)$ and $A(k,b)$ be as defined in Theorem 5. Moreover, throughout the proof, we write for $k,l\in\mathds{N}$ ,

[TABLE]

Ad (i.a). Since $q=u$ , it follows from Theorem 1 that $e_{l}(\mathrm{id}\colon\ell_{q}^{d}\to\ell_{u}^{d})\simeq 1$ for $1\leq l\leq d$ and consequently that $D(1,k)=D(k/b,k)\simeq 1$ and $A(k,b)\simeq 1$ for all $k\leq d$ . Now, for $k\geq d$ , we have that $s_{k,l}\simeq(l/k)^{1/p-1/r}$ for $1\leq l\leq d$ , so the sequence is bounded from above by a monotonically increasing sequence. For $d\leq l\leq k$ , we have

[TABLE]

Since $2^{-l/d}$ decays faster in $l$ than $(l/k)^{1/p-1/r}$ increases, we conclude that for $d\leq l\leq k$ , the sequence $s_{k,l}$ is “essentially monotonically decreasing”. More precisely, $t_{k,l}$ attains its maximum at $l=\beta_{p,r}d$ , where the factor $\beta_{p,r}$ depends only on $p$ and $r$ . Hence, the maximum of $s_{k,l}$ can be bounded from above by a constant times the maximum of $t_{k,l}$ and therefore by $c_{p,r}(d/k)^{1/p-1/r}$ . Using analogous arguments for $D(k/b,k)$ , we conclude that $\widetilde{D}(1,k)=D(k/b,k)\simeq(d/k)^{1/p-1/r}$ and

[TABLE]

for $d\leq k\leq b$ .
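The location of the maximum in Ad (i.a) can be checked numerically. The sketch below assumes that $t_{k,l}$ has the form $(l/k)^{a}2^{-l/d}$ with $a=1/p-1/r$ ; this form is inferred from the surrounding text and is not copied from the original display. Under that assumption, calculus gives the continuous maximizer $l=ad/\ln 2$ , so that $\beta_{p,r}=a/\ln 2$ whenever this value is at least $1$ .

```python
import math


def t(k, l, d, a):
    """Assumed model sequence t_{k,l} = (l/k)^a * 2^(-l/d), a = 1/p - 1/r."""
    return (l / k) ** a * 2.0 ** (-l / d)


def argmax_t(k, d, a):
    """Integer l in [1, k] maximizing t(k, l, d, a), by exhaustive search."""
    return max(range(1, k + 1), key=lambda l: t(k, l, d, a))


a, d, k = 1.5, 100, 5000        # illustrative values only
beta = a / math.log(2)          # predicted factor beta_{p,r}
l_star = argmax_t(k, d, a)
ratio = t(k, l_star, d, a) / (d / k) ** a
print(l_star, beta * d, ratio)
```

The printed ratio depends only on $a$ , which is consistent with the bound $c_{p,r}(d/k)^{1/p-1/r}$ stated in the proof.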

Ad (i.b). Consider now $0<q<u$ and $b\leq d$ . For $\log(bd)\leq k\leq b$ , we have in consequence of Theorem 1, that $s_{k,l}\simeq(l/k)^{1/p-1/r}$ for $1\leq l\leq\log(d)$ and

[TABLE]

Since $1/p-1/r>1/q-1/u$ , the sequence $s_{k,l}$ is bounded from above and below up to a constant by a monotonically increasing sequence and consequently, the maximum is attained at $l=k$ such that $D(1,k)\simeq(\log(ed/k)/k)^{1/q-1/u}$ . Since $b\leq d$ , we further have

[TABLE]

For $b\leq k\leq d$ we find as before that $D(k/b,k)\simeq(\log(ed/k)/k)^{1/q-1/u}$ and for $d<k\leq bd$ , we have the estimate

[TABLE]

Ad (i.c). Consider now $d\leq b$ . For $\log(bd)\leq k\leq d$ , we find $D(1,k)\simeq(\log(ed/k)/k)^{1/q-1/u}$ since the sequence $s_{k,l}$ is bounded from below and above by a sequence that increases monotonically in $l$ . If $d\leq k\leq b$ , then

[TABLE]

and

[TABLE]

Finally, if $b\leq k\leq bd$ , then $D(k/b,k)\simeq(d/k)^{1/p-1/r}d^{1/u-1/q}$ .

Ad (ii). For $\log(bd)\leq k\leq b$ , we observe that

[TABLE]

since the term $e_{\ell}(B_{\ell_{q}^{d}},\ell_{u}^{d})$ is decaying in $\ell$ at least as fast as $(\ell/k)^{1/p-1/r}$ is growing. Hence,

[TABLE]

Next, we consider $b\leq k\leq b\log(d)$ . Since $k/b\leq\log(d)$ , we find

[TABLE]

where we have used $b/k\leq 1$ in the last estimate. At the same time, since $k/b\leq\log(d)$ , we also have $\log(bd/k)\gtrsim\log(d)$ and thus

[TABLE]

Finally, for $b\log(d)\leq k\leq bd$ it is easy to see that

[TABLE]

∎

Remark 14.

The upper bound for $k\geq bd$ in Theorem 13 also follows from [7, Lem. 3]. The upper bound in Theorem 13 (ii) has also been proved in [41, Lem 3.16] for the range $b\max\{\log(d),\log(b)\}\leq k\leq bd$ . The proof there uses the following covering construction, which appeared first in [23, Proof of Prop. 4] to our knowledge. Let $X_{1},\dots,X_{b}$ and $Y_{1},\dots,Y_{b}$ be (quasi-)Banach spaces and $0<p,r\leq\infty$ . The covering rests on the idea to split the ball $B_{\ell_{p}^{b}({X_{1},\dots,X_{b}})}$ into subsets of matrices with non-increasing rows,

[TABLE]

where the union is taken over all permutations of $[b]$ . This leads to the upper bound

[TABLE]

for $n_{1},\dots,n_{b}\in\mathds{N}$ . If

[TABLE]

with $0<q\leq u$ , and we choose $n_{j}\simeq j^{-\alpha}$ for some $0<\alpha<1$ such that

[TABLE]

then (16) is strong enough to obtain the upper bound in Theorem 13 (ii), provided

[TABLE]

∎

Now we increase the level of abstraction and consider mixed norms of higher order. Let, for $\mu=1,...,b$ , the weighted spaces $X_{\mu}$ and $Y_{\mu}$ be given by

[TABLE]

with $0<p\leq r\leq\infty$ , $0<q\leq u\leq\infty$ and $\alpha,\beta\in\mathds{R}$ . The dimensions $(d_{\mu})_{\mu}$ and $(b_{\mu})_{\mu}$ are non-decreasing natural numbers satisfying $d_{\mu}\gtrsim b_{\mu}$ . These spaces are used as “inner spaces” in the way that

[TABLE]

Note that for $x=(x_{\mu,i,j})_{\mu,i,j}\in X$ with $\mu=1,\dots,b$ , $i=1,\dots,b_{\mu}$ , $j=1,\dots,d_{\mu}$ , the norm is given by

[TABLE]

We are interested in the behavior of the entropy numbers

[TABLE]

in the special situation $1/q-1/u<1/p-1/r$ .

Proposition 15.

Let $0\leq 1/q-1/u<1/p-1/r$ and $\alpha-\beta\leq 1/p-1/r-(1/q-1/u)$ . Let further $X,Y$ and $X_{\mu}$ , $Y_{\mu}$ be as above. Then we have for all $k\geq 8b$ and $k\geq\max\limits_{\mu=1,...,b}d_{\mu}$

[TABLE]

Proof.

We use Theorem 5, in particular the upper bound in Proposition 10. Since $k\geq 8b$ we obtain

[TABLE]

Let us evaluate the first $\max[\cdots]$ . With Theorem 13, (i.b), (i.c) we have

[TABLE]

Because of $1/p-1/r>1/q-1/u$ the maximum is attained for $\ell=d_{\mu}$ , which leads to

[TABLE]

Let us discuss the second $\max[\cdots]$ . Using again Proposition 10 we obtain

[TABLE]

Due to our assumption the exponent for $d_{\mu}$ is positive in both cases. Since $k\geq d_{\mu}$ we may replace $d_{\mu}$ by $k$ to increase the right-hand side. This leads to

[TABLE]

∎

We are now aiming for a similar relation for small $k$ .

Proposition 16.

Let $\alpha-\beta>0$ and $1/p-1/r>1/q-1/u\geq 0$ . Then we have for $8b\leq k\leq\min\limits_{\mu=1,...,b}d_{\mu}$ the estimate

[TABLE]

Proof.

Again we use Theorem 5, in particular the upper bound in Proposition 10. This gives

[TABLE]

where we used once again Theorem 13, (i.b). Clearly, we get

[TABLE]

Since the function $x\mapsto x^{-(\alpha-\beta)}[\log(ex)]^{(1/q-1/u)}$ is bounded on $[1,\infty)$ we conclude with

[TABLE]

∎
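The boundedness used in the last step of the proof of Proposition 16 can be made quantitative. The sketch below assumes the natural logarithm and writes $a=\alpha-\beta>0$ and $c=1/q-1/u\geq 0$ ; substituting $x=e^{t}$ turns the function into $e^{-at}(1+t)^{c}$ , whose supremum over $t\geq 0$ is $1$ if $c\leq a$ and $e^{a-c}(c/a)^{c}$ otherwise. The names `g` and `sup_g` are introduced here for illustration only.

```python
import math


def g(x, a, c):
    """x^(-a) * log(e*x)^c for x >= 1 (natural log assumed)."""
    return x ** (-a) * math.log(math.e * x) ** c


def sup_g(a, c):
    """Closed-form supremum of g over [1, inf) for a > 0, c >= 0."""
    if a <= 0 or c < 0:
        raise ValueError("expected a > 0 and c >= 0")
    if c <= a:
        return 1.0                       # g is non-increasing, maximum at x = 1
    return math.exp(a - c) * (c / a) ** c  # maximum at x = exp(c/a - 1)


a, c = 0.5, 2.0                          # illustrative values only
grid_max = max(g(math.exp(0.01 * i), a, c) for i in range(2001))
print(grid_max, sup_g(a, c))
```

The supremum depends only on $\alpha-\beta$ and $1/q-1/u$ , so it can be absorbed into the constant of the estimate.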

5 Polynomial decay of entropy numbers for multivariate function space embeddings

We come to the main subject of this paper, improved upper bounds for entropy numbers of function space embeddings (1) in regimes of small mixed smoothness.

5.1 Function spaces of dominating mixed smoothness

Besov and Triebel-Lizorkin spaces of mixed smoothness are typically defined via a dyadic decomposition on the Fourier side. Let $\{\varphi_{j}\}_{j\in\mathds{N}_{0}^{n}}$ be the standard tensorized dyadic decomposition of unity, see [36] and [41]. We further denote by $S^{\prime}(\mathds{R}^{n})$ the space of tempered distributions and by $D^{\prime}(\Omega)$ the space of distributions (dual space of $D(\Omega)$ , which represents the space of test functions on the bounded domain $\Omega\subset\mathds{R}^{n}$ ). The Besov space of dominating mixed smoothness $S^{r}_{p,q}B(\mathds{R}^{n})$ with smoothness parameter $r>0$ and integrability parameters $0<p,q\leq\infty$ is given by

[TABLE]

with the usual modification in the case $q=\infty$ . The Triebel-Lizorkin space of dominating mixed smoothness $S^{r}_{p,q}F(\mathds{R}^{n})$ is given by $(p<\infty)$

[TABLE]

The latter scale of spaces contains the classical $L_{p}$ spaces and Sobolev spaces with dominating mixed smoothness if $1<p<\infty$ and $q=2$ , namely we have $S^{0}_{p,2}F(\mathds{R}^{n})=L_{p}(\mathds{R}^{n})$ and $S^{k}_{p,2}F(\mathds{R}^{n})=S^{k}_{p}W(\mathds{R}^{n})$ for $k\in\mathds{N}$ . Note that we also have $S^{r}_{p,p}B(\mathds{R}^{n})=S^{r}_{p,p}F(\mathds{R}^{n})$ for all $0<p<\infty$ and $r\in\mathds{R}$ . Though we have the embedding

[TABLE]

for $p_{0}\leq p_{1}$ and $r_{0}-r_{1}>1/p_{0}-1/p_{1}$ , see [36, Chapt. 2], the embedding (20) is never compact. Hence, the entropy numbers of embeddings between function spaces defined on the whole $\mathds{R}^{n}$ do not converge to zero. We restrict our considerations to spaces on bounded domains $\Omega$ . Let $\Omega$ be an arbitrary bounded domain in $\mathds{R}^{n}$ . Then, we define $S^{r}_{p,q}A(\Omega)$ for $A\in\{B,F\}$ as

[TABLE]

and its (quasi-)norm is given by $\|f\|_{S^{r}_{p,q}A(\Omega)}:=\inf_{g|_{\Omega}=f}\|g\|_{S^{r}_{p,q}A}$ . The embedding (20) transfers to the bounded domain $\Omega$ and is compact, so that the entropy numbers converge to zero.

5.2 Sequence spaces

The key to establishing the decay rate of entropy numbers for the embedding (1) is a discretization technique which has been developed over the years by several authors beginning with Maiorov [26]. Later, once wavelet isomorphisms had been established, this technique was refined by Lemarié, Meyer, Triebel and many others. In [41, Thm. 2.10] Vybíral gave the necessary modifications to deal with the above defined $S^{r}_{p,q}A(\Omega)$ spaces in detail. The main advantage of this approach is that it transfers questions about function space embeddings to certain sequence spaces.

Using sufficiently smooth wavelets with sufficiently many vanishing moments (and the notation from [41]) the mapping

[TABLE]

represents a sequence space isomorphism between $S^{r}_{p,q}B(\mathds{R}^{n}),S^{r}_{p,q}F(\mathds{R}^{n})$ and

[TABLE]

respectively, with the usual modification in the case $\max\{p,q\}=\infty$ . Here we denote for $j\in\mathds{N}_{0}^{n}$ and $m\in\mathds{Z}^{n}$

[TABLE]

and

[TABLE]

Further, $\chi_{j,m}$ denotes the characteristic function of $Q_{j,m}$ . Consider the sequence spaces

[TABLE]

with (quasi-)norms given by

[TABLE]

Let us also define the following building blocks for $\mu\in\mathds{N}_{0}$ fixed

[TABLE]

Clearly, for $\mu\in\mathds{N}_{0}$ we have

[TABLE]

and $\sharp A_{j}^{\Omega}\simeq 2^{\|j\|_{1}}=2^{\mu}$ . Consider

[TABLE]

for $r_{0}-r_{1}>1/p_{0}-1/p_{1}$ such that the embedding is compact. Defining the building blocks

[TABLE]

we have $\mathrm{id}=\sum_{\mu=0}^{\infty}\mathrm{id}_{\mu}$ . Of course, the identity

[TABLE]

holds true, where $\mathrm{id}_{\mu}^{\prime}$ denotes the corresponding embedding operator on the respective block (24). Although these operators have the same mapping properties we use different notations to formally distinguish between them. If $a=a^{{\dagger}}=b$ we also have, for a finite index set $I$ , that

[TABLE]

where $X_{\mu}=2^{\mu(r_{0}-1/p_{0})}\ell_{q_{0}}^{(\mu+1)^{n-1}}(\ell_{p_{0}}^{2^{\mu}})$ and $Y_{\mu}=2^{\mu(r_{1}-1/p_{1})}\ell_{q_{1}}^{(\mu+1)^{n-1}}(\ell_{p_{1}}^{2^{\mu}})$ , which means $d_{\mu}=2^{\mu}$ and $b_{\mu}=(\mu+1)^{n-1}$ in the notation of (17). In particular, we have $b_{\mu}\lesssim d_{\mu}$ .
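The relation $b_{\mu}\lesssim d_{\mu}$ can be checked directly from $d_{\mu}=2^{\mu}$ and $b_{\mu}=(\mu+1)^{n-1}$ . The following sketch computes the largest ratio $b_{\mu}/d_{\mu}$ over a range of levels; the function names and the cut-off `mu_max` are choices made for this illustration.

```python
def block_dimensions(mu, n):
    """Return (b_mu, d_mu) = ((mu + 1)^(n - 1), 2^mu) for level mu in dimension n."""
    return (mu + 1) ** (n - 1), 2 ** mu


def worst_ratio(n, mu_max=200):
    """Largest value of b_mu / d_mu for 0 <= mu <= mu_max, with the level attaining it."""
    ratios = []
    for mu in range(mu_max + 1):
        b_mu, d_mu = block_dimensions(mu, n)
        ratios.append((b_mu / d_mu, mu))
    return max(ratios)


print(worst_ratio(3))   # -> (2.25, 2)
```

The ratio peaks at a small level and then decays exponentially, so the implicit constant in $b_{\mu}\lesssim d_{\mu}$ depends on $n$ only.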

5.3 Entropy numbers

As a consequence of the boundedness of certain restriction and extension operators, see [41, 4.5], the investigation of entropy numbers of Besov space embeddings can be shifted to the sequence space side. We formulate our first result in the framework of sequence spaces, which improves the upper bound. More specifically, we prove that the lower bound in (36) is sharp in the case that $0\leq 1/p_{0}-1/p_{1}<r_{0}-r_{1}\leq 1/q_{0}-1/q_{1}$ , which also includes the limiting case $r_{0}-r_{1}=1/q_{0}-1/q_{1}$ . What is known in this direction is summarized in Remark 20 below.

Proposition 17.

Let $\Omega$ be a bounded domain and $0<q_{0}<q_{1}\leq\infty$ , $0<p_{0}\leq p_{1}\leq\infty$ such that

[TABLE]

Then we have

[TABLE]

Proof.

The lower bound follows by [41, Thm. 3.18]. The upper bound is the actual contribution. We argue as follows.

Step 1. Put $\varrho:=\min\{1,p_{1},q_{1}\}$ and fix $m\geq m_{0}$ , where $m_{0}$ is large enough (depending on $p_{0},p_{1},q_{0},q_{1},r_{0},r_{1}$ ). We decompose the identity operator $\mathrm{id}$ as follows

[TABLE]

where $L_{m}:=\lfloor\log_{2}(m)\rfloor$ and $M_{m}:=\lfloor m/8\rfloor$ . With an eye on Proposition 10 this means in particular that $m\geq 8L_{m}$ and $m\geq 8M_{m}$ (for $m$ large enough). Using (5) we obtain

[TABLE]
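The two conditions $m\geq 8L_{m}$ and $m\geq 8M_{m}$ from Step 1 are easy to test numerically. The sketch below searches for the smallest index from which both hold up to a finite limit. It only concerns these two inequalities; the threshold $m_{0}$ in the proof also depends on the parameters $p_{0},p_{1},q_{0},q_{1},r_{0},r_{1}$ , which are not modelled here.

```python
def L(m):
    """L_m = floor(log2(m)) for an integer m >= 1, computed exactly."""
    return m.bit_length() - 1


def M(m):
    """M_m = floor(m / 8)."""
    return m // 8


def constraints_hold(m):
    """Check m >= 8 * L_m and m >= 8 * M_m."""
    return m >= 8 * L(m) and m >= 8 * M(m)


def first_safe_m(limit=10_000):
    """Smallest m0 such that the constraints hold for all m0 <= m <= limit."""
    m0 = limit
    for m in range(limit, 0, -1):
        if not constraints_hold(m):
            break
        m0 = m
    return m0


print(first_safe_m())   # -> 40
```

The second inequality holds for every $m$ , so the restriction to large $m$ comes from the logarithmic term alone.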

Step 2. We estimate the first summand. By (25) this breaks down to the entropy numbers

[TABLE]

with $X_{\mu},Y_{\mu}$ chosen as after (25) and $I$ denotes the range for $\mu$ . Putting

[TABLE]

and $\beta:=r_{1}-1/p_{1}$ in (17), we are in the setting of Proposition 15. Since $m\geq\max\{8L_{m},\max\limits_{\mu}d_{\mu}\}$ , Proposition 15 yields

[TABLE]

Note that, due to Proposition 15, we only used that $r_{0}-r_{1}\leq 1/q_{0}-1/q_{1}$ . To estimate the first summand in (27) it is not needed that $r_{0}-r_{1}>1/p_{0}-1/p_{1}$ .

Step 3. Let us turn to the second summand in (27). Clearly, it can be reduced to (28) with spaces $X_{\mu},Y_{\mu}$ defined analogously, but with $\mu$ running this time in the range

[TABLE]

Hence, we have $b:=\sharp I=M_{m}\leq\min\limits_{\mu}d_{\mu}$ . We apply Proposition 16 to end up with (29). Note that we have used here only that $\alpha-\beta>0$ , or, equivalently, $r_{0}-r_{1}>1/p_{0}-1/p_{1}$ .

Step 4. Finally, we deal with the third summand in (27). Clearly, we have

[TABLE]

This gives

[TABLE]

This concludes the proof. ∎

In the next proposition we consider the situation where a Besov type sequence space compactly embeds into a Triebel-Lizorkin type sequence space. This setting is particularly important, since it leads to results with target space $L_{p}$ .

Proposition 18.

Let $\Omega$ be a bounded domain and $0<q_{0}<q_{1}\leq\infty$ , $0<p_{0}\leq p_{1}<\infty$ , $q_{0}<p_{0}$ , $r_{0}>r_{1}$ such that

[TABLE]

Then we have

[TABLE]

Proof.

Again, the lower bounds follow from [41, Thm. 3.18].

Step 1. In case $p_{1}>q_{1}$ we use the commutative diagram in Figure 1.

Then we have by (6) and (7)

[TABLE]

Hence, we may use Proposition 17 and obtain

[TABLE]

Step 2. Now we consider $p_{1}<q_{1}$ . After decomposing the identity operator in an analogous way as in (26) and (27) we use the commutative diagrams in Figure 2 for the first and second summand, respectively. In fact, for the first summand in (27) we obtain by (7)

[TABLE]

Note that the identity operator is bounded since $\Omega$ is a bounded domain.

Furthermore, the entropy numbers

[TABLE]

can be estimated by the same reasoning as in Step 2 of the proof of Proposition 17. Note that $r_{0}-r_{1}$ may be smaller than $1/p_{0}-1/q_{1}$ . However, this is not important for the argument (based on Proposition 15). It remains to consider the second summand in (27). Here we use the right diagram in Figure 2 and obtain

[TABLE]

We continue to estimate the appearing entropy numbers as in Step 3 of the proof of Proposition 17. Note that $r_{0}-r_{1}$ might be larger than $1/q_{0}-1/p_{1}$ . However, for the argument, we only need $r_{0}-r_{1}>1/p_{0}-1/p_{1}$ . This concludes the proof. ∎

Let us finally consider the situation, where a Triebel-Lizorkin type sequence space compactly embeds into a Besov type sequence space.

Proposition 19.

Let $0<q_{0}<q_{1}\leq\infty$ , $0<p_{0}\leq p_{1}<\infty$ , $q_{1}>p_{1}$ , $r_{0}>r_{1}$ such that

[TABLE]

and let $\Omega$ be a bounded domain. Then we have

[TABLE]

Proof.

The lower bound follows from [41, Thm. 3.18].

Step 1. For the upper bound in the case $p_{0}<q_{0}$ we may use the commutative diagram in Figure 3 below to decompose the identity operator. Afterwards, we use (6) to reduce everything to the situation in Proposition 17.

Step 2. In case $p_{0}>q_{0}$ we argue analogously to Step 2 of Proposition 18. This time we use the decompositions in Figure 4 for the first and second summand in (27), respectively. ∎

Unfortunately, we were not able to find a corresponding result for the $f-f$ situation. So, this remains an open problem.

Remark 20.

To clarify the contribution of this paper, let us briefly recapitulate the known results and open questions which motivated this work. For several results and historical remarks on the subject we refer to [8] and the references therein. In particular, Vybíral [41, Thm. 4.9] proved for $0<p_{0}\leq p_{1}\leq\infty$ and $0<q_{0}\leq q_{1}\leq\infty$ in the case of small smoothness

[TABLE]

that there is for any $\varepsilon>0$ a number $C_{\varepsilon}>0$ such that

[TABLE]

The result is a direct consequence of the bound for $r>\max\{1/p_{0}-1/p_{1},1/q_{0}-1/q_{1}\}$ (the case of “large smoothness”), saying that

[TABLE]

In fact, the entropy numbers in (36) can be bounded from above by

[TABLE]

if $q^{\ast}\geq q_{0}$ . Now choose $q_{1}>q^{\ast}>q_{0}$ such that $1/q^{\ast}-1/q_{1}+\varepsilon/(n-1)=r_{0}-r_{1}>1/q^{\ast}-1/q_{1}$ , which, together with (37) and $q_{0}$ replaced by $q^{\ast}$ , implies (36). ∎
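The choice of $q^{\ast}$ in Remark 20 follows from solving $1/q^{\ast}-1/q_{1}+\varepsilon/(n-1)=r_{0}-r_{1}$ for $q^{\ast}$ . The helper below is a small sketch of that computation; the function name and the example parameters are hypothetical, and `r` stands for the difference $r_{0}-r_{1}$ .

```python
import math


def q_star(r, q1, eps, n):
    """Solve 1/q* - 1/q1 + eps/(n - 1) = r for q*, where r = r0 - r1.

    q1 may be math.inf; n >= 2 is the dimension of the domain.
    """
    if n < 2:
        raise ValueError("expected n >= 2")
    inv = r + 1.0 / q1 - eps / (n - 1)
    if inv <= 0:
        raise ValueError("eps is too large: 1/q* would not be positive")
    return 1.0 / inv


q0, q1 = 2.0, math.inf          # hypothetical example parameters
qs = q_star(r=0.3, q1=q1, eps=0.1, n=3)
print(qs, q0 < qs < q1)
```

In this example $r_{0}-r_{1}=0.3\leq 1/q_{0}-1/q_{1}=0.5$ , and the computed value satisfies $q_{0}<q^{\ast}<q_{1}$ as required in the remark.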

The propositions proved above allow us to improve a number of existing results for the entropy numbers of the embedding

[TABLE]

Theorem 21.

Let $\Omega$ be a bounded domain and $A,A^{{\dagger}}\in\{B,F\}$ but $(A,A^{{\dagger}})\neq(F,F)$ . Let $0<q_{0}<q_{1}\leq\infty$ , $0<p_{0}\leq p_{1}<\infty$ and $r_{0}>r_{1}$ such that

[TABLE]

In addition, we assume $q_{0}<p_{0}$ if $(A,A^{{\dagger}})=(B,F)$ and $q_{1}>p_{1}$ if $(A,A^{{\dagger}})=(F,B)$ , respectively. Then it holds

[TABLE]

Proof.

The result is a direct consequence of Propositions 17, 18, 19 and the machinery described in the proof of [41, Thm. 4.11]. ∎

As a corollary of Theorem 21, we obtain the following result, which settles Open Problem 6.4 in [8].

Corollary 22.

Let $\Omega$ be as above. Let further $0<q<p_{0}\leq p_{1}$ , $1<p_{1}<\infty$ and $1/p_{0}-1/p_{1}<r\leq 1/q-1/2$ . Then we have

[TABLE]

Proof.

Identifying $S^{0}_{p_{1},2}F(\Omega)=L_{p_{1}}(\Omega)$ in the case $1<p_{1}<\infty$ , the result is a direct consequence of Theorem 21. ∎

With the final corollary below (from Theorem 21) we close some more gaps in [41, Thm. 4.18 (ii), (iii)].

Corollary 23.

Let $\Omega$ be as above. We have the following sharp bounds for entropy numbers.

(i) Let $1<p\leq\infty$ and $1/p<r\leq 1$ . Then, we have

[TABLE]

(ii) Let $1<p<q<\infty$ and $1/p-1/q<r\leq 1/2$ . Then, we have

[TABLE]

(iii) Let $0<q<p\leq\infty$ , $q<1$ and $1/p<r\leq 1/q-1$ . Then, we have

[TABLE]

Remark 24.

Entropy numbers of mixed smoothness Sobolev-Besov embeddings into $L_{p}$ , where $1\leq p\leq\infty$ , recently gained significant interest, see [39] and [34, 33, 35]. There are some fundamental open problems connected with $p=\infty$ , see [8, 2.6, 6.4, 6.5]. Interestingly, when choosing the third index $q$ small enough in Corollaries 22, 23 we get rid of the logarithm.

Acknowledgment.

The authors would like to thank Dinh Dũng, Thomas Kühn, Van Kien Nguyen, Winfried Sickel and Vladimir N. Temlyakov for several discussions on the topic. They would also like to thank two anonymous referees for their valuable comments. T.U. and S.M. would like to acknowledge support by the DFG Ul-403/2-1 and by the Fraunhofer Cluster of Excellence “Cognitive Internet Technologies”.

Bibliography

[1] S. Akashi. An operator theoretical characterization of $\varepsilon$-entropy in Gaussian processes. Kodai Mathem. Journ., 9(1):58–67, 1986.
[2] T. Aoki. Locally bounded linear topological spaces. Proceedings of the Imperial Academy, 18(10):588–594, 1942.
[3] P. Billingsley. Ergodic theory and information. John Wiley & Sons, Inc., New York-London-Sydney, 1965.
[4] B. Carl. Entropy numbers, $s$-numbers, and eigenvalue problems. J. Funct. Anal., 41(3):290–306, 1981.
[5] B. Carl and I. Stephani. Entropy, compactness and the approximation of operators. Cambridge Univ. Press, Cambridge, 1990.
[6] S. Dirksen and T. Ullrich. Gelfand numbers related to structured sparsity and Besov space embeddings with small mixed smoothness. J. Complexity, 48:69–102, 2018.
[7] D. Dũng. Non-linear approximations using sets of finite cardinality or finite pseudo-dimension. volume 17, pages 467–492. 2001. 3rd Conference of the Foundations of Computational Mathematics (Oxford, 1999).
[8] D. Dũng, V. N. Temlyakov, and T. Ullrich. Hyperbolic Cross Approximation. Advanced Courses in Mathematics. CRM Barcelona. Birkhäuser/Springer, 2018.
