New Uniform Bounds for Almost Lossless Analog Compression

Yonatan Gutman; Adam Śpiewak

arXiv:1906.07620·math.DS·December 29, 2022

TL;DR

This paper establishes uniform bounds on almost lossless analog compression rates for stationary processes within a set, linking these bounds to metric mean dimension and mean box dimension, and utilizing a variational principle for rate-distortion functions.

Contribution

It introduces uniform bounds for compression rates of stationary processes based on metric mean dimension and mean box dimension, extending prior theories.

Findings

1. Derived lower and upper bounds for compression rates

2. Connected metric mean dimension with rate-distortion functions

3. Applied variational principle to analyze compression limits

Abstract

Wu and Verdú developed a theory of almost lossless analog compression, where one imposes various regularity conditions on the compressor and the decompressor with the input signal being modelled by a (typically infinite-entropy) stationary stochastic process. In this work we consider all stationary stochastic processes with trajectories in a prescribed set $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ of (bi)infinite sequences and find uniform lower and upper bounds for certain compression rates in terms of metric mean dimension and mean box dimension. An essential tool is the recent Lindenstrauss-Tsukamoto variational principle expressing metric mean dimension in terms of rate-distortion functions.

Equations

$$\overline{\dim}_{B}(A)=\limsup_{\varepsilon\to 0}\frac{\log\#(A,\rho,\varepsilon)}{\log\frac{1}{\varepsilon}}.$$

$$S(\mathcal{X},T,\rho,\varepsilon)=\lim_{n\to\infty}\frac{\log\#(\mathcal{X},\rho_{n},\varepsilon)}{n}$$

$$\overline{\mathrm{mdim}}_{M}(\mathcal{X},T,\rho)=\limsup_{\varepsilon\to 0}\frac{S(\mathcal{X},T,\rho,\varepsilon)}{\log\frac{1}{\varepsilon}}.$$

$$\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)=\limsup_{\varepsilon\to 0}\lim_{n\to\infty}\frac{\log\#(\pi_{n}(\mathcal{S}),\|\cdot\|_{\infty},\varepsilon)}{n\log\frac{1}{\varepsilon}}.$$

$$\overline{\mathrm{mdim}}_{B}(\mathcal{S})=\lim_{n\to\infty}\frac{\overline{\dim}_{B}(\pi_{n}(\mathcal{S}))}{n},$$

$$\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)\leq\overline{\mathrm{mdim}}_{B}(\mathcal{S}).$$

$$R_{B}(\mu,\delta)=\limsup_{n\to\infty}\ \inf\Big\{\frac{\overline{\dim}_{B}(A)}{n}:A\subset[0,1]^{n},\ A\text{ compact},\ \mu(\pi_{n}^{-1}(A))\geq 1-\delta\Big\}.$$

$$\mu(\{x\in\mathcal{S}\ |\ g\circ f(x|_{0}^{n-1})\neq x|_{0}^{n-1}\})\leq\varepsilon.$$

$$\mu(\{x\in\mathcal{S}:\|x|_{0}^{n-1}-g\circ f(x|_{0}^{n-1})\|_{p}\geq\varepsilon\})\leq\delta.$$

$$\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{\alpha}}(\mu,\varepsilon)\leq\frac{1}{1-\alpha}R_{B}(\mu,\varepsilon)$$

$$\sup_{\varepsilon>0}\mathrm{r}_{\mathrm{LIN}-\mathcal{B}}^{P,2}(\mu,\varepsilon)\leq\overline{d}_{0}(\mu).$$

$$\|X|_{0}^{n-1}-g_{n}\circ A_{n}(X|_{0}^{n-1})\|_{2}\xrightarrow{n\to\infty}0\quad\text{in probability }\mu\otimes\nu,$$

$$\mathbb{E}_{\nu}\,\mu(\{x\in[0,1]^{\mathbb{Z}}:\|x|_{0}^{n-1}-g_{n}\circ A_{n}(x|_{0}^{n-1})\|_{2}\geq\varepsilon\})$$

$$\mu(\{x\in[0,1]^{\mathbb{Z}}:\|x|_{0}^{n-1}-g_{n}\circ A_{n}(x|_{0}^{n-1})\|_{2}\geq\varepsilon\})\leq\delta.$$

$$\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mu,\varepsilon)\quad\text{and}\quad\sup_{\varepsilon>0}\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mathcal{S},\varepsilon)$$

$$\alpha\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}R_{B}(\mu,\varepsilon)\leq\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{\alpha}}(\mu,\varepsilon)\leq\frac{1}{1-\alpha}\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}R_{B}(\mu,\varepsilon).$$

$$\alpha\,\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)\leq\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon).$$

$$\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\inf_{L>0}\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)\leq\inf_{L>0}\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{L,\alpha}}(\mathcal{S},0)\leq\min\Big\{1,\frac{2}{1-\alpha}\overline{\mathrm{mdim}}_{B}(\mathcal{S})\Big\}.$$

$$\tilde{R}_{\mu}(\varepsilon)=\lim_{n\to\infty}\tilde{R}_{\mu}(n,\varepsilon)=\inf_{n\in\mathbb{N}}\tilde{R}_{\mu}(n,\varepsilon).$$

$$\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)=\limsup_{\varepsilon\to 0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\frac{\tilde{R}_{\mu}(\varepsilon)}{\log\frac{1}{\varepsilon}}=\limsup_{\varepsilon\to 0}\sup_{\mu\in\mathcal{E}_{\sigma}(\mathcal{S})}\frac{\tilde{R}_{\mu}(\varepsilon)}{\log\frac{1}{\varepsilon}}.$$

$$\frac{\tilde{R}_{\mu}\big((\frac{L}{2^{\alpha}}+\varepsilon^{1-\alpha})\varepsilon^{\alpha}\big)}{\log\lceil\frac{1}{\varepsilon}\rceil}\leq\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon).$$

$$\begin{aligned}\mathbb{E}\Big(\frac{1}{n}\sum_{k=0}^{n-1}d(X_{k},Y_{k})\Big)&\leq\int_{[0,1]^{n}}\|x-g\circ f(x)\|_{\infty}\,d(\pi_{n})_{*}\mu(x)+\int_{[0,1]^{n}}\|g\circ f(x)-g\circ c\circ f(x)\|_{\infty}\,d(\pi_{n})_{*}\mu(x)\\&\leq\varepsilon+\int_{[0,1]^{n}}L\|f(x)-c\circ f(x)\|_{\infty}^{\alpha}\,d(\pi_{n})_{*}\mu(x)\leq\varepsilon+L\frac{\varepsilon^{\alpha}}{2^{\alpha}}.\end{aligned}$$

$$\begin{aligned}\tilde{R}_{\mu}\Big(\Big(\frac{L}{2^{\alpha}}+\varepsilon^{1-\alpha}\Big)\varepsilon^{\alpha}\Big)&\leq\frac{1}{n}I(X;Y)\leq\frac{1}{n}H(Y)\\&\leq\frac{\log(\lceil\frac{1}{\varepsilon}\rceil^{k})}{n}=\frac{k\log\lceil\frac{1}{\varepsilon}\rceil}{n}\leq\log\Big(\Big\lceil\frac{1}{\varepsilon}\Big\rceil\Big)\big(\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)+\delta\big).\end{aligned}$$

Full text

New Uniform Bounds for Almost Lossless Analog Compression

Yonatan Gutman (1), Adam Śpiewak (2)

(1) Institute of Mathematics, Polish Academy of Sciences, ul. Śniadeckich 8, 00-656 Warszawa, Poland

(2) Institute of Mathematics, University of Warsaw, ul. Banacha 2, 02-097 Warszawa, Poland

Acknowledgments: We are grateful to Amos Lapidoth, Neri Merhav and Erwin Riegler for helpful discussions. Y.G. was partially supported by the National Science Center (Poland) Grant 2013/08/A/ST1/00275. Y.G. and A.Ś. were partially supported by the National Science Center (Poland) grant 2016/22/E/ST1/00448.

Emails: [email protected], [email protected]

Abstract

Wu and Verdú developed a theory of almost lossless analog compression, where one imposes various regularity conditions on the compressor and the decompressor with the input signal being modelled by a (typically infinite-entropy) stationary stochastic process. In this work we consider all stationary stochastic processes with trajectories in a prescribed set $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ of (bi)infinite sequences and find uniform lower and upper bounds for certain compression rates in terms of metric mean dimension and mean box dimension. An essential tool is the recent Lindenstrauss-Tsukamoto variational principle expressing metric mean dimension in terms of rate-distortion functions.

A full version of this paper is accessible as [1] (preprint).

I Introduction

In recent years, the theory of compression for analog sources (i.e. stochastic processes with values in $\mathbb{R}^{\mathbb{Z}}$) has undergone major development (for a sample of such results see [2], [3], [4], [5]). There are two key differences from Shannon's classical model of compression for discrete sources. The first is the necessity of imposing regularity conditions on the compressor and/or decompressor functions (e.g. Lipschitz or Hölder continuity). This requirement makes the problem non-trivial and reasonable from the point of view of applications (as it induces robustness to noise). The second difference is that non-discrete sources in general have infinite Shannon entropy rate, hence a different measure of complexity for stochastic processes has to be considered. One of the most fruitful approaches taken in the literature is to assume a specific structure of the source signal - as in compressed sensing, where the input vectors are assumed to be sparse (e.g. [6], [7]). In this setting, the theory of linear compression with efficient and stable recovery algorithms has been developed. However, strong assumptions imposed on the structure of the signal reduce the applicability of the technique. A different approach was developed in the pioneering work [2]. Instead of making assumptions on the structure of the signal, new measures of complexity related to the Minkowski (box-counting) dimension of the signal were introduced and proved to be bounds on compression rates for certain classes of compressors and decompressors. Similarly, Jalali and Poor ([3]) developed a theory of universal compressed sensing, where the linear compression rate is given in terms of a certain generalization of the Rényi information dimension for stochastic processes with the $\psi^{*}$-mixing property.

The goal of this paper is twofold. We adapt the setting from [2], but instead of a single process we consider all stationary stochastic processes with trajectories in a prescribed set $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ . This corresponds to an a priori knowledge of all the possible trajectories of the process rather than its distribution. We deal with the question of calculating minimal compression rates in the sense of [2] sufficient for all such stochastic processes with Borel or linear compressors and Hölder or Lipschitz decompressors. We depart from the precise setting of [2] in several directions. We consider processes with trajectories in $[0,1]^{\mathbb{Z}}$ , instead of $\mathbb{R}^{\mathbb{Z}}$ together with compression and decompression both dependent on the distribution of the process and independent of it (but dependent on $\mathcal{S}$ ). We also consider the case where the decompressor functions are $(L,\alpha)$ -Hölder with fixed $L>0$ and $\alpha\in(0,1]$ for all block lengths. Our main results are upper and lower bounds for such rates in terms of certain geometric and dynamical characteristics of the considered set $\mathcal{S}$ . This constitutes the second goal of the paper: we introduce notions from the theory of dynamical systems to the study of analog compression rates. As we consider stationary processes, it is natural to assume the set $\mathcal{S}$ to be invariant under the shift transformation and hence it can be considered as a topological dynamical system. The obtained lower bounds are given in terms of the metric mean dimension of the system $(\mathcal{S},\mathrm{shift})$ - a geometrical invariant of dynamical systems introduced and studied by Lindenstrauss and Weiss in [8]. Existence of connections between signal processing and mean dimension theory was observed first in [9], where the use of the Whittaker-Nyquist-Kotelnikov-Shannon sampling theorem was essential for proving the embedding conjecture of Lindenstrauss. 
Another connection between these domains was established recently in [10], where a variational principle for metric mean dimension was given in terms of rate-distortion functions. It is our main tool in developing lower bounds on compression rates for all stationary processes supported in $\mathcal{S}$ . In the scenario where the compressor and decompressor functions are required to be independent of the distribution of the input process (only depending on $\mathcal{S}$ ), we introduce mean box dimension of $\mathcal{S}$ as the upper bound for corresponding compression rates.

II Preliminaries

In this paper, we apply results from the theory of dynamical systems to the theory of signal processing. In line with the signal processing perspective, we consider a stationary stochastic process $\{X_{n}\}_{n\in\mathbb{Z}},\ X_{n}:\Omega\to[0,1]$ defined on some probability space $(\Omega,\mathbb{P})$ . Usually, instead of a single process, we are interested in considering all the stationary processes with trajectories in some prescribed set. A natural model for the set of possible trajectories is the notion of a subshift - a certain type of dynamical system. Introducing it allows us to consider stationary processes in terms of the theory of dynamical systems.

By a (topological) dynamical system we understand a triple $(\mathcal{X},T,\rho)$ , where $(\mathcal{X},\rho)$ is a compact metric space and $T:\mathcal{X}\to\mathcal{X}$ is a homeomorphism. For a (countably-additive) Borel measure $\mu$ on $\mathcal{X}$ , by $T_{*}\mu$ we denote its transport by $T$ , i.e. a Borel measure on $\mathcal{X}$ given by $T_{*}\mu(A)=\mu(T^{-1}(A))$ for Borel $A\subset\mathcal{X}$ . We say that measure $\mu$ is $T$ -invariant, if $\mu=T_{*}\mu$ . By $\mathcal{P}_{T}(\mathcal{X})$ we denote the set of all $T$ -invariant Borel probability measures on $\mathcal{X}$ . We call a measure $\mu\in\mathcal{P}_{T}(\mathcal{X})$ ergodic if every Borel set $A\subset\mathcal{X}$ satisfying $T^{-1}(A)=A$ is of either full or zero measure $\mu$ . The set of all ergodic measures for a transformation $T$ is denoted by $\mathcal{E}_{T}(\mathcal{X})$ . For an introduction to topological dynamics and its connections with ergodic theory see [11, Chapters 5-8].

Consider the unit interval $[0,1]$ with the standard metric. By Tychonoff's theorem, $[0,1]^{\mathbb{Z}}$ is a compact metrizable space when endowed with the product topology. This topology is metrizable by the metric $\tau(x,y)=\sum\limits_{i=-\infty}^{\infty}\frac{1}{2^{|i|}}|x_{i}-y_{i}|$, where $x=(x_{i})_{i\in\mathbb{Z}},\ y=(y_{i})_{i\in\mathbb{Z}}$. This choice of the metric may seem arbitrary, but it turns out that the metric mean dimension for subshifts takes a natural form when calculated with respect to $\tau$ (see Proposition III.6). Define the shift transformation $\sigma:[0,1]^{\mathbb{Z}}\to[0,1]^{\mathbb{Z}}$ as $\sigma((x_{i})_{i=-\infty}^{\infty})=(x_{i+1})_{i=-\infty}^{\infty}.$

We are interested in properties of a given subshift, i.e. a closed (in the product topology) and shift-invariant subset $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$, which we interpret as the set of all admissible trajectories that can occur as input. Note that there is a one-to-one correspondence between measures $\mu\in\mathcal{P}_{\sigma}(\mathcal{S})$ and distributions of stationary processes such that $(X_{n})_{n\in\mathbb{Z}}$ belongs to $\mathcal{S}$ with $\mu$-probability one. Our goal is to relate compression properties of measures (stationary processes) from $\mathcal{P}_{\sigma}(\mathcal{S})$ to the geometrical properties of the set $\mathcal{S}$.

For $n\in\mathbb{N}$ define the projection $\pi_{n}:\mathcal{S}\to[0,1]^{n}$ as $\pi_{n}(x)=x|_{0}^{n-1}=(x_{0},x_{1},...,x_{n-1}).$ For vectors $x,y\in[0,1]^{n}$ and $p\in[1,\infty)$, define the (normalized) $\ell^{p}$ distance as $\|x-y\|_{p}=\Big(\frac{1}{n}\sum_{k=0}^{n-1}|x_{k}-y_{k}|^{p}\Big)^{\frac{1}{p}}$ and $\|x-y\|_{\infty}=\max\{|x_{k}-y_{k}|:0\leq k\leq n-1\}.$
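The projection $\pi_{n}$ and the two distances on $[0,1]^{n}$ can be sketched in a few lines of Python. This is only an illustration: the helper names (`project`, `lp_distance`, `sup_distance`) and the dictionary representation of a trajectory window are assumptions made here, not notation from the paper.

```python
def lp_distance(x, y, p):
    """Normalized l^p distance: ((1/n) * sum_k |x_k - y_k|^p)^(1/p)."""
    n = len(x)
    return (sum(abs(a - b) ** p for a, b in zip(x, y)) / n) ** (1.0 / p)


def sup_distance(x, y):
    """Sup-norm distance: max_k |x_k - y_k| (not normalized)."""
    return max(abs(a - b) for a, b in zip(x, y))


def project(x, n):
    """pi_n: keep coordinates 0..n-1 of a trajectory stored as {index: value}."""
    return [x[i] for i in range(n)]


# Hypothetical finite window of a bi-infinite trajectory (indices -2..5).
x = {i: 0.0 for i in range(-2, 6)}
y = dict(x)
y[0] = 1.0  # the two trajectories differ in a single coordinate

px, py = project(x, 4), project(y, 4)
print(lp_distance(px, py, 1))  # 0.25
print(lp_distance(px, py, 2))  # 0.5
print(sup_distance(px, py))    # 1.0
```

Because of the factor $\frac{1}{n}$, a single large coordinate error is averaged out in $\|\cdot\|_{p}$ but not in $\|\cdot\|_{\infty}$, which is why the two distances can differ substantially on long blocks.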

III Mean dimensions

In this section we will define metric mean dimension (for general dynamical systems) and (measurable) mean box dimension (for subshifts of $[0,1]^{\mathbb{Z}}$ ). These notions attempt to capture the average number of dimensions per iterate required to code orbits of the system. They serve as complexity measures employed to bound certain compression rates of subshifts in $[0,1]^{\mathbb{Z}}$ . Let us begin with the non-dynamical notion of box dimension.

Definition III.1.

Let $(\mathcal{X},\rho)$ be a compact metric space. For $\varepsilon>0$ , the $\varepsilon$ -covering number of a subset $A\subset\mathcal{X}$ , denoted by $\#(A,\rho,\varepsilon)$ , is the minimal cardinality $N$ of an open cover $\{U_{1},\dots,U_{N}\}$ of $A$ by sets with diameter smaller than $\varepsilon$ .

Definition III.2.

Let $(\mathcal{X},\rho)$ be a compact metric space. The upper box (Minkowski) dimension of $A\subset\mathcal{X}$ is defined as

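$$\overline{\dim}_{B}(A)=\limsup_{\varepsilon\to 0}\frac{\log\#(A,\rho,\varepsilon)}{\log\frac{1}{\varepsilon}}.$$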

In the sequel we consider only sets $A\subset[0,1]^{n}$ with distance induced by the norm $\|\cdot\|_{\infty}$ . For more on box dimension see [12] and [13].
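As a sanity check of the definition, the following Python sketch estimates the box dimension of the middle-thirds Cantor set, whose dimension is known to be $\log 2/\log 3$. It counts grid cells of side $3^{-k}$ rather than open covers by sets of diameter smaller than $\varepsilon$; the two counts differ only by bounded factors, which do not affect the limit. The function names and the integer encoding of points are assumptions made for this illustration.

```python
import math
from itertools import product


def cantor_points(m):
    """Left endpoints of the level-m Cantor intervals, as integers a with x = a / 3**m."""
    return [sum(d * 3 ** (m - 1 - i) for i, d in enumerate(digits))
            for digits in product((0, 2), repeat=m)]


def grid_count(points, m, k):
    """Number of grid cells of side 3**-k (k <= m) that contain at least one point."""
    return len({a // 3 ** (m - k) for a in points})


def box_dim_estimate(m, k):
    """log(cell count) / log(1 / cell side) at scale 3**-k."""
    pts = cantor_points(m)
    return math.log(grid_count(pts, m, k)) / math.log(3 ** k)


print(round(box_dim_estimate(10, 8), 4))  # 0.6309
```

Integer arithmetic is used for the points so that the cell index `a // 3 ** (m - k)` is exact and no floating-point rounding can move a point across a cell boundary.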

Definition III.3.

Let $(\mathcal{X},\rho)$ be a compact metric space and let $T:\mathcal{X}\to\mathcal{X}$ be a homeomorphism. For $n\in\mathbb{N}$ define a metric $\rho_{n}$ on $\mathcal{X}$ by $\rho_{n}(x,y)=\max\limits_{0\leq k<n}\rho(T^{k}x,T^{k}y)$ . Set:

$$S(\mathcal{X},T,\rho,\varepsilon)=\lim_{n\to\infty}\frac{\log\#(\mathcal{X},\rho_{n},\varepsilon)}{n}$$

(the limit exists due to the subadditivity of the function $n\mapsto\log\#(\mathcal{X},\rho_{n},\varepsilon)$ ).

Definition III.4.

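The upper metric mean dimension of the system $(\mathcal{X},T,\rho)$ is defined as

$$\overline{\mathrm{mdim}}_{M}(\mathcal{X},T,\rho)=\limsup_{\varepsilon\to 0}\frac{S(\mathcal{X},T,\rho,\varepsilon)}{\log\frac{1}{\varepsilon}}.$$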

Remark III.5.

It is easy to see that any system of finite topological entropy (see [11, Chapter 7]) satisfies $\overline{\mathrm{mdim}}_{M}(\mathcal{X},T,\rho)=0$. Metric mean dimension can be easily computed for full shifts: if $(A,d)$ is a compact metric space, then $\overline{\mathrm{mdim}}_{M}(A^{\mathbb{Z}},\sigma,\rho)=\overline{\mathrm{dim}}_{B}(A,d)$, where $\rho$ is the product metric (see [1]). Also, $\overline{\mathrm{mdim}}_{M}$ is an invariant for bi-Lipschitz isomorphisms: if $(\mathcal{X},T,\rho_{1})$ and $(\mathcal{Y},S,\rho_{2})$ are dynamical systems and $\Phi:\mathcal{X}\to\mathcal{Y}$ is a bi-Lipschitz and equivariant bijection (i.e. $\Phi\circ T=S\circ\Phi$), then $\overline{\mathrm{mdim}}_{M}(\mathcal{X},T,\rho_{1})=\overline{\mathrm{mdim}}_{M}(\mathcal{Y},S,\rho_{2})$.

A topological version of mean dimension for actions of amenable groups was introduced by Gromov in [14] and studied by Lindenstrauss and Weiss in their seminal work [8]. It turns out that the topological mean dimension is the right invariant to study for the problem of existence of an embedding into $(([0,1]^{D})^{\mathbb{Z}},\sigma)$ (see [9]). For more on mean topological dimension see [15]. The metric mean dimension was introduced in [8] and proved to be, when calculated with respect to any compatible metric, an upper bound for the topological mean dimension.

When $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ is a subshift and $\rho=\tau$ (see Section II), metric mean dimension can be expressed in a more canonical form:

Proposition III.6.

For a subshift $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ it holds

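$$\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)=\limsup_{\varepsilon\to 0}\lim_{n\to\infty}\frac{\log\#(\pi_{n}(\mathcal{S}),\|\cdot\|_{\infty},\varepsilon)}{n\log\frac{1}{\varepsilon}}.$$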
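Proposition III.6 expresses metric mean dimension through covering numbers of the projections $\pi_{n}(\mathcal{S})$ in the sup norm. The Python sketch below evaluates the ratio $\frac{\log\#}{n\log(1/\varepsilon)}$ for two simple subshifts. It rests on two assumptions made only for this illustration: for the full shift $[0,1]^{\mathbb{Z}}$ the covering number is replaced by the grid count $\lceil 1/\varepsilon\rceil^{n}$ (correct up to bounded factors per coordinate), and for the binary shift $\{0,1\}^{\mathbb{Z}}$ the covering number is exactly $2^{n}$ when $\varepsilon\leq 1$, since distinct points of $\{0,1\}^{n}$ are at sup-distance 1.

```python
import math


def ratio_full_shift(eps, n):
    """log N / (n log(1/eps)) with the grid proxy N = ceil(1/eps)**n for [0,1]^n."""
    return n * math.log(math.ceil(1 / eps)) / (n * math.log(1 / eps))


def ratio_binary_shift(eps, n):
    """Same ratio for {0,1}^Z with N = 2**n (valid for eps <= 1)."""
    return n * math.log(2) / (n * math.log(1 / eps))


for k in (4, 10, 20):
    eps = 2.0 ** -k
    print(k, ratio_full_shift(eps, 7), ratio_binary_shift(eps, 7))
```

As $\varepsilon\to 0$ the first ratio stays at 1 and the second tends to 0, in line with Remark III.5: the full shift on $[0,1]$ has metric mean dimension $\overline{\dim}_{B}([0,1])=1$, while the binary shift has finite topological entropy and hence metric mean dimension 0.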

Definition III.7.

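For $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ we define its upper mean box dimension as

$$\overline{\mathrm{mdim}}_{B}(\mathcal{S})=\lim_{n\to\infty}\frac{\overline{\dim}_{B}(\pi_{n}(\mathcal{S}))}{n},$$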

where $\overline{\mathrm{dim}}_{B}(\pi_{n}(\mathcal{S}))$ is calculated with respect to $\|\cdot\|_{\infty}$ norm on $[0,1]^{n}$ . The limit exists due to the subadditivity of the function $n\mapsto\overline{\dim}_{B}(\pi_{n}(\mathcal{S}))$ .

Proposition III.8.

Let $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ be a subshift. Then

$$\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)\leq\overline{\mathrm{mdim}}_{B}(\mathcal{S}).$$

In [2], Wu and Verdú gave bounds on certain compression rates in terms of the following notion.

Definition III.9.

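([2, Def. 10]) For a subshift $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$, invariant measure $\mu\in\mathcal{P}_{\sigma}(\mathcal{S})$, $n\in\mathbb{N}$ and $0\leq\delta<1$ define the measurable mean box dimension as

$$R_{B}(\mu,\delta)=\limsup_{n\to\infty}\ \inf\Big\{\frac{\overline{\dim}_{B}(A)}{n}:A\subset[0,1]^{n},\ A\text{ compact},\ \mu(\pi_{n}^{-1}(A))\geq 1-\delta\Big\}.$$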

Remark III.10.

Wu and Verdú use the name Minkowski-dimension compression rate for $R_{B}(\mu,\delta)$ . As we reserve the term compression rate for a different concept (of an operational meaning, see Section IV-A), we decided to introduce a different name.

IV Analog compression

In this section we introduce analog compression rates for sources with alphabet $[0,1]$ and state our main results. In this setting it is natural to assume regularity constraints on the compressor and decompressor functions. This follows from the fact that we are taking an infinite alphabet under consideration: for every $n\in\mathbb{N}$ there exists a (Borel) bijection between $[0,1]^{n}$ and $[0,1]$ , hence the corresponding compression rates tend to zero if we do not assume any further regularity of the compressor and decompressor functions (cf. [2, Section IV.B]). On the other hand, from the point of view of applications it is desirable to impose some regularity conditions, as they induce robustness to noise and enable numerical control of the errors occurring in the compression and decompression processes.
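The role of the regularity constraints can be illustrated with a finite-precision caricature of such a bijection: interleaving the decimal digits of two numbers packs them into one. The Python sketch below works on digit strings to avoid floating-point issues. It is an assumption-laden toy (fixed number of digits, no treatment of non-unique decimal expansions), not the Borel bijection itself.

```python
def interleave(a_digits, b_digits):
    """Merge two equal-length digit strings into one: a0 b0 a1 b1 ..."""
    assert len(a_digits) == len(b_digits)
    return "".join(x + y for x, y in zip(a_digits, b_digits))


def deinterleave(c_digits):
    """Inverse of interleave: even positions give a, odd positions give b."""
    return c_digits[0::2], c_digits[1::2]


# "1415" and "7182" stand for the numbers 0.1415 and 0.7182.
code = interleave("1415", "7182")
print(code)                # 17411852, i.e. the single number 0.17411852
print(deinterleave(code))  # ('1415', '7182')

# Nearby inputs can receive far-apart codes: 0.4999 vs 0.5000 (second input 0).
print(interleave("4999", "0000"), interleave("5000", "0000"))  # 40909090 50000000
```

The last line shows inputs at distance $10^{-4}$ whose codes are at distance roughly $0.09$, so this kind of compressor is far from Lipschitz. Imposing Lipschitz or Hölder conditions rules out such schemes and makes the rate question non-trivial.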

IV-A Compression rates

Definition IV.1.

A regularity class is a set $\mathcal{C}$ of functions between finite dimensional unit cubes, i.e. $\mathcal{C}\subset\{f:[0,1]^{n}\to[0,1]^{k}\ |\ n,k\in\mathbb{N}\}$ .

We will consider the following regularity classes: $\mathcal{B}=\{\text{Borel maps}\}$, $\mathcal{H}_{\alpha}=\{\alpha\text{-Hölder maps}\}$, $\mathcal{H}_{L,\alpha}=\{\alpha\text{-Hölder maps with constant }L\}$, $\mathrm{LIN}=\{\text{linear maps}\}$, where the Hölder condition is considered with respect to $\|\cdot\|_{\infty}$ on $[0,1]^{n}$ and $[0,1]^{k}$. Below we define several compression rates for various requirements on the performance of the compression and decompression process (see also [2, Def. 3]).

Definition IV.2.

Let $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ be a subshift and $\mu\in\mathcal{P}_{\sigma}(\mathcal{S})$. Let $\mathcal{C},\mathcal{D}\subset\{f:[0,1]^{n}\to[0,1]^{k}\ |\ n,k\in\mathbb{N}\}$ be regularity classes. For $n\in\mathbb{N}$ and $\varepsilon\geq 0$, the $\mathcal{C}-\mathcal{D}$ almost lossless analog compression rate $\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mu,\varepsilon,n)\geq 0$ of $\mu$ with $n$-block error probability $\varepsilon$ is the infimum of $\frac{k}{n}$, where $k$ runs over all natural numbers such that there exist maps $f:[0,1]^{n}\rightarrow[0,1]^{k},\ f\in\mathcal{C}$ and $g:[0,1]^{k}\rightarrow[0,1]^{n},\ g\in\mathcal{D}$ with

$$\mu(\{x\in\mathcal{S}\ |\ g\circ f(x|_{0}^{n-1})\neq x|_{0}^{n-1}\})\leq\varepsilon.\qquad(1)$$

Define further $\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mu,\varepsilon)=\limsup\limits_{n\rightarrow\infty}\ \mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mu,\varepsilon,n).$

We define similarly the $\mathcal{C}-\mathcal{D}$ uniform almost lossless analog compression rate $\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mathcal{S},\varepsilon)\geq 0$ of $\mathcal{S}$ by requiring that (1) holds for all $\mu\in\mathcal{P}_{\sigma}(\mathcal{S})$ . In such a case, compression can be performed at asymptotic rate $\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mathcal{S},\varepsilon)$ without knowing the distribution from which data comes, as long as the process is supported in $\mathcal{S}$ .

For $p\geq 1$ we define also the $\mathcal{C}-\mathcal{D}$ probability analog compression rate $\mathrm{r}_{\mathcal{C}-\mathcal{D}}^{P,p}(\mu,\varepsilon,n,\delta)\geq 0$ of $\mu$ with $n$ -block error probability $\delta\geq 0$ at scale $\varepsilon$ by replacing condition (1) with

$$\mu(\{x\in\mathcal{S}:\|x|_{0}^{n-1}-g\circ f(x|_{0}^{n-1})\|_{p}\geq\varepsilon\})\leq\delta.$$

We define further $\mathrm{r}_{\mathcal{C}-\mathcal{D}}^{P,p}(\mu,\varepsilon,n)=\lim\limits_{\delta\to 0}\ \mathrm{r}_{\mathcal{C}-\mathcal{D}}^{P,p}(\mu,\varepsilon,n,\delta)$ and $\mathrm{r}_{\mathcal{C}-\mathcal{D}}^{P,p}(\mu,\varepsilon)=\limsup\limits_{n\rightarrow\infty}\ \mathrm{r}_{\mathcal{C}-\mathcal{D}}^{P,p}(\mu,\varepsilon,n)$ . We do not use $\mathrm{r}_{\mathcal{C}-\mathcal{D}}^{P,p}$ directly in this paper, but it allows us to state results of [3] in the language of compression rates.

IV-B Previous results

Let us begin by presenting some known results giving bounds on the compression rates introduced in the previous subsection. In their pioneering article [2] Wu and Verdú calculated and gave bounds on $\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mu,\varepsilon)$ for certain $\mathcal{C}$ and $\mathcal{D}$ and fixed $\mu\in\mathcal{P}_{\sigma}(\mathbb{R}^{\mathbb{N}})$. For example, [2, Thm. 9] implies that for a Bernoulli measure $\mu=\bigotimes\limits_{\mathbb{Z}}\nu\in\mathcal{P}_{\sigma}([0,1]^{\mathbb{Z}})$ we have $\mathrm{r}_{\mathcal{B}-\mathcal{H}_{1}}(\mu,\varepsilon)\geq\overline{\mathrm{ID}}(\nu)$ for $0<\varepsilon<1$, where $\overline{\mathrm{ID}}$ denotes the upper Rényi information dimension of a probability measure. Another of their results is the following:

Theorem IV.3.

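([2, Thm. 18]) For $\mu\in\mathcal{P}_{\sigma}([0,1]^{\mathbb{Z}})$ and $\alpha\in(0,1)$ the following holds:

$$\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{\alpha}}(\mu,\varepsilon)\leq\frac{1}{1-\alpha}R_{B}(\mu,\varepsilon)$$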

and consequently $\mathrm{r}_{\mathrm{LIN}-\mathcal{H}}(\mu,\varepsilon)\leq R_{B}(\mu,\varepsilon)$ .

Remark IV.4.

The above upper bound on $\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{\alpha}}(\mu,\varepsilon)$ comes from minimizing $R$ in [2, (172)] for fixed $\beta$. A stronger result than the mere existence of a linear compressor and a Hölder decompressor was proven in [4, Section VIII], where it is shown that almost every linear transformation of sufficiently large rank serves as a good compressor in this setting.

For the other direction, following closely the proof of the upper bound in [2, Equation (75)], we have the following proposition (see [1] for the proof).

Proposition IV.5.

Let $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ be a subshift and $\mu\in\mathcal{P}_{\sigma}(\mathcal{S})$. Then $\alpha R_{B}(\mu,\delta)\leq\mathrm{r}_{\mathcal{B}-\mathcal{H}_{\alpha}}(\mu,\delta)$ for $0<\delta<1$ and $\alpha\in(0,1]$.

In applications the measure governing the source is not always known. Some universality in the compression process was proposed in [3]. In terms of compression rates, the following bound was obtained (for the definition of $\overline{d}_{0}(\mu)$ see [3, Def. 2] and for $\psi^{*}$ -mixing see [3, Def. 3]):

Theorem IV.6.

([3, Thms 7,8]) Let $\mu\in\mathcal{P}_{\sigma}([0,1]^{\mathbb{Z}})$ be $\psi^{*}$ -mixing. Then

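$$\sup_{\varepsilon>0}\mathrm{r}_{\mathrm{LIN}-\mathcal{B}}^{P,2}(\mu,\varepsilon)\leq\overline{d}_{0}(\mu).$$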

Remark IV.7.

Jalali and Poor [3] proved more than the mere existence of suitable linear compressors. More precisely, they proved that for any $\eta>0$, if $(X_{n})_{n\in\mathbb{Z}}$ is a $\psi^{*}$-mixing stochastic process with distribution $\mu$ and $A_{n}\in\mathbb{R}^{n\times m_{n}}$ are independent random matrices with entries drawn i.i.d. according to $\mathcal{N}(0,1)$ and independently from $(X_{n})_{n\in\mathbb{Z}}$ with $\frac{m_{n}}{n}\geq(1+\eta)\overline{d}_{0}(\mu)$, then

$$\|X|_{0}^{n-1}-g_{n}\circ A_{n}(X|_{0}^{n-1})\|_{2}\xrightarrow{n\to\infty}0\quad\text{in probability }\mu\otimes\nu,$$

where $\nu$ is the distribution of $(A_{n})_{n=1}^{\infty}$ and $g_{n}:\mathbb{R}^{m_{n}}\to\mathbb{R}^{n}$ are some explicitly defined Borel functions (depending only on $A_{n}$ ). Hence, for such a random sequence of matrices, the expected value

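$$\mathbb{E}_{\nu}\,\mu(\{x\in[0,1]^{\mathbb{Z}}:\|x|_{0}^{n-1}-g_{n}\circ A_{n}(x|_{0}^{n-1})\|_{2}\geq\varepsilon\})$$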

tends to zero as $n\to\infty$ for any $\psi^{*}$ -mixing measure $\mu\in\mathcal{P}_{\sigma}([0,1]^{\mathbb{Z}})$ . Theorem IV.6 follows from this, since for any $\delta>0$ and $n$ large enough, there exists $A\in\mathbb{R}^{n\times m_{n}}$ satisfying

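$$\mu(\{x\in[0,1]^{\mathbb{Z}}:\|x|_{0}^{n-1}-g_{n}\circ A_{n}(x|_{0}^{n-1})\|_{2}\geq\varepsilon\})\leq\delta.\qquad(3)$$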

The decompressors $g_{n}$ take only finitely many values (hence are not continuous) and are defined via a certain minimization problem (which makes the decompression algorithm implementable, though not efficient (cf. [3, Remark 3])). The authors proved also that, in a certain setting, such a compression scheme is robust to noise (see [3, Thms 9 and 10]). The strength of the result is the universality of the compression scheme, which is designed without any prior knowledge of the distribution $\mu$ : a random Gaussian matrix will serve as a good compressor as long as the rate is at least $\overline{d}_{0}(\mu)$ . However, it does not follow that one can choose a sequence of matrices $A_{n}$ satisfying (3) for all $\psi^{*}$ -mixing measures $\mu$ with $\overline{d}_{0}(\mu)\leq d$ for some $d\in[0,1]$ . Also, $\psi^{*}$ -mixing is quite a restrictive assumption.

IV-C Main results

Instead of assuming specific properties of the measure governing the source, we consider the scenario in which the set of all possible trajectories is known. Therefore we are interested in the following question:

Main Question: Given a subshift $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ , calculate

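$$\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mu,\varepsilon)\quad\text{and}\quad\sup_{\varepsilon>0}\mathrm{r}_{\mathcal{C}-\mathcal{D}}(\mathcal{S},\varepsilon)$$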

for fixed regularity classes $\mathcal{C}$ and $\mathcal{D}$ .

We are interested in this question for $\mathcal{C}\in\{\mathcal{B},\mathrm{LIN}\}$ and $\mathcal{D}=\mathcal{H}_{L,\alpha}$. Such or similar regularity conditions have appeared previously in the literature (e.g. Theorems IV.3 and IV.6). As the above quantities are non-increasing in $\varepsilon$ (a larger admissible error can only lower the required rate), one can exchange $\sup\limits_{\varepsilon>0}$ for $\lim\limits_{\varepsilon\to 0}$. Taking the supremum over invariant measures in Theorem IV.3 and Proposition IV.5, we obtain:

Theorem IV.8.

Let $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ be a subshift. The following holds for every $0<\alpha<1$ :

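$$\alpha\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}R_{B}(\mu,\varepsilon)\leq\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{\alpha}}(\mu,\varepsilon)\leq\frac{1}{1-\alpha}\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}R_{B}(\mu,\varepsilon).$$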

Note that the above results do not give an explicit bound on the constant $L$; in fact, they do not guarantee a uniform bound for $L$ along the sequence of decoders. This is a drawback from the point of view of error control. Hence, it is reasonable to also consider the class $\mathcal{H}_{L,\alpha}$ for fixed $L,\alpha$. Note that $\mathrm{r}_{\mathcal{C}-\mathcal{H}_{\alpha}}(\mu,\varepsilon)\leq\mathrm{r}_{\mathcal{C}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)$ for any compression rate and class $\mathcal{C}$. In the sequel we give both lower and upper bounds for $\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)$ and $\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)$ in terms of $\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)$ and $\overline{\mathrm{mdim}}_{B}(\mathcal{S})$. Note that the quantities $R_{B}(\mu,\varepsilon)$, which depend on the measure and on the parameter $\varepsilon$, might be harder to calculate in specific examples than the various geometric mean dimensions. Our main results are the following:

Theorem IV.9.

Let $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ be a subshift. The following holds for every $0<\alpha\leq 1,L>0$ :

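$$\alpha\,\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)\leq\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon).$$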

For a sketch of the proof see Section VI. For details and extension to $L^{p}$ compression rates see [1]. In general, equality does not hold in Theorem IV.9. We also cannot change the class $\mathcal{H}_{L,\alpha}$ to $\mathcal{H}_{\alpha}$ , i.e. $\alpha\overline{\mathrm{mdim}}_{M}(\mathcal{S},\sigma,\tau)$ cannot serve as a lower bound in Theorem IV.8. See [1] for suitable examples.

Theorem IV.10.

Let $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ be a subshift. Then, for every $0<\alpha<1$

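$$\sup_{\varepsilon>0}\sup_{\mu\in\mathcal{P}_{\sigma}(\mathcal{S})}\inf_{L>0}\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)\leq\inf_{L>0}\mathrm{r}_{\mathrm{LIN}-\mathcal{H}_{L,\alpha}}(\mathcal{S},0)\leq\min\Big\{1,\frac{2}{1-\alpha}\overline{\mathrm{mdim}}_{B}(\mathcal{S})\Big\}.$$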

The proof is based on the embedding theorem for $\overline{\mathrm{dim}}_{B}$ with Hölder inverse [13, Thm. 4.3] (see [16] for an almost sure embedding theorem for Hausdorff dimension). See [1] for the proof and examples showing that one cannot change the constant $\frac{2}{1-\alpha}$ to $\frac{t}{1-\alpha}$ for $t<2$ and $\inf\limits_{L>0}$ cannot be omitted.
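To get a feeling for the upper bound $\min\{1,\frac{2}{1-\alpha}\overline{\mathrm{mdim}}_{B}(\mathcal{S})\}$ of Theorem IV.10, the following Python sketch simply evaluates it for a few hypothetical values of the mean box dimension and of the Hölder exponent. The function name and the sample values are assumptions chosen for illustration; no specific subshift from the paper is being modelled.

```python
def theorem_iv10_upper_bound(mdim_b, alpha):
    """Evaluate min{1, 2/(1-alpha) * mdim_B(S)} for 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    return min(1.0, 2.0 / (1.0 - alpha) * mdim_b)


# Hypothetical mean box dimensions and Hoelder exponents.
for mdim_b, alpha in [(0.1, 0.5), (0.1, 0.9), (0.4, 0.5)]:
    print(mdim_b, alpha, theorem_iv10_upper_bound(mdim_b, alpha))
```

The bound is informative only when $\overline{\mathrm{mdim}}_{B}(\mathcal{S})<\frac{1-\alpha}{2}$. Asking for a more regular decompressor (larger $\alpha$) pushes the bound towards the trivial rate 1, as the second and third sample pairs show.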

V Rate-distortion functions and variational principles for metric mean dimension

Our proof of the lower bound in Theorem IV.9 is based on a variational principle for metric mean dimension in terms of rate-distortion functions [10]. We work with a slight modification of the expression used in [10].

Definition V.1.

(compare with [10, p. 3-4]) Let $(A,d)$ be a compact metric space, let $\mathcal{S}\subset A^{\mathbb{Z}}$ be a subshift and $\mu\in\mathcal{P}_{\sigma}(\mathcal{S})$. For $\varepsilon>0$ and $n\in\mathbb{N}$ we define the rate-distortion function $\tilde{R}_{\mu}(n,\varepsilon)$ as the infimum of $\frac{I(X;Y)}{n}$, where $X=(X_{0},\dots,X_{n-1})$ and $Y=(Y_{0},\dots,Y_{n-1})$ are random variables defined on some probability space $(\Omega,\mathbb{P})$ such that

• $X=(X_{0},\dots,X_{n-1})$ takes values in $A^{n}$, and its law is given by $(\pi_{n})_{*}\mu$.

• $Y=(Y_{0},\dots,Y_{n-1})$ takes values in $A^{n}$ and $\mathbb{E}\left(\frac{1}{n}\sum_{k=0}^{n-1}d(X_{k},Y_{k})\right)\leq\varepsilon$.

Here $I(X;Y)$ is the mutual information of random vectors $X$ and $Y$ (see [17] and [10]). The function $n\mapsto n\tilde{R}_{\mu}(n,\varepsilon)$ is subadditive (see [18, Thm. 9.6.1] for a proof in the finite alphabet case). Hence, we may define

[TABLE]
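Any admissible pair $(X,Y)$ in Definition V.1 gives an upper bound $\tilde{R}_{\mu}(n,\varepsilon)\leq\frac{I(X;Y)}{n}$. As a minimal sketch for $n=1$ and $A=[0,1]$, the Python snippet below takes $Y$ to be a grid quantization of $X$ and computes the mutual information and the expected distortion of this coupling. The names `quantize` and `coupling_stats`, the finite sample standing in for the law $(\pi_{1})_{*}\mu$, and the use of base-2 logarithms are assumptions made for illustration only; they do not come from the paper.

```python
import math
from collections import Counter


def quantize(x: float, eps: float) -> float:
    """Send x in [0, 1] to the center of its cell in a regular
    partition of [0, 1] into ceil(1/eps) cells of equal length."""
    m = math.ceil(1 / eps)
    i = min(int(x * m), m - 1)  # cell index; x = 1.0 falls in the last cell
    return (i + 0.5) / m


def coupling_stats(points, eps):
    """Toy coupling: X is uniform on `points` (hypothetical stand-in for
    the law of X) and Y = quantize(X). Returns (I(X;Y) in bits, E|X - Y|)."""
    n = len(points)
    ys = [quantize(x, eps) for x in points]
    # Y is a deterministic function of X, hence I(X;Y) = H(Y).
    counts = Counter(ys)
    h_y = -sum((c / n) * math.log2(c / n) for c in counts.values())
    distortion = sum(abs(x - y) for x, y in zip(points, ys)) / n
    return h_y, distortion


if __name__ == "__main__":
    eps = 0.25
    sample = [j / 1000 for j in range(1000)]
    info, dist = coupling_stats(sample, eps)
    # info is at most log2(ceil(1/eps)); dist is at most eps / 2
    print(info, dist)
```

Since $Y$ takes at most $\lceil\frac{1}{\varepsilon}\rceil$ values and lies within $\frac{\varepsilon}{2}$ of $X$, this coupling is admissible at distortion level $\frac{\varepsilon}{2}$ and has mutual information at most $\log\lceil\frac{1}{\varepsilon}\rceil$. The same counting idea, applied to the quantized compressor output, appears in the proof of Theorem VI.1.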

The following theorem is a variant of the variational principle for metric mean dimension in the case of subshifts. It is deduced from the original theorem [10, Theorem III.1]. We also prove that one can take the supremum over ergodic measures (see [1] for details).

Theorem V.2.

Let $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ be a subshift. Then

[TABLE]

The above theorem remains true if we consider the $L^{p}$ distortion function instead of the $L^{1}$ variant (see [1]). As proved in [5, Thm. 1], for the $L^{2}$ rate-distortion function the above limit for fixed $\mu\in\mathcal{P}_{\sigma}(\mathcal{S})$ gives the upper information dimension of $\mu$ . For a variational principle for $\overline{\mathrm{mdim}}_{M}$ in terms of the mean Rényi information dimension see [1].

VI Lower bounds

The following inequality is the main ingredient of the proof of Theorem IV.9: combined with Theorem V.2, it yields the result. It is also of independent interest, since it gives a lower bound for $\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)$ for fixed $\mu$ and $\varepsilon$.

Theorem VI.1.

Let $\mathcal{S}\subset[0,1]^{\mathbb{Z}}$ be a subshift. The following holds for $\mu\in\mathcal{P}_{\sigma}(\mathcal{S}),\ 0<\alpha\leq 1,\ L>0$ :

[TABLE]

Proof.

Fix $\delta,\varepsilon>0$ and assume that $\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)<\infty$. Then one may find $k,n\in\mathbb{N}$ with $\frac{k}{n}\leq\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)+\delta$ and functions $f:[0,1]^{n}\rightarrow[0,1]^{k},\ f\in\mathcal{B}$, $g:[0,1]^{k}\rightarrow[0,1]^{n},\ g\in\mathcal{H}_{L,\alpha}$ such that $\mu(E)\leq\varepsilon$, where $E=\{x\in\mathcal{S}\mid g\circ f(x|_{0}^{n-1})\neq x|_{0}^{n-1}\}$. Partition $[0,1]^{k}$ into a regular grid of $\lceil\frac{1}{\varepsilon}\rceil^{k}$ Borel cubes of side $\lceil\frac{1}{\varepsilon}\rceil^{-1}$, let $F$ be the set of their centers, and let $c:[0,1]^{k}\rightarrow F$ associate to each point the center of its cube. Note that $|F|=\lceil\frac{1}{\varepsilon}\rceil^{k}$ and $\|x-c(x)\|_{\infty}\leq\frac{\varepsilon}{2}$ for all $x\in[0,1]^{k}$. Define $Y:[0,1]^{n}\rightarrow[0,1]^{n}$ by $Y(p)=g(c(f(p)))$ and $X:[0,1]^{n}\to[0,1]^{n}$ by $X=\mathrm{id}$. This gives a pair of random vectors on the probability space $([0,1]^{n},(\pi_{n})_{*}\mu)$. We now estimate (here $A=[0,1]$ and $d=\|\cdot\|_{\infty}$)

[TABLE]

This implies

[TABLE]

∎
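As a hedged sketch of the kind of estimates this proof relies on, the chain below is derived only from the objects $f$, $g$, $c$, $E$, $X$, $Y$ introduced in the proof. It assumes that $g\in\mathcal{H}_{L,\alpha}$ means $\|g(u)-g(v)\|_{\infty}\leq L\|u-v\|_{\infty}^{\alpha}$ (the choice of norm is an assumption), and the resulting constants need not coincide with those in the statement of Theorem VI.1.

```latex
\begin{align*}
&\text{For } p \text{ with } g(f(p)) = p: \quad
\|X(p)-Y(p)\|_{\infty}
  = \|g(f(p)) - g(c(f(p)))\|_{\infty}
  \leq L\,\|f(p)-c(f(p))\|_{\infty}^{\alpha}
  \leq L\left(\tfrac{\varepsilon}{2}\right)^{\alpha}, \\
&\text{and } \|X-Y\|_{\infty}\leq 1 \text{ everywhere, so} \\
&\mathbb{E}\left(\frac{1}{n}\sum_{k=0}^{n-1} d(X_{k},Y_{k})\right)
  \leq \mathbb{E}\,\|X-Y\|_{\infty}
  \leq L\left(\tfrac{\varepsilon}{2}\right)^{\alpha} + \mu(E)
  \leq L\left(\tfrac{\varepsilon}{2}\right)^{\alpha} + \varepsilon, \\
&I(X;Y) \leq H(Y) \leq \log |F| = k \log \left\lceil \tfrac{1}{\varepsilon} \right\rceil, \\
&\tilde{R}_{\mu}\!\left(n,\ L\left(\tfrac{\varepsilon}{2}\right)^{\alpha}+\varepsilon\right)
  \leq \frac{k}{n}\,\log \left\lceil \tfrac{1}{\varepsilon} \right\rceil
  \leq \left(\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)+\delta\right)
       \log \left\lceil \tfrac{1}{\varepsilon} \right\rceil .
\end{align*}
```

Since $\delta>0$ is arbitrary, taking the infimum over $n$ on the left-hand side gives $\inf_{m}\tilde{R}_{\mu}\big(m,L(\tfrac{\varepsilon}{2})^{\alpha}+\varepsilon\big)\leq\mathrm{r}_{\mathcal{B}-\mathcal{H}_{L,\alpha}}(\mu,\varepsilon)\log\lceil\tfrac{1}{\varepsilon}\rceil$ under the stated assumptions.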

Bibliography

[1] Y. Gutman and A. Śpiewak, “Metric mean dimension and analog compression,” Preprint, https://arxiv.org/abs/1812.00458, 2018.
[2] Y. Wu and S. Verdú, “Rényi information dimension: fundamental limits of almost lossless analog compression,” IEEE Trans. Inform. Theory, vol. 56, no. 8, pp. 3721–3748, 2010.
[3] S. Jalali and H. V. Poor, “Universal compressed sensing for almost lossless recovery,” IEEE Trans. Inform. Theory, vol. 63, no. 5, pp. 2933–2953, 2017.
[4] D. Stotz, E. Riegler, E. Agustsson, and H. Bölcskei, “Almost lossless analog signal separation and probabilistic uncertainty relations,” IEEE Trans. Inform. Theory, vol. 63, no. 9, pp. 5445–5460, 2017.
[5] B. C. Geiger and T. Koch, “On the information dimension rate of stochastic processes,” in 2017 IEEE International Symposium on Information Theory (ISIT), June 2017, pp. 888–892.
[6] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, 2006.
[7] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[8] E. Lindenstrauss and B. Weiss, “Mean topological dimension,” Israel J. Math., vol. 115, pp. 1–24, 2000.
