New Uniform Bounds for Almost Lossless Analog Compression
Yonatan Gutman, Adam Śpiewak

TL;DR
This paper establishes uniform bounds on almost lossless analog compression rates for stationary processes within a set, linking these bounds to metric mean dimension and mean box dimension, and utilizing a variational principle for rate-distortion functions.
Contribution
It introduces uniform bounds for compression rates of stationary processes based on metric mean dimension and mean box dimension, extending prior theories.
Findings
Derived lower and upper bounds for compression rates
Connected metric mean dimension with rate-distortion functions
Applied variational principle to analyze compression limits
Abstract
Wu and Verdú developed a theory of almost lossless analog compression, where one imposes various regularity conditions on the compressor and the decompressor with the input signal being modelled by a (typically infinite-entropy) stationary stochastic process. In this work we consider all stationary stochastic processes with trajectories in a prescribed set of (bi)infinite sequences and find uniform lower and upper bounds for certain compression rates in terms of metric mean dimension and mean box dimension. An essential tool is the recent Lindenstrauss-Tsukamoto variational principle expressing metric mean dimension in terms of rate-distortion functions.
Yonatan Gutman (1), Adam Śpiewak (2)
(1) Institute of Mathematics, Polish Academy of Sciences, ul. Śniadeckich 8, 00-656 Warszawa, Poland
(2) Institute of Mathematics, University of Warsaw, ul. Banacha 2, 02-097 Warszawa, Poland
Acknowledgments: We are grateful to Amos Lapidoth, Neri Merhav and Erwin Riegler for helpful discussions. Y.G. was partially supported by the National Science Center (Poland) grant 2013/08/A/ST1/00275. Y.G. and A.Ś. were partially supported by the National Science Center (Poland) grant 2016/22/E/ST1/00448.
A full version of this paper is accessible as [1] (preprint).
I Introduction
In recent years, the theory of compression for analog sources (i.e. stochastic processes with values in $\mathbb{R}$) has undergone major development (for a sample of such results see [2], [3], [4], [5]). There are two key differences from Shannon's classical model of compression for discrete sources. The first is the necessity of imposing regularity conditions on the compressor and/or decompressor functions (e.g. Lipschitz or Hölder continuity). This requirement makes the problem non-trivial and reasonable from the point of view of applications (as it induces robustness to noise). The second difference is that non-discrete sources in general have infinite Shannon entropy rate, hence a different measure of complexity for stochastic processes has to be considered. One of the most fruitful approaches taken in the literature is to assume a specific structure of the source signal, as in compressed sensing, where the input vectors are assumed to be sparse (e.g. [6], [7]). In this setting, a theory of linear compression with efficient and stable recovery algorithms has been developed. However, strong assumptions on the structure of the signal reduce the applicability of the technique. A different approach was developed in the pioneering work [2]. Instead of making assumptions on the structure of the signal, new measures of complexity related to the Minkowski (box-counting) dimension of the signal were introduced and proved to be bounds on compression rates for certain classes of compressors and decompressors. Similarly, Jalali and Poor [3] developed a theory of universal compressed sensing, where the linear compression rate is given in terms of a certain generalization of the Rényi information dimension for stochastic processes with the $\psi^*$-mixing property.
The goal of this paper is twofold. We adapt the setting from [2], but instead of a single process we consider all stationary stochastic processes with trajectories in a prescribed set $\mathcal{S} \subset [0,1]^{\mathbb{Z}}$. This corresponds to a priori knowledge of all the possible trajectories of the process rather than of its distribution. We deal with the question of calculating minimal compression rates in the sense of [2] sufficient for all such stochastic processes with Borel or linear compressors and Hölder or Lipschitz decompressors. We depart from the precise setting of [2] in several directions. We consider processes with trajectories in $[0,1]^{\mathbb{Z}}$ instead of $\mathbb{R}^{\mathbb{N}}$, and we treat both compression and decompression schemes that depend on the distribution of the process and schemes that are independent of it (but depend on $\mathcal{S}$). We also consider the case where the decompressor functions are $(L,\alpha)$-Hölder with fixed $L > 0$ and $\alpha \in (0,1]$ for all block lengths. Our main results are upper and lower bounds for such rates in terms of certain geometric and dynamical characteristics of the considered set $\mathcal{S}$. This constitutes the second goal of the paper: we introduce notions from the theory of dynamical systems to the study of analog compression rates. As we consider stationary processes, it is natural to assume the set $\mathcal{S}$ to be invariant under the shift transformation, and hence it can be considered as a topological dynamical system. The obtained lower bounds are given in terms of the metric mean dimension of the system, a geometrical invariant of dynamical systems introduced and studied by Lindenstrauss and Weiss in [8]. The existence of connections between signal processing and mean dimension theory was first observed in [9], where the use of the Whittaker-Nyquist-Kotelnikov-Shannon sampling theorem was essential for proving the embedding conjecture of Lindenstrauss. Another connection between these domains was established recently in [10], where a variational principle for metric mean dimension was given in terms of rate-distortion functions. It is our main tool in developing lower bounds on compression rates for all stationary processes supported in $\mathcal{S}$. In the scenario where the compressor and decompressor functions are required to be independent of the distribution of the input process (depending only on $\mathcal{S}$), we introduce the mean box dimension of $\mathcal{S}$ as an upper bound for the corresponding compression rates.
II Preliminaries
In this paper, we apply results from the theory of dynamical systems to the theory of signal processing. In line with the signal processing perspective, we consider a stationary stochastic process $\{X_n : n \in \mathbb{Z}\}$ defined on some probability space $(\Omega, \mathbb{P})$. Usually, instead of a single process, we are interested in considering all the stationary processes with trajectories in some prescribed set. A natural model for the set of possible trajectories is the notion of a subshift, a certain type of dynamical system. Introducing it allows us to consider stationary processes in terms of the theory of dynamical systems.
By a (topological) dynamical system we understand a triple $(X, T, \rho)$, where $(X, \rho)$ is a compact metric space and $T : X \to X$ is a homeomorphism. For a (countably-additive) Borel measure $\mu$ on $X$, by $T_*\mu$ we denote its transport by $T$, i.e. the Borel measure on $X$ given by $T_*\mu(A) = \mu(T^{-1}(A))$ for Borel $A \subset X$. We say that the measure $\mu$ is $T$-invariant if $T_*\mu = \mu$. By $\mathcal{P}_T(X)$ we denote the set of all $T$-invariant Borel probability measures on $X$. We call a measure $\mu \in \mathcal{P}_T(X)$ ergodic if every Borel set $A$ satisfying $T^{-1}(A) = A$ is of either full or zero measure $\mu$. The set of all ergodic measures for a transformation $T$ is denoted by $\mathcal{E}_T(X)$. For an introduction to topological dynamics and its connections with ergodic theory see [11, Chapters 5-8].
Consider the unit interval $[0,1]$ with the standard metric. By Tychonoff's theorem, $[0,1]^{\mathbb{Z}}$ is a compact metrizable space when endowed with the product topology. This topology is metrizable by the metric $\tau(x,y) = \sum_{i \in \mathbb{Z}} 2^{-|i|} |x_i - y_i|$, where $x = (x_i)_{i \in \mathbb{Z}}$ and $y = (y_i)_{i \in \mathbb{Z}}$. This choice of the metric may seem arbitrary, but it turns out that the metric mean dimension for subshifts takes a natural form when calculated with respect to $\tau$ (see Proposition III.6). Define the shift transformation $\sigma : [0,1]^{\mathbb{Z}} \to [0,1]^{\mathbb{Z}}$ as $\sigma((x_i)_{i \in \mathbb{Z}}) = (x_{i+1})_{i \in \mathbb{Z}}$. We are interested in properties of a given subshift, i.e. a closed (in the product topology) and shift-invariant subset $\mathcal{S} \subset [0,1]^{\mathbb{Z}}$, which we interpret as the set of all admissible trajectories that can occur as input. Note that there is a one-to-one correspondence between measures $\mu \in \mathcal{P}_\sigma(\mathcal{S})$ and distributions of stationary processes $\{X_n : n \in \mathbb{Z}\}$ such that $(X_n)_{n \in \mathbb{Z}}$ belongs to $\mathcal{S}$ with $\mathbb{P}$-probability one. Our goal is to relate compression properties of measures (stationary processes) from $\mathcal{P}_\sigma(\mathcal{S})$ to the geometrical properties of the set $\mathcal{S}$. For $n \in \mathbb{N}$ define the projection $\pi_n : [0,1]^{\mathbb{Z}} \to [0,1]^n$ as $\pi_n(x) = (x_0, \ldots, x_{n-1})$. For vectors $x, y \in [0,1]^n$ and $p \in [1, \infty)$, define the (normalized) $\ell^p$ distance as $\|x-y\|_{p} = \Big(\frac{1}{n}\sum_{k=0}^{n-1}|x_{k}-y_{k}|^{p}\Big)^{\frac{1}{p}}$ and $\|x-y\|_{\infty} = \max_{0 \le k < n} |x_k - y_k|$.
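The normalized distance just defined differs from the usual $\ell^p$ norm by the factor $1/n$ inside the root. A minimal Python sketch of it follows; the function name is hypothetical, and the $p = \infty$ case is assumed to be the ordinary maximum norm.

```python
def normalized_lp_distance(x, y, p):
    """Normalized l^p distance between two equal-length vectors.

    For finite p >= 1 this is ((1/n) * sum |x_k - y_k|^p)^(1/p).
    For p = float("inf") it is assumed to be max |x_k - y_k|.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    if p < 1:
        raise ValueError("p must be at least 1")
    return (sum(d ** p for d in diffs) / len(diffs)) ** (1.0 / p)
```

Because of the normalization, the distance is non-decreasing in $p$ for fixed vectors (a power-mean inequality), so a bound in the $\|\cdot\|_\infty$ distance implies the same bound in every normalized $\ell^p$ distance.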
III Mean dimensions
In this section we define metric mean dimension (for general dynamical systems) and (measurable) mean box dimension (for subshifts of $[0,1]^{\mathbb{Z}}$). These notions attempt to capture the average number of dimensions per iterate required to code orbits of the system. They serve as complexity measures employed to bound certain compression rates of subshifts in $[0,1]^{\mathbb{Z}}$. Let us begin with the non-dynamical notion of box dimension.
Definition III.1.
Let $(X, \rho)$ be a compact metric space. For $\varepsilon > 0$, the $\varepsilon$-covering number of a subset $A \subset X$, denoted by $\#(A, \rho, \varepsilon)$, is the minimal cardinality of an open cover of $A$ by sets with diameter smaller than $\varepsilon$.
Definition III.2.
Let $(X, \rho)$ be a compact metric space. The upper box (Minkowski) dimension of $A \subset X$ is defined as
$$\overline{\dim}_B(A) = \limsup_{\varepsilon \to 0} \frac{\log \#(A, \rho, \varepsilon)}{\log \frac{1}{\varepsilon}}.$$
In the sequel we consider only sets $A \subset [0,1]^n$ with distance induced by the norm $\|\cdot\|_\infty$. For more on box dimension see [12] and [13].
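As an illustration of box dimension, the sketch below estimates it for the middle-third Cantor set, an example not taken from the paper. It counts grid cells of side $3^{-j}$ instead of open covers by sets of diameter smaller than $\varepsilon$; the two counts differ by bounded factors, so they give the same dimension. All function names are hypothetical, and points are stored as integers $m$ representing $m / 3^{\text{level}}$ to keep the arithmetic exact.

```python
import math
from itertools import product


def cantor_left_endpoints(level):
    """Left endpoints of the level-`level` Cantor intervals.

    Each endpoint is returned as an integer m standing for m / 3**level;
    its ternary digits are all 0 or 2.
    """
    points = []
    for digits in product((0, 2), repeat=level):
        m = 0
        for d in digits:
            m = 3 * m + d
        points.append(m)
    return points


def grid_box_count(points, level, j):
    """Number of grid cells of side 3**-j containing at least one point."""
    cell = 3 ** (level - j)  # cell width in units of 3**-level
    return len({m // cell for m in points})


def box_dimension_estimate(level, j):
    """log(box count) / log(1 / side) at scale side = 3**-j, with j <= level."""
    points = cantor_left_endpoints(level)
    return math.log(grid_box_count(points, level, j)) / math.log(3 ** j)
```

At scale $3^{-j}$ exactly $2^j$ cells are occupied, so the estimate equals $\log 2 / \log 3 \approx 0.63$ at every scale $j \le \text{level}$, which is the known box dimension of the Cantor set.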
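Definition III.3.
Let $(X, \rho)$ be a compact metric space and let $T : X \to X$ be a homeomorphism. For $n \in \mathbb{N}$ define a metric $\rho_n$ on $X$ by $\rho_n(x,y) = \max_{0 \le k < n} \rho(T^k x, T^k y)$. Set:
$$S(X, T, \rho, \varepsilon) = \lim_{n \to \infty} \frac{1}{n} \log \#(X, \rho_n, \varepsilon)$$
(the limit exists due to the subadditivity of the function $n \mapsto \log \#(X, \rho_n, \varepsilon)$).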
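The subadditivity argument can be spelled out as follows. The notation is an assumption: $\rho_n(x,y) = \max_{0 \le k < n} \rho(T^k x, T^k y)$ is the metric of the definition above and $\#(X, \rho_n, \varepsilon)$ the corresponding covering number. The first display is Fekete's lemma, a standard fact. The rest is a sketch of why it applies here.

```latex
% Fekete's lemma for a sequence (a_n) of real numbers:
\[
  a_{n+m} \le a_n + a_m \ \text{ for all } n, m \ge 1
  \quad \Longrightarrow \quad
  \lim_{n \to \infty} \frac{a_n}{n} = \inf_{n \ge 1} \frac{a_n}{n}.
\]
% Application with a_n = \log \#(X, \rho_n, \varepsilon). First,
\[
  \rho_{n+m}(x, y) = \max\bigl\{ \rho_n(x, y),\ \rho_m(T^n x, T^n y) \bigr\}.
\]
% If \{U_i\} is an open cover of X by sets of \rho_n-diameter < \varepsilon
% and \{V_j\} an open cover by sets of \rho_m-diameter < \varepsilon, then
% \{U_i \cap T^{-n} V_j\} is an open cover by sets of
% \rho_{n+m}-diameter < \varepsilon, and therefore
\[
  \#(X, \rho_{n+m}, \varepsilon)
  \le \#(X, \rho_n, \varepsilon) \cdot \#(X, \rho_m, \varepsilon).
\]
```

Taking logarithms gives the subadditivity of $n \mapsto \log \#(X, \rho_n, \varepsilon)$, so the limit exists and equals the infimum over $n$.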
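Definition III.4.
The upper metric mean dimension of the system $(X, T, \rho)$ is defined as
$$\overline{\mathrm{mdim}}_M(X, T, \rho) = \limsup_{\varepsilon \to 0} \frac{S(X, T, \rho, \varepsilon)}{\log \frac{1}{\varepsilon}}.$$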
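Remark III.5.
It is easy to see that any system $(X, T, \rho)$ of finite topological entropy (see [11, Chapter 7]) satisfies $\overline{\mathrm{mdim}}_M(X, T, \rho) = 0$. Metric mean dimension can be easily computed for full shifts: if $(A, d)$ is a compact metric space, then $\overline{\mathrm{mdim}}_M(A^{\mathbb{Z}}, \sigma, \tau) = \overline{\dim}_B(A)$, where $\tau$ is the product metric (see [1]). Also, $\overline{\mathrm{mdim}}_M$ is an invariant for bi-Lipschitz isomorphisms: if $(X, T, \rho_X)$ and $(Y, R, \rho_Y)$ are dynamical systems and $\Phi : X \to Y$ is bi-Lipschitz and equivariant (i.e. $\Phi \circ T = R \circ \Phi$), then $\overline{\mathrm{mdim}}_M(X, T, \rho_X) = \overline{\mathrm{mdim}}_M(Y, R, \rho_Y)$.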
A topological version of mean dimension for actions of amenable groups was introduced by Gromov in [14] and studied by Lindenstrauss and Weiss in their seminal work [8]. It turns out that the topological mean dimension is the right invariant to study for the problem of existence of an embedding into the shift on $([0,1]^D)^{\mathbb{Z}}$ (see [9]). For more on mean topological dimension see [15]. The metric mean dimension was introduced in [8] and proved to be, when calculated with respect to any compatible metric, an upper bound for the topological mean dimension.
When $\mathcal{S} \subset [0,1]^{\mathbb{Z}}$ is a subshift and $\rho = \tau$ (see Section II), metric mean dimension can be expressed in a more canonical form:
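Proposition III.6.
For a subshift $\mathcal{S} \subset [0,1]^{\mathbb{Z}}$ it holds that
$$\overline{\mathrm{mdim}}_M(\mathcal{S}, \sigma, \tau) = \limsup_{\varepsilon \to 0} \lim_{n \to \infty} \frac{\log \#(\pi_n(\mathcal{S}), \|\cdot\|_\infty, \varepsilon)}{n \log \frac{1}{\varepsilon}}.$$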
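A small numerical sketch of the quantity behind this proposition, under the assumption that the canonical form is $\limsup_{\varepsilon \to 0} \lim_{n \to \infty} \log \#(\pi_n(\mathcal{S}), \|\cdot\|_\infty, \varepsilon) / (n \log(1/\varepsilon))$, where $\pi_n$ projects onto coordinates $0, \ldots, n-1$. Two examples not taken from the paper are compared: the binary full shift $\{0,1\}^{\mathbb{Z}}$, whose covering numbers are exact, and the full shift $[0,1]^{\mathbb{Z}}$, for which only a simple upper bound is computed. The function names are hypothetical.

```python
import math


def binary_shift_ratio(n, eps):
    """Exact value of log #(pi_n(S), sup-norm, eps) / (n log(1/eps)) for
    S = {0,1}^Z and 0 < eps < 1.

    Distinct points of {0,1}^n are at sup-distance 1, so a set of diameter
    smaller than eps contains at most one of them and 2**n sets are needed.
    """
    covering_number = 2 ** n
    return math.log(covering_number) / (n * math.log(1 / eps))


def full_cube_ratio_upper_bound(n, eps):
    """Upper bound on the same ratio for S = [0,1]^Z.

    [0,1]^n is covered by ceil(2/eps)**n cubes of side at most eps/2, each
    of sup-norm diameter smaller than eps.
    """
    cells_per_axis = math.ceil(2 / eps)
    return (n * math.log(cells_per_axis)) / (n * math.log(1 / eps))
```

For the binary shift the ratio equals $\log 2 / \log(1/\varepsilon)$ for every $n$ and tends to $0$ as $\varepsilon \to 0$, in line with the remark that finite-entropy systems have zero metric mean dimension. For the full cube the upper bound tends to $1$; the matching lower bound is standard and not computed here.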
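Definition III.7.
For a subshift $\mathcal{S} \subset [0,1]^{\mathbb{Z}}$ we define its upper mean box dimension as
$$\overline{\mathrm{mbdim}}(\mathcal{S}) = \lim_{n \to \infty} \frac{\overline{\dim}_B(\pi_n(\mathcal{S}))}{n},$$
where $\overline{\dim}_B$ is calculated with respect to the norm $\|\cdot\|_\infty$ on $[0,1]^n$. The limit exists due to the subadditivity of the function $n \mapsto \overline{\dim}_B(\pi_n(\mathcal{S}))$.
Proposition III.8.
Let $\mathcal{S} \subset [0,1]^{\mathbb{Z}}$ be a subshift. Then
$$\overline{\mathrm{mdim}}_M(\mathcal{S}, \sigma, \tau) \le \overline{\mathrm{mbdim}}(\mathcal{S}).$$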
In [2], Wu and Verdú gave bounds on certain compression rates in terms of the following notion.
Definition III.9.
([2, Def. 10]) For a subshift $\mathcal{S} \subset [0,1]^{\mathbb{Z}}$, an invariant measure $\mu \in \mathcal{P}_\sigma(\mathcal{S})$, and $\varepsilon > 0$ define the measurable mean box dimension as
[TABLE]
Remark III.10.
Wu and Verdú use the name Minkowski-dimension compression rate for . As we reserve the term compression rate for a different concept (of an operational meaning, see Section IV-A), we decided to introduce a different name.
IV Analog compression
In this section we introduce analog compression rates for sources with alphabet $[0,1]$ and state our main results. In this setting it is natural to assume regularity constraints on the compressor and decompressor functions. This follows from the fact that the alphabet under consideration is infinite: for every $n \in \mathbb{N}$ there exists a (Borel) bijection between $[0,1]^n$ and $[0,1]$, hence the corresponding compression rates tend to zero if we do not assume any further regularity of the compressor and decompressor functions (cf. [2, Section IV.B]). On the other hand, from the point of view of applications it is desirable to impose some regularity conditions, as they induce robustness to noise and enable numerical control of the errors occurring in the compression and decompression processes.
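To see why an unrestricted bijection is useless in practice, consider digit interleaving, one classical way to pack two numbers into one. It is an illustration only, not the construction used in [2], and a true bijection needs extra care with non-unique expansions. The finite-precision Python sketch below works on decimal digit strings; all names are hypothetical.

```python
def interleave(x_digits, y_digits):
    """Pack two equal-length decimal digit strings into one by alternating digits."""
    if len(x_digits) != len(y_digits):
        raise ValueError("digit strings must have equal length")
    return "".join(a + b for a, b in zip(x_digits, y_digits))


def deinterleave(z_digits):
    """Inverse of interleave: even positions give x, odd positions give y."""
    return z_digits[0::2], z_digits[1::2]


def as_number(digits):
    """Interpret a digit string d1 d2 ... dk as the real number 0.d1 d2 ... dk."""
    return int(digits) / 10 ** len(digits)


# Two codes that are extremely close as real numbers ...
z1, z2 = "09999999", "10000000"
# ... decode to pairs whose second coordinates are almost 1 apart.
(x1, y1), (x2, y2) = deinterleave(z1), deinterleave(z2)
```

Here `z1` and `z2` differ by about $10^{-8}$, while the decoded second coordinates are `0.9999` and `0.0`. Lossless recovery is possible, but an arbitrarily small perturbation of the code can change the decoded signal by almost the full range, which is what the Lipschitz and Hölder constraints on the decompressor rule out.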
IV-A Compression rates
Definition IV.1.
A regularity class is a set of functions between finite dimensional unit cubes, i.e. .
We will consider the following regularity classes: , , , where the Hölder condition is considered with respect to on and . Below we define several compression rates for various requirements on the performance of the compression and decompression process (see also [2, Def. 3]).
Definition IV.2.
Let be a subshift and . Let be regularity classes. For and , the almost lossless analog compression rate of with -block error probability is the infimum of , where runs over all natural numbers such that there exist maps and with
[TABLE]
Define further
We define similarly the uniform almost lossless analog compression rate of by requiring that (1) holds for all . In such a case, compression can be performed at asymptotic rate without knowing the distribution from which data comes, as long as the process is supported in .
For we define also the probability analog compression rate of with -block error probability at scale by replacing condition (1) with
[TABLE]
We define further and . We do not use directly in this paper, but it allows us to state results of [3] in the language of compression rates.
IV-B Previous results
Let us begin by presenting some known results giving bounds on compression rates introduced in the previous subsection. In their pioneering article [2] Wu and Verdú calculated and gave bounds on for certain and and fixed . For example by [2, Thm. 9] it follows for Bernoulli measure that for , where denotes the upper Rényi information dimension of a probability measure. Another of their results is the following:
Theorem IV.3.
([2, Thm. 18]) For and the following holds:
[TABLE]
and consequently .
Remark IV.4.
The above upper bound on comes from minimizing in [2, (172)] for fixed . A result stronger than the existence of a linear compressor and a Hölder decompressor was proven in [4, Section VIII], where it is shown that almost every linear transformation of sufficiently large rank serves as a good compressor in this setting.
For the other direction, following closely the proof of the upper bound in [2, Equation (75)], we have the following proposition (see [1] for the proof).
Proposition IV.5.
Let be a subshift and . Then for and .
In applications the measure governing the source is not always known. Some universality in the compression process was proposed in [3]. In terms of compression rates, the following bound was obtained (for the definition of see [3, Def. 2] and for -mixing see [3, Def. 3]):
Theorem IV.6.
([3, Thms 7,8]) Let be -mixing. Then
[TABLE]
Remark IV.7.
[3] proved more than merely existence of suitable linear compressors. More precisely, they proved that for any , if is a -mixing stochastic process with distribution and are independent random matrices with entries drawn i.i.d according to and independently from with , then
[TABLE]
where is the distribution of and are some explicitly defined Borel functions (depending only on ). Hence, for such a random sequence of matrices, the expected value
[TABLE]
tends to zero as for any -mixing measure . Theorem IV.6 follows from this, since for any and large enough, there exists satisfying
[TABLE]
The decompressors take only finitely many values (hence are not continuous) and are defined via a certain minimization problem (which makes the decompression algorithm implementable, though not efficient (cf. [3, Remark 3])). The authors proved also that, in a certain setting, such a compression scheme is robust to noise (see [3, Thms 9 and 10]). The strength of the result is the universality of the compression scheme, which is designed without any prior knowledge of the distribution : a random Gaussian matrix will serve as a good compressor as long as the rate is at least . However, it does not follow that one can choose a sequence of matrices satisfying (3) for all -mixing measures with for some . Also, -mixing is quite a restrictive assumption.
IV-C Main results
Instead of assuming specific properties of the measure governing the source, we consider the scenario in which the set $\mathcal{S}$ of all possible trajectories is known. Therefore we are interested in the following question:
Main Question: Given a subshift , calculate
[TABLE]
for fixed regularity classes and .
We are interested in this question for and . Such or similar regularity conditions have appeared previously in the literature (e.g. Theorems IV.3 and IV.6). As above quantities are decreasing with , one can exchange for . Taking supremum over invariant measures in Theorem IV.3 and Proposition IV.5, we obtain:
Theorem IV.8.
Let be a subshift. The following holds for every :
[TABLE]
[TABLE]
Note that the above results do not give an explicit bound on the constant ; in fact, they do not guarantee a uniform bound for among the sequence of decoders. This is a drawback from the point of view of error control. Hence, it is reasonable to consider also class for fixed . Note that for any compression rate and class . In the sequel we give both lower and upper bounds for and in terms of and . Note that the quantities depending on the measure and parameter might be harder to calculate in specific examples than various geometric mean dimensions. Our main results are the following:
Theorem IV.9.
Let be a subshift. The following holds for every :
[TABLE]
For a sketch of the proof see Section VI. For details and extension to compression rates see [1]. In general, equality does not hold in Theorem IV.9. We also cannot change the class to , i.e. cannot serve as a lower bound in Theorem IV.8. See [1] for suitable examples.
Theorem IV.10.
Let be a subshift. Then, for every
[TABLE]
[TABLE]
The proof is based on the embedding theorem for with Hölder inverse [13, Thm. 4.3] (see [16] for an almost sure embedding theorem for Hausdorff dimension). See [1] for the proof and examples showing that one cannot change the constant to for and cannot be omitted.
V Rate-distortion functions and variational principles for metric mean dimension
Our proof of the lower bound in Theorem IV.9 is based on a variational principle for metric mean dimension in terms of rate-distortion function [10]. We work with a slight modification of the expression used in [10].
Definition V.1.
(compare with [10, p. 3-4]) Let be a compact metric space, let be a subshift and . For and we define the rate-distortion function as the infimum of , where and are random variables defined on some probability space such that
- takes values in , and its law is given by .
- takes values in and .
Here is the mutual information of random vectors and (see [17] and [10]). The function is subadditive (see [18, Thm. 9.6.1] for a proof in the finite alphabet case). Hence, we may define
[TABLE]
The following theorem is a variant of the variational principle for metric mean dimension in the case of subshifts. It is deduced from the original theorem [10, Theorem III.1]. We also prove that one can take the supremum over ergodic measures (see [1] for details).
Theorem V.2.
Let be a subshift. Then
[TABLE]
The above theorem remains true if we consider the distortion function instead of the variant (see [1]). As proved in [5, Thm. 1], for the rate-distortion function the above limit for fixed gives the upper information dimension of . For a variational principle for in terms of the mean Rényi information dimension see [1].
VI Lower bounds
The following inequality is the main ingredient of the proof of Theorem IV.9, as together with Theorem V.2 it yields the result. However, it is of independent interest, since it gives a lower bound for for fixed and .
Theorem VI.1.
Let be a subshift. The following holds for :
[TABLE]
Proof.
Fix . Assume that achieves almost lossless analog compression rate with error probability . One may find with and functions , such that , where . Regularly partition into cubes of side Borel-wise and let associate to each point the center of its cube. Note that and for all . Define by and by . This gives a pair of random vectors on the probability space . We now estimate (here and )
[TABLE]
[TABLE]
[TABLE]
This implies
[TABLE]
[TABLE]
∎
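The quantization step of the proof, which sends each point of the cube to the center of its grid cell, can be sketched in Python as follows. The sketch assumes cells of side $1/m$ for an integer $m$; the function name and the parameter `cells_per_axis` are illustrative and do not appear in the paper.

```python
def quantize_to_cell_center(point, cells_per_axis):
    """Map a point of [0,1]^k to the center of its cell in the regular grid
    with `cells_per_axis` cells of side 1 / cells_per_axis along each axis.

    The coordinate 1.0 is assigned to the last cell, so that every point of
    the closed cube lies in exactly one cell.
    """
    m = cells_per_axis
    centers = []
    for coordinate in point:
        if not 0.0 <= coordinate <= 1.0:
            raise ValueError("coordinates must lie in [0, 1]")
        index = min(int(coordinate * m), m - 1)
        centers.append((index + 0.5) / m)
    return centers
```

Two properties matter for an argument of this kind. Each coordinate moves by at most half the cell side, so the quantization error in $\|\cdot\|_\infty$ is at most $1/(2m)$. The quantized vector takes at most $m^k$ distinct values, so its entropy is at most $k \log m$, and this bounds any mutual information involving it.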
References
[1] Y. Gutman and A. Śpiewak, “Metric mean dimension and analog compression,” Preprint, https://arxiv.org/abs/1812.00458, 2018.
[2] Y. Wu and S. Verdú, “Rényi information dimension: fundamental limits of almost lossless analog compression,” IEEE Trans. Inform. Theory, vol. 56, no. 8, pp. 3721–3748, 2010.
[3] S. Jalali and H. V. Poor, “Universal compressed sensing for almost lossless recovery,” IEEE Trans. Inform. Theory, vol. 63, no. 5, pp. 2933–2953, 2017.
[4] D. Stotz, E. Riegler, E. Agustsson, and H. Bölcskei, “Almost lossless analog signal separation and probabilistic uncertainty relations,” IEEE Trans. Inform. Theory, vol. 63, no. 9, pp. 5445–5460, 2017.
[5] B. C. Geiger and T. Koch, “On the information dimension rate of stochastic processes,” in 2017 IEEE International Symposium on Information Theory (ISIT), June 2017, pp. 888–892.
[6] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, 2006.
[7] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[8] E. Lindenstrauss and B. Weiss, “Mean topological dimension,” Israel J. Math., vol. 115, pp. 1–24, 2000.
