The Rare Eclipse Problem on Tiles: Quantised Embeddings of Disjoint Convex Sets

Valerio Cambareri; Chunlei Xu; Laurent Jacques

arXiv:1702.04664 · cs.IT · February 16, 2017


TL;DR

This paper investigates conditions under which quantised random embeddings preserve the separability of disjoint convex sets, enabling exact classification after dimensionality reduction, with theoretical results and numerical validation.

Contribution

It provides a new theoretical framework relating embedding dimension, quantiser resolution, and set separation for preserving separability in quantised embeddings.

Findings

01

Derived conditions linking embedding parameters and set separation.

02

Numerical phase transition curves for two $\ell_{2}$-balls.

03

Experimental validation of theoretical results.



Equations

$$\boldsymbol{y}=\mathsf{A}(\boldsymbol{x})\coloneqq\mathcal{Q}_{\delta}(\boldsymbol{\Phi}\boldsymbol{x}+\boldsymbol{\xi}),$$

$$p_{0}\coloneqq\mathbb{P}[\boldsymbol{\Phi}\mathcal{C}_{1}\cap\boldsymbol{\Phi}\mathcal{C}_{2}=\emptyset]\geqslant 1-\eta.$$

$$p_{0}$$

$$w(\mathcal{C})\coloneqq\mathbb{E}_{\boldsymbol{g}}\sup_{\boldsymbol{x}\in\mathcal{C}}|\boldsymbol{g}^{\top}\boldsymbol{x}|,\quad\boldsymbol{g}\sim\mathcal{N}^{n}(0,1).$$

$$p_{\delta}\coloneqq\mathbb{P}[\mathsf{A}(\mathcal{C}_{1})\cap\mathsf{A}(\mathcal{C}_{2})=\emptyset]\geqslant 1-\eta.$$

$$p_{\delta}=\mathbb{P}[\mathsf{E}]\geqslant 1-\eta,\ \text{i.e.,}\ \mathbb{P}[\mathsf{E}^{\rm c}]\leqslant\eta,\quad\mathsf{E}^{\rm c}\coloneqq\{\exists\,\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2}:\mathsf{A}(\boldsymbol{x}_{1})=\mathsf{A}(\boldsymbol{x}_{2})\}$$

$$\big|\tfrac{1}{m}\|\mathsf{A}(\boldsymbol{u}_{1})-\mathsf{A}(\boldsymbol{u}_{2})\|_{1}-c^{\prime}\|\boldsymbol{u}_{1}-\boldsymbol{u}_{2}\|_{2}\big|\leqslant\varepsilon\|\boldsymbol{u}_{1}-\boldsymbol{u}_{2}\|_{2}+c\delta\varepsilon^{\prime},$$

$$(1-\epsilon_{0})\leqslant\tfrac{\kappa_{0}}{m}\|\boldsymbol{\Phi}\boldsymbol{u}\|_{1}\leqslant(1+\epsilon_{0}),\quad\forall\boldsymbol{u}\in\mathcal{S}.$$

$$\big|\kappa_{0}\mathcal{D}_{\ell_{1}}(\boldsymbol{\Phi}\boldsymbol{x}_{1},\boldsymbol{\Phi}\boldsymbol{x}_{2})-\|\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\|_{2}\big|\leqslant\epsilon_{0}\|\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\|_{2},$$

$$(1-\epsilon)\|\boldsymbol{u}\|_{2}\leqslant\tfrac{\kappa_{0}}{m}\|\boldsymbol{\Phi}\boldsymbol{u}\|_{1}\leqslant(1+\epsilon)\|\boldsymbol{u}\|_{2},\quad\forall\boldsymbol{u}\in\mathbb{R}^{n}.$$

$$m\gtrsim\epsilon^{-2}\mathcal{H}_{1}\big(\mathcal{E},\tfrac{m\delta\epsilon^{2}}{1+\epsilon}\big),$$

$$\big|\mathcal{D}_{\ell_{1}}(\mathsf{A}^{\prime}(\boldsymbol{a}),\mathsf{A}^{\prime}(\boldsymbol{b}))-\mathcal{D}_{\ell_{1}}(\boldsymbol{a},\boldsymbol{b})\big|\lesssim\delta\epsilon,\quad\forall\boldsymbol{a},\boldsymbol{b}\in\mathcal{E}.$$

$$m\gtrsim\big(w_{\cap}^{2}+n\tfrac{\delta^{2}}{\sigma^{2}}\big)\big(1+\log(1+\tfrac{rm}{\delta n})+w_{\cap}^{-2}\log\tfrac{1}{\eta}\big),$$

$$\mathcal{H}_{1}(\boldsymbol{\Phi}\mathcal{C}_{\cup},\rho^{\prime})\lesssim n\log\big(1+\tfrac{2r(1+\epsilon)^{2}}{\delta\epsilon^{2}}\big).$$

$$m\gtrsim\epsilon_{0}^{-2}\big(w_{\cap}^{2}\,(1+\log(1+\tfrac{2r(1+\epsilon(\epsilon_{0}))^{2}}{\delta\,\epsilon(\epsilon_{0})^{2}}))+\log\tfrac{1}{\eta}\big).$$

$$\begin{aligned}
\mathcal{D}_{\ell_{1}}(\mathsf{A}(\boldsymbol{u}),\mathsf{A}(\boldsymbol{v}))&=\mathcal{D}_{\ell_{1}}(\mathsf{A}^{\prime}(\boldsymbol{\Phi}\boldsymbol{u}),\mathsf{A}^{\prime}(\boldsymbol{\Phi}\boldsymbol{v}))\\
&\geqslant\mathcal{D}_{\ell_{1}}(\boldsymbol{\Phi}\boldsymbol{u},\boldsymbol{\Phi}\boldsymbol{v})-c\delta\epsilon\geqslant\kappa_{0}^{-1}(1-\epsilon_{0})\|\boldsymbol{u}-\boldsymbol{v}\|_{2}-c\delta\epsilon\\
&\geqslant\kappa_{0}^{-1}(1-\epsilon_{0})\sigma-c\delta\epsilon=\kappa_{0}^{-1}(1-\epsilon_{0})\sigma-\tfrac{c\delta\sqrt{n}}{w_{\cap}}\epsilon_{0}.
\end{aligned}$$

$$\epsilon_{0}^{-2}w_{\cap}^{2}=\big(w_{\cap}+2c\kappa_{0}\sqrt{n}\tfrac{\delta}{\sigma}\big)^{2}\lesssim w_{\cap}^{2}+n\tfrac{\delta^{2}}{\sigma^{2}}.$$

$$\tau\coloneqq\min_{\boldsymbol{z}\in\mathcal{C}^{-}}\|\boldsymbol{\Phi}\boldsymbol{z}\|_{\infty},$$

$$p_{\delta}=\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\mathsf{E}\,|\,\tau\leqslant\delta]\,\mathbb{P}_{\boldsymbol{\Phi}}[\tau\leqslant\delta]+\bar{p}_{\delta}\geqslant\bar{p}_{\delta},$$

$$\mathcal{C}^{(j)}\coloneqq\{\boldsymbol{z}\in\mathcal{C}^{-}:|\boldsymbol{\varphi}_{j}^{\top}\boldsymbol{z}|\geqslant|\boldsymbol{\varphi}_{i}^{\top}\boldsymbol{z}|,\ \forall i\neq j\in[m]\}\subset\mathcal{C}^{-}.$$

$$\begin{aligned}
p_{\delta}&=\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\forall\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\exists i\in[m]:\mathsf{E}_{i}]\\
&=\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\forall j\in[m],\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{(j)},\exists i\in[m]:\mathsf{E}_{i}]\\
&\geqslant\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\forall j\in[m],\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{(j)},\mathsf{E}_{j}]\\
&=\mathbb{E}_{\boldsymbol{\Phi}}\mathbb{P}_{\boldsymbol{\xi}}[\forall j\in[m],\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{(j)},\mathsf{E}_{j}\,|\,\boldsymbol{\Phi}]
\end{aligned}$$

$$\begin{aligned}
&\mathbb{P}_{\boldsymbol{\xi}}[\forall j\in[m],\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{(j)},\mathsf{E}_{j}\,|\,\boldsymbol{\Phi}]\\
&\quad=\prod_{j\in[m]}\mathbb{P}_{\xi_{j}}[\forall\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2}:\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{(j)},\mathsf{E}_{j}\,|\,\boldsymbol{\Phi}]\\
&\quad=\prod_{j\in[m]}\min\{1,\tfrac{\tau_{j}}{\delta}\}\\
&\implies p_{\delta}\geqslant\mathbb{E}_{\boldsymbol{\Phi}}\prod_{j\in[m]}\min\{1,\tfrac{\tau_{j}}{\delta}\}\eqqcolon\bar{\bar{p}}_{\delta},
\end{aligned}$$

$$d^{t}(a,b)\coloneqq\delta\sum_{k\in\mathbb{Z}}\mathbb{I}_{\mathcal{S}^{t}}(a-k\delta,b-k\delta).$$

$$\mathcal{D}^{t}(\boldsymbol{a},\boldsymbol{b})\geqslant\mathcal{D}^{t+\frac{\rho P}{m}}(\boldsymbol{a}_{0},\boldsymbol{b}_{0})-8\big(\tfrac{\delta}{P}+\tfrac{\rho}{m}\big),$$

$$\mathcal{D}^{t}(\boldsymbol{a},\boldsymbol{b})\leqslant\mathcal{D}^{t-\frac{\rho P}{m}}(\boldsymbol{a}_{0},\boldsymbol{b}_{0})+8\big(\tfrac{\delta}{P}+\tfrac{\rho}{m}\big).$$

$$\mathbb{P}\big[|\mathcal{D}^{t}(\boldsymbol{a}+\boldsymbol{\xi},\boldsymbol{b}+\boldsymbol{\xi})-\mathcal{D}_{\ell_{1}}(\boldsymbol{a},\boldsymbol{b})|>4|t|+\epsilon(\delta+|t|)\big]\lesssim e^{-c\epsilon^{2}m}.$$


Code & Models

Repositories

VC86/MLSPbox
noneOfficial


Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Medical Image Segmentation Techniques

Full text


ISPGroup, ICTEAM/ELEN, Université catholique de Louvain, Louvain-la-Neuve, Belgium.

E-mail: {valerio.cambareri, chunlei.xu, laurent.jacques}@uclouvain.be. The authors are partly funded by the Belgian National Fund for Scientific Research (FNRS) under the M.I.S.-FNRS project AlterSense. All authors have equally contributed to the realisation of this paper.

Abstract

Quantised random embeddings are an efficient dimensionality reduction technique which preserves the distances of low-complexity signals up to some controllable additive and multiplicative distortions. In this work, we instead focus on verifying when this technique preserves the separability of two disjoint closed convex sets, i.e., in a quantised view of the “rare eclipse problem” introduced by Bandeira et al. in 2014. This separability would ensure exact classification of signals in such sets from the signatures output by this non-linear dimensionality reduction. We here present a result relating the embedding’s dimension, its quantiser resolution and the sets’ separation, as well as some numerically testable conditions to illustrate it. Experimental evidence is then provided in the special case of two $\ell_{2}$ -balls, tracing the phase transition curves that ensure these sets’ separability in the embedded domain.

Index Terms:

Random embeddings, dimensionality reduction, quantisation, compressive classification, phase transition.

I Introduction

Dimensionality reduction methods are a crucial part of very large-scale machine learning frameworks, as they are in charge of mapping (with negligible losses) the information contained in high-dimensional data to a low-dimensional domain, thus minimising the computational effort of learning tasks. We here focus on a class of non-linear, non-adaptive dimensionality reduction methods, i.e., quantised random embeddings, as obtained (our notation conventions are reported at the end of this section) by applying to $\boldsymbol{x}\in\mathcal{K}$ (with $\mathcal{K}\subset\mathbb{R}^{n}$ any dataset)

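$$\boldsymbol{y}=\mathsf{A}(\boldsymbol{x})\coloneqq\mathcal{Q}_{\delta}(\boldsymbol{\Phi}\boldsymbol{x}+\boldsymbol{\xi}),\tag{1}$$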

where $\boldsymbol{\Phi}\in\mathbb{R}^{m\times n}$ is a Gaussian random sensing matrix, i.e., $\boldsymbol{\Phi}\sim\mathcal{N}^{m\times n}(0,1)$ ; $\mathcal{Q}_{\delta}(\cdot)\coloneqq\delta\lfloor\frac{\cdot}{\delta}\rfloor$ is a uniform scalar quantiser of resolution $\delta>0$ (applied component-wise), yielding a signature $\boldsymbol{y} \in\delta\mathbb{Z}^{m}$ ; $\boldsymbol{\xi}\sim{\mathcal{U}}^{m}([0,{\delta}])$ is some dither drawn uniformly in $[0,\delta]^{m}$ , which is fundamental to stabilise the action of the quantiser [1, 2].
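The map in (1) can be sketched in a few lines. The snippet below is a minimal illustration using only the Python standard library; the function name `quantised_embedding` and the chosen sizes are assumptions made here, not part of the paper.

```python
import math
import random

def quantised_embedding(x, m, delta, rng):
    """Return (y, Phi, xi) with y = delta * floor((Phi x + xi) / delta)."""
    n = len(x)
    # Gaussian sensing matrix with i.i.d. N(0, 1) entries.
    Phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    # Dither drawn uniformly in [0, delta]^m.
    xi = [rng.uniform(0.0, delta) for _ in range(m)]
    y = []
    for row, d in zip(Phi, xi):
        proj = sum(p * v for p, v in zip(row, x))
        # Uniform scalar quantiser of resolution delta, applied entry-wise.
        y.append(delta * math.floor((proj + d) / delta))
    return y, Phi, xi

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(16)]
y, Phi, xi = quantised_embedding(x, m=8, delta=0.5, rng=rng)
```

Every entry of `y` is an integer multiple of `delta`, and the quantisation error of each dithered projection lies in $[0,\delta)$.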

The non-linear map described by (1) produces compact signatures $\boldsymbol{y}$, either in terms of dimension $m\ll n$, or of bits per entry (controlled by $\delta$) even if $m>n$ [3]. Learning tasks such as classification may then run on $\boldsymbol{y}\in\mathsf{A}(\mathcal{K})$ rather than $\boldsymbol{x}\in\mathcal{K}$ at reduced storage, transmission, and computational costs, with accuracy depending on $m$ and $\delta$. Moreover, contrary to other non-linear maps (e.g., [4, 5]), (1) retains quasi-isometry properties [2, 6] that grant, under some requirements on $m$ (i.e., sample complexity bounds), the recovery of $\boldsymbol{x}$ from $\boldsymbol{y}$ using appropriate algorithms [7].

In this contribution we aim to prove that generic learning tasks can run seamlessly on $\mathsf{A}(\mathcal{K})$ by the ability of (1) to preserve the separability of different classes in $\mathcal{K}$ . These classes are described by $Q$ disjoint closed convex sets, i.e., $\mathcal{C}_{i}\subset\mathcal{K},{i\in[Q]}$ so that $\forall i,j\in[Q],i\neq j,\mathcal{C}_{i}\cap\mathcal{C}_{j}=\emptyset$ . Hence, we inquire whether testing if our data $\boldsymbol{x}\in\mathcal{C}_{i}$ is equivalent to doing so given $\boldsymbol{y}$ in (1); for this to hold, it is necessary that the classes’ images $\mathsf{A}(\mathcal{C}_{i}),{i\in[Q]}$ are still separable, i.e., $\forall i,j\in[Q],i\neq j,\,\mathsf{A}(\mathcal{C}_{i})\cap \mathsf{A}(\mathcal{C}_{j})=\emptyset$ . If this is violated then no learning algorithm can perform exact classification, as the images would “eclipse” each other. This perspective builds upon that of Bandeira et al. [8], who introduced this rare eclipse problem for linear embeddings, as reviewed in Sec. II-A. Focusing on $Q=2$ classes, in Sec. II-B we define the quantised eclipse problem and present our main result, i.e., a sample complexity bound which states the conditions on $m$ , $\delta$ , $\boldsymbol{\Phi}$ , and $\mathcal{C}_{i},i\in[Q]$ under which the images $\mathsf{A}(\mathcal{C}_{i})$ are separable with high probability (w.h.p.). In Sec. II-C this is simplified by lower bounds to the latter probability, which have the advantage of being numerically testable for disjoint convex sets by solving convex optimisation problems. Among such sets, we detail the specific case of two high-dimensional $\ell_{2}$ -balls in Sec. II-D; this is explored numerically in Sec. III by computing phase transition curves on the above probability bound, indicating a regime with respect to (w.r.t.) $m,\delta$ for (1) in which the sets’ separability is preserved.

Notation: Given a random variable (r.v.) $X$ (e.g., normal $\mathcal{N}(0,1)$ or uniform $\mathcal{U}([0,\delta])$ r.v.’s), we write $\boldsymbol{U}\sim X^{d_{1}\times d_{2}}$ (e.g., $\mathcal{N}^{d_{1}\times d_{2}}(0,1)$ ) to denote the $d_{1}\times d_{2}$ matrix (or vector, if $d_{2}=1$ ) with independent and identically distributed (i.i.d.) entries $U_{ij}\sim_{\rm i.i.d.}X$ . Spheres and balls in $\ell_{p}(\mathbb{R}^{q})$ are denoted by $\mathbb{S}^{q-1}_{p}$ and $\mathbb{B}^{q}_{p}$ . For a set $\mathcal{C}\subset\mathbb{R}^{n}$ , its Chebyshev radius is $\operatorname{rad}(\mathcal{C})=\inf\{r>0:\exists\boldsymbol{c}\in\mathbb{R}^{n},\ \mathcal{C}\subset\boldsymbol{c}+r\mathbb{B}^{n}\}$ ; its image under a map $\mathsf{A}$ is $\mathsf{A}(\mathcal{C})$ ; its projection by a matrix $\boldsymbol{B}$ is $\boldsymbol{B}\mathcal{C}$ . The cardinality of a set $\mathcal{C}$ reads $|\mathcal{C}|$ , and $[Q]\coloneqq\{1,\ldots,Q\}$ . We denote by $C,c,c^{\prime},c^{\prime\prime}$ constants whose value can change between lines. We also write $f\lesssim g$ if $\exists c>0$ such that $f\leqslant c\,g$ , and correspondingly for $f\gtrsim g$ . Moreover, $f\simeq g$ means that $f\lesssim g$ and $g\lesssim f$ .

Relation to Prior Work: Many contributions have discussed linear dimensionality reduction by $\boldsymbol{y}=\boldsymbol{\Phi}\boldsymbol{x}$ with $\boldsymbol{\Phi}\sim X^{m\times n}$ a random matrix having i.i.d. entries distributed as a sub-Gaussian r.v. $X$ (for a survey, see [9]), i.e., random projections. Following the work of Johnson and Lindenstrauss [10], such linear embeddings were soon recognised [11, 12] as distance-preserving, non-adaptive (i.e., not requiring any potentially large or unavailable training dataset, as opposed to, e.g., principal component analysis) dimensionality reductions for finite datasets, i.e., with $|\mathcal{K}|<\infty$. Moreover, several non-linear random embeddings are now available for more general models of $\mathcal{K}$ [5, 13, 14, 2, 15, 16]; most results on such embeddings rely on preserving distances, rather than the separation between classes within $\mathcal{K}$. Regarding this last aspect, Dasgupta [17] first analysed the separability of a mixture-of-Gaussians dataset $\mathcal{K}$ after random projections. Later, with the rise of Compressed Sensing (CS), random projections followed by classification tasks were dubbed compressive classification. Davenport et al. [18] showed that if $\boldsymbol{\Phi}$ verifies the Restricted Isometry Property (RIP) w.r.t. a dataset $\mathcal{K}$ (i.e., a stable embedding) then exact classification can be achieved on $\boldsymbol{\Phi}\mathcal{K}$ thanks to distance preservation; $\mathcal{K}$ was therein taken as a finite set, or the set of sparse signals. Reboredo et al. [19, 20] studied the limits of compressive classification in a Bayesian framework. Finally, Bandeira et al. [8] first explored with the tools of high-dimensional geometry the conditions for the separability of closed convex sets $\mathcal{C}_{1},\mathcal{C}_{2}\subset\mathcal{K}\coloneqq\mathbb{R}^{n}$ after random projections. We here extend their approach to quantised random embeddings given by (1) which, due to their non-linearity, is a non-trivial endeavour that is currently lacking in the literature.

II Quantised Random Embeddings and the Rare Eclipse Problem

II-A The Rare Eclipse Problem

Let us first recall the fundamental question introduced by Bandeira et al. [8] and their main result as follows.

Problem 1 (Rare Eclipse Problem (from [8])).

Let ${\mathcal{C}}_{1},{\mathcal{C}}_{2}\subset{\mathbb{R}}^{n}:$ ${\mathcal{C}}_{1}\cap{\mathcal{C}}_{2}=\emptyset$ be closed convex sets, $\boldsymbol{\Phi}\mathop{\sim}\mathcal{N}^{m\times n}(0,1)$ . Given $\eta\in(0,1)$ , find the smallest $m$ so that

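$$p_{0}\coloneqq\mathbb{P}[\boldsymbol{\Phi}\mathcal{C}_{1}\cap\boldsymbol{\Phi}\mathcal{C}_{2}=\emptyset]\geqslant 1-\eta.\tag{2}$$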

Prob. 1 is equivalent to ensuring, for all $\boldsymbol{x}_{1}\in\mathcal{C}_{1}$ , $\boldsymbol{x}_{2}\in\mathcal{C}_{2}$ , that $\boldsymbol{\Phi}\boldsymbol{x}_{1}\neq\boldsymbol{\Phi}\boldsymbol{x}_{2}$ with $\boldsymbol{\Phi}\sim\mathcal{N}^{m\times n}(0,1)$ . Let us define the difference set $\mathcal{C}^{-}\coloneqq\mathcal{C}_{1}-\mathcal{C}_{2}=\{\boldsymbol{z} \coloneqq\boldsymbol{x}_{1}-\boldsymbol{x}_{2}:\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2}\}$ . We can then cast (2) in terms of the kernel of $\boldsymbol{\Phi}$ , i.e.,

$$p_{0}=\mathbb{P}[{\rm Ker}(\boldsymbol{\Phi})\cap\mathcal{C}^{-}=\emptyset]\geqslant 1-\eta.\tag{3}$$

Intuitively, $\eta$ in (2) will increase with the “size” of $\mathcal{C}^{-}$, since a larger difference set is more likely to intersect ${\rm Ker}(\boldsymbol{\Phi})$. This size is here measured by the Gaussian mean width, i.e., for any set $\mathcal{C}$,

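$$w(\mathcal{C})\coloneqq\mathbb{E}_{\boldsymbol{g}}\sup_{\boldsymbol{x}\in\mathcal{C}}|\boldsymbol{g}^{\top}\boldsymbol{x}|,\quad\boldsymbol{g}\sim\mathcal{N}^{n}(0,1).$$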

Bandeira et al. then realised that (3) can be bounded by Gordon’s Escape Theorem [21]: since the event in (3) is unaffected by arbitrary positive scalings of $\mathcal{C}^{-}$, we may take the cone $\mathbb{R}_{+}\mathcal{C}^{-}$, whose intersection with the sphere $\mathbb{S}^{n-1}_{2}$ is a mesh (i.e., a closed subset of $\mathbb{S}^{n-1}_{2}$). Let us then define $\mathcal{S}\coloneqq(\mathbb{R}_{+}\mathcal{C}^{-})\cap\mathbb{S}^{n-1}_{2}$ of width $w_{\cap}\coloneqq w(\mathcal{S})$, and report their main result (its proof is in [8]).
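The Gaussian mean width can be estimated by Monte Carlo whenever the supremum over the set is easy to evaluate. The sketch below does so for the whole unit sphere $\mathbb{S}^{n-1}_{2}$, where $\sup_{\boldsymbol{x}}|\boldsymbol{g}^{\top}\boldsymbol{x}|=\|\boldsymbol{g}\|_{2}$; the helper names `mean_width` and `sphere_support` are assumptions made here for illustration.

```python
import math
import random

def mean_width(support, n, trials, rng):
    """Monte Carlo estimate of w(C) = E sup_{x in C} |g^T x|.

    `support(g)` must return sup_{x in C} |g^T x| for a given direction g.
    """
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += support(g)
    return total / trials

def sphere_support(g):
    # For the unit l2-sphere the supremum is attained at x = g / ||g||_2.
    return math.sqrt(sum(v * v for v in g))

rng = random.Random(1)
n = 64
w_sphere = mean_width(sphere_support, n, trials=2000, rng=rng)
```

The estimate should sit close to $\sqrt{n}$, in line with the bound $w^{2}(\mathbb{S}^{n-1}_{2})\lesssim n$ used later in the text; a mesh $\mathcal{S}$ that covers only part of the sphere has a smaller width.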

Proposition 1 (Corollary 3.1 in [8]).

In the setup of Prob. 1, given $\eta\in(0,1)$ , if $m\gtrsim(w_{\cap}+\sqrt{2\log\tfrac{1}{\eta}})^{2}+1$ then $p_{0}\geqslant 1-\eta$ .

Hence, the sample complexity of Prob. 1 is sharply characterised for any difference set whose $w_{\cap}$ is given or bounded.
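To get a feel for the requirement in Prop. 1, its right-hand side can be evaluated numerically. This is a minimal sketch that assumes the constant hidden in $\gtrsim$ equals 1, which the text does not state; the function name is hypothetical.

```python
import math

def rare_eclipse_bound(w_cap, eta):
    """Right-hand side of Prop. 1, with the hidden constant assumed to be 1."""
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    return (w_cap + math.sqrt(2.0 * math.log(1.0 / eta))) ** 2 + 1.0

# Width w_cap = 3 and failure probability eta = 0.1, both arbitrary choices.
m_min = rare_eclipse_bound(w_cap=3.0, eta=0.1)
```

The value grows quadratically with $w_{\cap}$ and only logarithmically with $1/\eta$.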

II-B The Quantised Eclipse Problem

Extending Prop. 1 to quantised random embeddings as in (1) is not simple. To begin with, any two closed convex sets $\mathcal{C}_{1},\mathcal{C}_{2}$ would now be mapped into two countable sets $\mathsf{A}(\mathcal{C}_{1}),\mathsf{A}(\mathcal{C}_{2}) \subset\delta\mathbb{Z}^{m}$ ; verifying when they “collide” is our key question below.

Problem 2 (Quantised Eclipse Problem).

Let ${\mathcal{C}}_{1},{\mathcal{C}}_{2}\subset{\mathbb{R}}^{n}:$ ${\mathcal{C}}_{1}\cap{\mathcal{C}}_{2}=\emptyset$ be closed convex sets, and $\mathsf{A}$ defined in (1) with $\delta>0$ . Given $\eta\in(0,1)$ , find the smallest $m$ so that

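$$p_{\delta}\coloneqq\mathbb{P}[\mathsf{A}(\mathcal{C}_{1})\cap\mathsf{A}(\mathcal{C}_{2})=\emptyset]\geqslant 1-\eta.\tag{4}$$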

Note that, since $\mathsf{A}$ itself uses $\boldsymbol{\Phi}\mathop{\sim}\mathcal{N}^{m\times n}(0,1)$ before quantisation, $\boldsymbol{\Phi}\mathcal{C}_{1}\cap\boldsymbol{\Phi}\mathcal{C}_{2}\neq\emptyset\implies\mathsf{A}(\mathcal{C}_{1})\cap\mathsf{A}(\mathcal{C}_{2})\neq\emptyset$ ; hence, $p_{0}\geqslant p_{\delta}$ given the same $\boldsymbol{\Phi},\mathcal{C}_{1},\mathcal{C}_{2}$ . However, the converse does not hold since $\Phi\mathcal{C}_{1}\cap\Phi\mathcal{C}_{2} =\emptyset$ by itself does not suffice to ensure $\mathsf{A}(\mathcal{C}_{1})\cap\mathsf{A}(\mathcal{C}_{2})=\emptyset$ due to, e.g., coarse quantisation with large $\delta$ or some draws of $\boldsymbol{\xi}$ in (1). Then, letting the event $\mathsf{E}\coloneqq\{\forall\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\mathsf{A}(\boldsymbol{x}_{1})\neq\mathsf{A}(\boldsymbol{x}_{2})\}$ , we see (4) equals

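$$p_{\delta}=\mathbb{P}[\mathsf{E}]\geqslant 1-\eta,\ \text{i.e.,}\ \mathbb{P}[\mathsf{E}^{\rm c}]\leqslant\eta,\quad\mathsf{E}^{\rm c}\coloneqq\{\exists\,\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2}:\mathsf{A}(\boldsymbol{x}_{1})=\mathsf{A}(\boldsymbol{x}_{2})\}.\tag{5}$$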

Hence, $\eta$ bounds the probability that any two $\boldsymbol{x}_{1}\in\mathcal{C}_{1}$, $\boldsymbol{x}_{2}\in\mathcal{C}_{2}$ are consistent. Note that, by consistency [2], $\mathsf{A}(\boldsymbol{x}_{1})=\mathsf{A}(\boldsymbol{x}_{2})\implies\|\boldsymbol{\Phi}\boldsymbol{z}\|_{\infty}<\delta$ with $\boldsymbol{z}=\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{-}$. Thus, introducing the separation $\sigma\coloneqq\min_{\boldsymbol{z}\in\mathcal{C}^{-}}\|\boldsymbol{z}\|_{2}$, it is expected that $\eta$ will decay to $0$ as $\sigma$ increases and $\delta$ decreases.

This is also sustained by the fact that $\mathsf{A}$ is known to respect w.h.p. the Quantised Restricted Isometry Property (QRIP) [6] over some $\mathcal{K}\subset\mathbb{R}^{n}$ provided $\boldsymbol{\Phi}$ satisfies an $(\ell_{1},\ell_{2})$-form of the RIP (see Lemma 1) and $m$ is large compared to the dimension of $\mathcal{K}$. If the QRIP holds, we would then have, for all $\boldsymbol{u}_{1},\boldsymbol{u}_{2}\in\mathcal{K}$,

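$$\big|\tfrac{1}{m}\|\mathsf{A}(\boldsymbol{u}_{1})-\mathsf{A}(\boldsymbol{u}_{2})\|_{1}-c^{\prime}\|\boldsymbol{u}_{1}-\boldsymbol{u}_{2}\|_{2}\big|\leqslant\varepsilon\|\boldsymbol{u}_{1}-\boldsymbol{u}_{2}\|_{2}+c\delta\varepsilon^{\prime},\tag{6}$$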

for some controllable distortions $\varepsilon,\varepsilon^{\prime}>0$ and constants $c,c^{\prime}>0$ . With $\mathcal{K}\coloneqq\mathcal{C}_{1}\cup\mathcal{C}_{2}$ , $\boldsymbol{x}_{1}\in\mathcal{C}_{1}$ and $\boldsymbol{x}_{2}\in\mathcal{C}_{2}$ , this ensures that $\frac{1}{m}\|\mathsf{A}(\boldsymbol{x}_{1})-\mathsf{A}(\boldsymbol{x}_{2})\|_{1}\geqslant(c^{\prime}-\varepsilon)\|\boldsymbol{z}\|_{2}-c\delta\varepsilon^{\prime}\geqslant(c^{\prime}-\varepsilon)\sigma-c\delta\varepsilon^{\prime}$ . Thus, $\mathsf{A}(\boldsymbol{x}_{1})\neq\mathsf{A}(\boldsymbol{x}_{2})$ simply follows if $\tfrac{\sigma}{\delta}>\tfrac{c\varepsilon^{\prime}}{c^{\prime}-\varepsilon}$ .

Before introducing our main result, let us present two lemmata, whose proofs are given in the Appendix. The first assesses when $\boldsymbol{\Phi}\sim\mathcal{N}^{m\times n}(0,1)$ respects an $(\ell_{1},\ell_{2})$-form of the RIP for a mesh (see, e.g., [15, Cor. 2.3], [22]).

Lemma 1.

Let $\epsilon_{0}>0$ and $\mathcal{S}\subset\mathbb{S}_{2}^{n-1}$. If $m\gtrsim\epsilon_{0}^{-2}w^{2}(\mathcal{S})$ and $\boldsymbol{\Phi}\sim\mathcal{N}^{m\times n}(0,1)$, then there exist some $C,c>0$ such that, with probability exceeding $1-C\exp(-c\epsilon_{0}^{2}m)$ and $\kappa_{0}=\sqrt{\pi/2}$,

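$$(1-\epsilon_{0})\leqslant\tfrac{\kappa_{0}}{m}\|\boldsymbol{\Phi}\boldsymbol{u}\|_{1}\leqslant(1+\epsilon_{0}),\quad\forall\boldsymbol{u}\in\mathcal{S}.\tag{7}$$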

Thus, provided $m\gtrsim\epsilon_{0}^{-2}w^{2}_{\cap}$ and defining $\mathcal{D}_{\ell_{1}}(\boldsymbol{a},\boldsymbol{b})\coloneqq{\textstyle\frac{1}{m}}\|\boldsymbol{a}-\boldsymbol{b}\|_{1}$ , applying Lemma 1 to $\mathcal{S}\coloneqq(\mathbb{R}_{+}\mathcal{C}^{-})\cap\mathbb{S}^{n-1}_{2}$ yields

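$$\big|\kappa_{0}\mathcal{D}_{\ell_{1}}(\boldsymbol{\Phi}\boldsymbol{x}_{1},\boldsymbol{\Phi}\boldsymbol{x}_{2})-\|\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\|_{2}\big|\leqslant\epsilon_{0}\|\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\|_{2},\tag{8}$$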

with the same probability and for all $\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2}$ , since $\tfrac{\boldsymbol{x}_{1}-\boldsymbol{x}_{2}}{\|\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\|_{2}}\in\mathcal{S}$ . Moreover, since $w^{2}(\mathbb{S}_{2}^{n-1})\lesssim n$ , provided $m\gtrsim\epsilon^{-2}n$ for some $\epsilon>0$ , we also have with probability exceeding $1-C\exp(-c\epsilon^{2}m)$ ,

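$$(1-\epsilon)\|\boldsymbol{u}\|_{2}\leqslant\tfrac{\kappa_{0}}{m}\|\boldsymbol{\Phi}\boldsymbol{u}\|_{1}\leqslant(1+\epsilon)\|\boldsymbol{u}\|_{2},\quad\forall\boldsymbol{u}\in\mathbb{R}^{n}.\tag{9}$$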
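The concentration behind the $(\ell_{1},\ell_{2})$-form of the RIP can be observed empirically for a single fixed vector. The sketch below only checks one $\boldsymbol{u}$ per draw of $\boldsymbol{\Phi}$, not the uniform statement over a whole set that the lemma provides; the function name and sizes are assumptions made here.

```python
import math
import random

KAPPA_0 = math.sqrt(math.pi / 2.0)

def l1_embedding_ratio(u, m, rng):
    """Return (kappa_0 / m) * ||Phi u||_1 / ||u||_2 for a fresh Gaussian Phi."""
    norm_u = math.sqrt(sum(v * v for v in u))
    l1 = 0.0
    for _ in range(m):
        # Draw one row of Phi at a time, to avoid storing the m x n matrix.
        row_dot = sum(rng.gauss(0.0, 1.0) * v for v in u)
        l1 += abs(row_dot)
    return KAPPA_0 * l1 / (m * norm_u)

rng = random.Random(2)
u = [rng.uniform(-1.0, 1.0) for _ in range(32)]
ratios = {m: l1_embedding_ratio(u, m, rng) for m in (16, 256, 4096)}
```

As $m$ grows the ratio concentrates around 1, which is the role played by the normalisation $\kappa_{0}=\sqrt{\pi/2}$.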

The second lemma proves that the mapping ${\mathsf{A}}^{\prime}(\cdot)\coloneqq\mathcal{Q}_{\delta}(\cdot+\boldsymbol{\xi})$, with $\boldsymbol{\xi}\sim\mathcal{U}^{m}([0,\delta])$, embeds w.h.p. (in the Gromov-Hausdorff sense [15]) $\mathbb{R}^{m}$ in $\delta\mathbb{Z}^{m}$ in the metric $\mathcal{D}_{\ell_{1}}$ and up to some controlled distortions. This lemma uses the Kolmogorov entropy $\mathcal{H}_{q}(\mathcal{E},\rho)\coloneqq\log\mathcal{N}_{q}(\mathcal{E},\rho)$ of a bounded subset $\mathcal{E}\subset\mathbb{R}^{m}$ in the $\ell_{q}$-metric ($q\geqslant 1$) defined for $\rho>0$, with $\mathcal{N}_{q}(\mathcal{E},\rho)$ the cardinality of its smallest $\rho$-covering in the same metric.

Lemma 2.

Let $\mathcal{E}\subset\mathbb{R}^{m}$ be a bounded set. Given $\epsilon,\delta>0$ , if

$$m\gtrsim\epsilon^{-2}\mathcal{H}_{1}\big(\mathcal{E},\tfrac{m\delta\epsilon^{2}}{1+\epsilon}\big),$$

then, for $\boldsymbol{\xi}\sim\mathcal{U}^{m}([0,\delta])$ and with probability exceeding $1-C\exp(-cm\epsilon^{2})$ for some $C,c>0$ , we have

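$$\big|\mathcal{D}_{\ell_{1}}(\mathsf{A}^{\prime}(\boldsymbol{a}),\mathsf{A}^{\prime}(\boldsymbol{b}))-\mathcal{D}_{\ell_{1}}(\boldsymbol{a},\boldsymbol{b})\big|\lesssim\delta\epsilon,\quad\forall\boldsymbol{a},\boldsymbol{b}\in\mathcal{E}.\tag{10}$$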

We are finally able to state our main result, solving Prob. 2.

Proposition 2.

In the setup of Prob. 2, let $r_{i}\coloneqq\operatorname{rad}(\mathcal{C}_{i})$ , $i\in\{1,2\}$ , $r\coloneqq r_{1}+r_{2}$ , and ${\mathsf{A}}$ defined in (1) with $\delta>0$ . Given $\eta\in(0,1)$ , if

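$$m\gtrsim\big(w_{\cap}^{2}+n\tfrac{\delta^{2}}{\sigma^{2}}\big)\big(1+\log(1+\tfrac{rm}{\delta n})+w_{\cap}^{-2}\log\tfrac{1}{\eta}\big),\tag{11}$$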

then $p_{\delta}\geqslant 1-\eta$ .

Proof of Prop. 2.

Let us first observe when (10) holds with $\boldsymbol{\Phi}\sim\mathcal{N}^{m\times n}(0,1)$ , $\mathcal{E}\coloneqq\boldsymbol{\Phi}\mathcal{C}_{\cup}$ , $\mathcal{C}_{\cup}\coloneqq\mathcal{C}_{1}\cup\mathcal{C}_{2}\subset\mathbb{R}^{n}$ . This will be useful later to characterise when ${\mathsf{A}}(\mathcal{C}_{1})\cap{\mathsf{A}}(\mathcal{C}_{2})=\emptyset$ . Let $\mathcal{R}_{\cup}$ be a $\rho$ -covering in the $\ell_{2}$ -metric of $\mathcal{C}_{\cup}$ for some $\rho>0$ to be specified below. If $m\gtrsim\epsilon^{-2}n$ for some $\epsilon>0$ , we have from (9) that, with probability exceeding $1-C\exp(-c\epsilon^{2}m)$ , the event ${\mathsf{E}}_{0}$ where $\boldsymbol{\Phi}\mathcal{R}_{\cup}$ is a $\rho^{\prime}$ -covering of $\boldsymbol{\Phi}\mathcal{C}_{\cup}$ holds with $\rho^{\prime}=2m(1+\epsilon)\rho$ . This proves that, conditionally to ${\mathsf{E}_{0}}$ and for $\mathcal{E}=\boldsymbol{\Phi}\mathcal{C}_{\cup}$ , $\mathcal{H}_{1}(\mathcal{E},\rho^{\prime})\leqslant\mathcal{H}_{2}(\mathcal{C}_{\cup},\rho)$ . However, $\mathcal{H}_{2}(\mathcal{C}_{\cup},\rho)\leqslant\log 2+\max(\mathcal{H}_{2}(\mathcal{C}_{1},\rho),\mathcal{H}_{2}(\mathcal{C}_{2},\rho))\lesssim\max(\mathcal{H}_{2}(\mathcal{C}_{1},\rho),\mathcal{H}_{2}(\mathcal{C}_{2},\rho))$ . Moreover, we have $\mathcal{H}_{2}(\mathcal{C}_{i},\rho)\lesssim n\log(1+\frac{r_{i}}{\rho})$ [23], so that $\mathcal{H}_{2}(\mathcal{C}_{\cup},\rho)\lesssim n\log(1+\frac{r}{\rho})$ . Setting $\rho^{\prime}\coloneqq\tfrac{m\delta\epsilon^{2}}{1+\epsilon}$ gives $\rho=\tfrac{\delta\epsilon^{2}}{2(1+\epsilon)^{2}}$ and finally

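$$\mathcal{H}_{1}(\boldsymbol{\Phi}\mathcal{C}_{\cup},\rho^{\prime})\lesssim n\log\big(1+\tfrac{2r(1+\epsilon)^{2}}{\delta\epsilon^{2}}\big).$$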

Consequently, conditionally to ${\mathsf{E}}_{0}$ which only depends on $\boldsymbol{\Phi}$ , Lemma 2 provides that if $m\gtrsim\epsilon^{-2}n\log(1+\frac{2r(1+\epsilon)^{2}}{\delta\epsilon^{2}})$ then, with probability exceeding $1-C\exp(-cm\epsilon^{2})$ , we get the occurrence of a new event, ${\mathsf{E}}^{\prime}_{0}$ , where (10) holds with $\boldsymbol{a}=\boldsymbol{\Phi}\boldsymbol{u}$ and $\boldsymbol{b}=\boldsymbol{\Phi}\boldsymbol{v}$ for all $\boldsymbol{u},\boldsymbol{v}\in\mathcal{C}_{\cup}$ . Under the same conditions, since $\mathbb{P}[{\mathsf{E}}^{\prime}_{0}]\geqslant\mathbb{P}[{\mathsf{E}}^{\prime}_{0}|{\mathsf{E}}_{0}]\mathbb{P}[{\mathsf{E}}_{0}]$ , ${\mathsf{E}}^{\prime}_{0}$ occurs unconditionally with $\mathbb{P}[{\mathsf{E}}^{\prime}_{0}]\geqslant 1-C^{\prime}\exp(-c^{\prime}m\epsilon^{2})$ , for some $C^{\prime},c^{\prime}>0$ .

Second, if $m\gtrsim\epsilon_{0}^{-2}w^{2}_{\cap}$ for some $\epsilon_{0}>0$ , Lemma 1 states that the event ${\mathsf{E}}_{1}$ , where (8) is respected for all $\boldsymbol{x}_{1}\in\mathcal{C}_{1}$ and all $\boldsymbol{x}_{2}\in\mathcal{C}_{2}$ , holds with probability exceeding $1-C\exp(-c\epsilon_{0}^{2}m)$ .

Given $\eta\!\in\!(0,1)$ and $\epsilon\!=\!\epsilon(\epsilon_{0})\!\coloneqq\!\frac{\sqrt{n}}{w_{\cap}}\!\epsilon_{0}$ , i.e., with $\epsilon\gtrsim\!\epsilon_{0}$ since $w_{\cap}\!\lesssim\!\sqrt{n}$ , the union bound yields that ${\mathsf{E}_{0}}$ and ${\mathsf{E}_{1}}$ jointly hold with probability exceeding $1-C\exp(-c\epsilon_{0}^{2}\!m)\!\geqslant\!1\!-\!\eta$ provided

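$$m\gtrsim\epsilon_{0}^{-2}\big(w_{\cap}^{2}\,(1+\log(1+\tfrac{2r(1+\epsilon(\epsilon_{0}))^{2}}{\delta\,\epsilon(\epsilon_{0})^{2}}))+\log\tfrac{1}{\eta}\big).\tag{12}$$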

In this case, for all $\boldsymbol{u}\in\mathcal{C}_{1}$ and $\boldsymbol{v}\in\mathcal{C}_{2}$ (or vice versa), (10) (with $\boldsymbol{a}\coloneqq\boldsymbol{\Phi}\boldsymbol{u}$ and $\boldsymbol{b}\coloneqq\boldsymbol{\Phi}\boldsymbol{v}$ ) and (8) give, for some $c>0$ ,

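$$\begin{aligned}
\mathcal{D}_{\ell_{1}}(\mathsf{A}(\boldsymbol{u}),\mathsf{A}(\boldsymbol{v}))&=\mathcal{D}_{\ell_{1}}(\mathsf{A}^{\prime}(\boldsymbol{\Phi}\boldsymbol{u}),\mathsf{A}^{\prime}(\boldsymbol{\Phi}\boldsymbol{v}))\\
&\geqslant\mathcal{D}_{\ell_{1}}(\boldsymbol{\Phi}\boldsymbol{u},\boldsymbol{\Phi}\boldsymbol{v})-c\delta\epsilon\geqslant\kappa_{0}^{-1}(1-\epsilon_{0})\|\boldsymbol{u}-\boldsymbol{v}\|_{2}-c\delta\epsilon\\
&\geqslant\kappa_{0}^{-1}(1-\epsilon_{0})\sigma-c\delta\epsilon=\kappa_{0}^{-1}(1-\epsilon_{0})\sigma-\tfrac{c\delta\sqrt{n}}{w_{\cap}}\epsilon_{0}.
\end{aligned}$$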

In order to have ${\mathsf{A}}(\mathcal{C}_{1})\cap{\mathsf{A}}(\mathcal{C}_{2})=\emptyset$ , the last quantity must be positive. Since $\epsilon_{0}>0$ , this clearly happens if $\kappa_{0}^{-1}\,(1-\epsilon_{0})\sigma-\frac{c\delta\sqrt{n}}{w_{\cap}}\epsilon_{0}=\frac{c\delta\sqrt{n}}{w_{\cap}}\epsilon_{0}$ , which gives

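$$\epsilon_{0}^{-2}w_{\cap}^{2}=\big(w_{\cap}+2c\kappa_{0}\sqrt{n}\tfrac{\delta}{\sigma}\big)^{2}\lesssim w_{\cap}^{2}+n\tfrac{\delta^{2}}{\sigma^{2}}.$$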

Moreover, from the value of $\epsilon=\epsilon(\epsilon_{0})$ set above, $\frac{2r(1+\epsilon)^{2}}{\delta\epsilon^{2}}\leqslant 2\frac{r}{n\delta}(w^{2}_{\cap}+n\frac{\delta^{2}}{\sigma^{2}})$ , so that (12) is satisfied if (11) holds. This gives finally that $p_{\delta}\geqslant 1-\eta$ under this condition. ∎
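The step in the proof that turns the positivity condition into a value for $\epsilon_{0}$ is short but easy to lose track of. A worked version, using the symbols of the proof and the elementary inequality $(a+b)^{2}\leqslant 2a^{2}+2b^{2}$ for the final bound, could read as follows (constants are absorbed into $\lesssim$ as in the text):

```latex
\begin{align*}
\kappa_{0}^{-1}(1-\epsilon_{0})\,\sigma
  &= \frac{2c\,\delta\sqrt{n}}{w_{\cap}}\,\epsilon_{0}
  && \text{(positivity condition, rearranged)}\\
(1-\epsilon_{0})\,\sigma\,w_{\cap}
  &= 2c\,\kappa_{0}\,\delta\sqrt{n}\,\epsilon_{0}\\
\sigma\,w_{\cap}
  &= \epsilon_{0}\,\bigl(\sigma\,w_{\cap} + 2c\,\kappa_{0}\,\delta\sqrt{n}\bigr)\\
\epsilon_{0}^{-1}\,w_{\cap}
  &= w_{\cap} + 2c\,\kappa_{0}\sqrt{n}\,\frac{\delta}{\sigma}\\
\epsilon_{0}^{-2}\,w_{\cap}^{2}
  &\leqslant 2\,w_{\cap}^{2} + 8c^{2}\kappa_{0}^{2}\,n\,\frac{\delta^{2}}{\sigma^{2}}
  \;\lesssim\; w_{\cap}^{2} + n\,\frac{\delta^{2}}{\sigma^{2}} .
\end{align*}
```

The fourth line also shows that $\epsilon_{0}\in(0,1)$, so the factor $(1-\epsilon_{0})$ in the lower bound stays positive.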

Interestingly, up to diverging $\log$ factors (possibly due to proof artefacts), the requirement of Prop. 1 can be seen as a special case of (11) when $\delta\to 0^{+}$ , i.e., for a “vanishing” quantiser, since $(w_{\cap}+\sqrt{2\log{\textstyle\frac{1}{\eta}}})^{2}\lesssim w_{\cap}^{2}+\log{\textstyle\frac{1}{\eta}}$ . Finally, the application of Prop. 2 to more than two sets is possible, and will be included in an extended version of this paper.

II-C Testable Conditions by Convex Problems

To properly verify the bound on $p_{\delta}$ in Prop. 2 we should test the existence of any element in $\mathsf{A}(\mathcal{C}_{1})\cap\mathsf{A}(\mathcal{C}_{2})\subset\delta\mathbb{Z}^{m}$ , i.e., of any two consistent vectors $\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2}:\mathsf{A}(\boldsymbol{x}_{1})=\mathsf{A}(\boldsymbol{x}_{2})$ . As expected, this search is computationally intractable, so we now deduce numerically testable, albeit less tight lower bounds for $p_{\delta}$ . Let us first define the consistency margin $\tau$

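$$\tau\coloneqq\min_{\boldsymbol{z}\in\mathcal{C}^{-}}\|\boldsymbol{\Phi}\boldsymbol{z}\|_{\infty},\tag{13}$$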

that is a function of $\boldsymbol{\Phi}$ and $\mathcal{C}^{-}$ , and can be related to the minimal separation $\sigma$ as defined above. Moreover, the event $\tau>\delta$ depends only on $\boldsymbol{\Phi}$ , so we can write (5) as

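$$p_{\delta}=\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\mathsf{E}\,|\,\tau\leqslant\delta]\,\mathbb{P}_{\boldsymbol{\Phi}}[\tau\leqslant\delta]+\bar{p}_{\delta}\geqslant\bar{p}_{\delta},\tag{14}$$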

where $\bar{p}_{\delta}\coloneqq\mathbb{P}_{\boldsymbol{\Phi}}[\tau>\delta]$ (i.e., $\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\mathsf{E}|\tau>\delta]=1$) since $\|\boldsymbol{\Phi}(\boldsymbol{x}_{1}-\boldsymbol{x}_{2})\|_{\infty}\geqslant\tau>\delta\implies\mathsf{A}(\boldsymbol{x}_{1})\neq\mathsf{A}(\boldsymbol{x}_{2})$, while the converse does not hold. Note that $\bar{p}_{\delta}$ fully accounts for the cases in which ${\rm Ker}(\boldsymbol{\Phi})\cap\mathcal{C}^{-}\neq\emptyset$, since $\boldsymbol{\Phi}\boldsymbol{z}={\boldsymbol{0}}_{m}$ for some $\boldsymbol{z}\in\mathcal{C}^{-}$ implies $\tau=0$. Clearly, we can now estimate $\bar{p}_{\delta}$, as $\tau$ can be computed for each $\boldsymbol{\Phi}$ when the optimisation problem (13) is convex (i.e., iff $\mathcal{C}^{-}$ is, as for disjoint convex sets).
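Solving (13) exactly requires a convex solver. As a cheaper illustration, the sketch below computes a lower bound on $\tau$ under the assumption that the difference set is an $\ell_{2}$-ball $\boldsymbol{c}+r\mathbb{B}^{n}_{2}$. It relies on $\min_{\boldsymbol{z}}\max_{j}|\boldsymbol{\varphi}_{j}^{\top}\boldsymbol{z}|\geqslant\max_{j}\min_{\boldsymbol{z}}|\boldsymbol{\varphi}_{j}^{\top}\boldsymbol{z}|$ and on the closed form $\min_{\boldsymbol{z}}|\boldsymbol{\varphi}_{j}^{\top}\boldsymbol{z}|=\max(0,|\boldsymbol{\varphi}_{j}^{\top}\boldsymbol{c}|-r\|\boldsymbol{\varphi}_{j}\|_{2})$ over the ball. This relaxation is not the procedure used in the paper, and all names are hypothetical; it only certifies $\tau>\delta$ when the returned value exceeds $\delta$.

```python
import math
import random

def margin_lower_bound(Phi, c, r):
    """Return max_j max(0, |phi_j^T c| - r ||phi_j||_2), a lower bound on tau."""
    best = 0.0
    for row in Phi:
        dot = sum(p * v for p, v in zip(row, c))
        norm = math.sqrt(sum(p * p for p in row))
        # Smallest |phi_j^T z| over the ball c + r B_2^n.
        best = max(best, max(0.0, abs(dot) - r * norm))
    return best

def linf_margin_at(Phi, z):
    """||Phi z||_inf for one point z of the difference set."""
    return max(abs(sum(p * v for p, v in zip(row, z))) for row in Phi)

rng = random.Random(3)
n, m, r = 8, 4, 1.0
c = [6.0] + [0.0] * (n - 1)  # centre of the (assumed) ball-shaped difference set
Phi = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
tau_lb = margin_lower_bound(Phi, c, r)
```

Any point `z` of the ball satisfies `linf_margin_at(Phi, z) >= tau_lb`, so the bound can be sanity-checked by sampling.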

To tighten this bound and fully leverage dithering, we partition $\mathcal{C}^{-}=\cup_{j}\mathcal{C}^{(j)}$ into the cones

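$$\mathcal{C}^{(j)}\coloneqq\{\boldsymbol{z}\in\mathcal{C}^{-}:|\boldsymbol{\varphi}_{j}^{\top}\boldsymbol{z}|\geqslant|\boldsymbol{\varphi}_{i}^{\top}\boldsymbol{z}|,\ \forall i\neq j\in[m]\}\subset\mathcal{C}^{-}.$$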

For $j\in[m]$, we can now define $\tau_{j}\coloneqq\min_{\boldsymbol{z}\in\mathcal{C}^{(j)}}|\boldsymbol{\varphi}^{\top}_{j}\boldsymbol{z}|$, where clearly $\tau_{j}\geqslant\tau$. Letting $\mathsf{A}_{i}(\boldsymbol{x})\coloneqq\mathcal{Q}_{\delta}(\boldsymbol{\varphi}_{i}^{\top}\boldsymbol{x}+\xi_{i})$, we use the shorthand $\mathsf{E}_{i}\coloneqq\{\mathsf{A}_{i}(\boldsymbol{x}_{1})\neq\mathsf{A}_{i}(\boldsymbol{x}_{2})\}$ and bound

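$$\begin{aligned}
p_{\delta}&=\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\forall\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\exists i\in[m]:\mathsf{E}_{i}]\\
&=\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\forall j\in[m],\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{(j)},\exists i\in[m]:\mathsf{E}_{i}]\\
&\geqslant\mathbb{P}_{\boldsymbol{\Phi},\boldsymbol{\xi}}[\forall j\in[m],\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{(j)},\mathsf{E}_{j}]\\
&=\mathbb{E}_{\boldsymbol{\Phi}}\mathbb{P}_{\boldsymbol{\xi}}[\forall j\in[m],\boldsymbol{x}_{1}\in\mathcal{C}_{1},\boldsymbol{x}_{2}\in\mathcal{C}_{2},\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\in\mathcal{C}^{(j)},\mathsf{E}_{j}\,|\,\boldsymbol{\Phi}].
\end{aligned}$$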

Then, since the entries $\xi_{j}$ of $\boldsymbol{\xi}\!\sim\!\mathcal{U}^{m}([0,\delta])$ are i.i.d.,

[TABLE]

where the second-to-last line follows since $\mathsf{E}_{j}$ occurs whenever, given two intervals $\boldsymbol{\varphi}^{\top}_{j}\mathcal{C}_{1},\boldsymbol{\varphi}^{\top}_{j}\mathcal{C}_{2}\subset\mathbb{R}$ that are $\tau_{j}$ apart, a quantiser threshold in $\delta\mathbb{Z}+\xi_{j}$ falls between them. Since $\xi_{j}\sim\mathcal{U}([0,\delta])$, this happens with probability $\mathbb{P}[\xi_{j}\in[0,{\tau_{j}}]]=\min\{1,\tau_{j}/\delta\}$. The computational complexity of estimating $\bar{\bar{p}}_{\delta}$ is similar to that of (14), while (15) is sharper, as it can be shown that $\bar{\bar{p}}_{\delta}\geqslant{\bar{p}}_{\delta}$. However, we expect both bounds to be somewhat loose w.r.t. the one in Prop. 2.
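The probability $\min\{1,\tau_{j}/\delta\}$ can be checked numerically on a single measurement. The sketch below is only an illustration: it assumes the uniform quantiser $\mathcal{Q}_{\delta}(u)=\delta\lfloor u/\delta\rfloor$ (consistent with outputs in $\delta\mathbb{Z}$, but an assumption here), uses hypothetical function names, and sweeps the dither over a deterministic midpoint grid of $[0,\delta)$ instead of sampling it.

```python
import math

def quantise(u, delta):
    """Uniform quantiser with cells [k*delta, (k+1)*delta) (assumed form)."""
    return delta * math.floor(u / delta)

def separation_fraction(a, b, delta, grid=4096):
    """Fraction of dithers xi on a midpoint grid of [0, delta) for which
    the quantised values of a + xi and b + xi differ."""
    hits = 0
    for k in range(grid):
        xi = (k + 0.5) * delta / grid
        if quantise(a + xi, delta) != quantise(b + xi, delta):
            hits += 1
    return hits / grid

delta = 1.0
for tau_j in (0.25, 0.5, 1.0, 1.5):
    # Two scalars at distance tau_j stand in for the closest points
    # of the two projected intervals.
    estimate = separation_fraction(0.125, 0.125 + tau_j, delta)
    print(tau_j, estimate, min(1.0, tau_j / delta))
```

For each tested gap the grid fraction should agree with $\min\{1,\tau_{j}/\delta\}$ up to the grid resolution, and it saturates at $1$ once the gap reaches $\delta$.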

II-D The Case of Two Disjoint $\ell_{2}$ -Balls

We now briefly focus on the case of two $\ell_{2}$-balls $\mathcal{C}_{1}\coloneqq r_{1}\mathbb{B}^{n}_{2}+\boldsymbol{c}_{1}$ and $\mathcal{C}_{2}\coloneqq r_{2}\mathbb{B}^{n}_{2}+\boldsymbol{c}_{2}$, for which $\mathcal{C}^{-}=r\mathbb{B}^{n}_{2}+\boldsymbol{c}$ with $\boldsymbol{c}\coloneqq\boldsymbol{c}_{1}-\boldsymbol{c}_{2}$ and $r\coloneqq r_{1}+r_{2}$. It is then shown in [24, Prop. 4.3] that ${w_{\cap}}\lesssim\frac{r}{\|\boldsymbol{c}\|_{2}}\sqrt{n}\simeq\frac{\sqrt{n}}{\sigma}$ when $\sigma\gg r$, since $\sigma=\|\boldsymbol{c}\|_{2}-r$. We can now compare the sample complexities in Prop. 1 and Prop. 2: up to some $\log$ and additive factors, we see that Prop. 2 has rate $\tfrac{m}{n}\gtrsim\frac{1}{\sigma^{2}}(1+\delta^{2})$, while Prop. 1 only requires $\tfrac{m}{n}\gtrsim\frac{1}{\sigma^{2}}$, hence showing the effect of $\delta$ that we will illustrate in our numerical experiments below.
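As a small illustration of these quantities, the sketch below computes $\boldsymbol{c}$, $r$ and $\sigma$ for two balls and evaluates the right-hand sides of the two rates with all constants and $\log$ factors dropped. The function names and the example centres are hypothetical; the numbers only show how the quantised rate scales by $1+\delta^{2}$ relative to the linear one.

```python
import math

def difference_ball(c1, r1, c2, r2):
    """Centre c, radius r and separation sigma of C^- = C_1 - C_2
    for two l2-balls with centres c1, c2 and radii r1, r2."""
    c = [u - v for u, v in zip(c1, c2)]
    r = r1 + r2
    sigma = math.sqrt(sum(x * x for x in c)) - r
    return c, r, sigma

def rate_proxies(sigma, delta):
    """Right-hand sides of the rates on m/n (constants and logs dropped):
    linear case 1/sigma^2, quantised case (1 + delta^2)/sigma^2."""
    linear = 1.0 / sigma ** 2
    quantised = (1.0 + delta ** 2) / sigma ** 2
    return linear, quantised

# Hypothetical example: unit balls centred at (3, 4, 0) and at the origin.
c, r, sigma = difference_ball([3.0, 4.0, 0.0], 1.0, [0.0, 0.0, 0.0], 1.0)
linear, quantised = rate_proxies(sigma, delta=2.0)
print(c, r, sigma)
print(linear, quantised, quantised / linear)
```

In this example $\|\boldsymbol{c}\|_{2}=5$ and $r=2$, so $\sigma=3$, and the two proxies differ by the factor $1+\delta^{2}=5$.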

III Numerical Experiments

We now test the special case of Sec. II-D by generating random instances of $\boldsymbol{\Phi}\sim\mathcal{N}^{m\times n}(0,1)$ and $\mathcal{C}^{-}$ $^{4}$, and computing the quantities $\tau_{j},j\in[m]$ and $\tau$ for each instance, as specified in Sec. II-C. This allows us to empirically estimate $\bar{p}_{\delta},\bar{\bar{p}}_{\delta}$, respectively in (14), (15), over $128$ trials for each of the configurations $n=2^{6}$ and $m\in[2^{0},2^{6}]$, and varying $\mathcal{C}^{-}$ by fixing $r=2$ and taking $\sigma=\|\boldsymbol{c}\|_{2}-r\in[2^{0},2^{9}]$. The estimated values of $\bar{\bar{p}}_{\delta}$ are then reported as heat maps in Fig. 1a,b along with the phase transition curves $\bar{\bar{\rm p}}_{\delta}\coloneqq\{\bar{\bar{p}}_{\delta}\geqslant 0.9\}$, ${\bar{\rm p}}_{\delta}\coloneqq\{{\bar{p}}_{\delta}\geqslant 0.9\}$, and the linear case of Prop. 1, ${{\rm p}}_{0}\coloneqq\{{{p}}_{0}\geqslant 0.9\}$, with $p_{0}$ being estimated as in [8]. Given $\tau_{j},j\in[m]$ for all instances, we compute in Fig. 1c the phase transition curves corresponding to $\bar{\bar{p}}_{\delta}$ for several $\delta\in\{2^{0},2^{1},\ldots,2^{9}\}$. For each curve, the event ${\mathsf{A}}(\mathcal{C}_{1})\cap{\mathsf{A}}(\mathcal{C}_{2})=\emptyset$ holds with probability at least $0.9$. These curves are indeed compatible with the fact that $\log_{2}\tfrac{m}{n}\gtrsim\log_{2}\tfrac{1}{\sigma^{2}}+\log_{2}(1+\delta^{2})$ (up to $\log$ factors, and as concluded in Sec. II-D). However, we suspect that $\bar{\bar{p}}_{\delta}$ is still not sufficiently tight to approach our theoretical, albeit computationally intractable, bound on $p_{\delta}$, and leave this improvement to a future investigation.

$^{4}$ By uniformity of ${\rm Ker}(\boldsymbol{\Phi})$, $\boldsymbol{\Phi}\sim\mathcal{N}^{m\times n}(0,1)$, over the Grassmannian at the origin, it is legitimate to fix a randomly drawn direction $\tfrac{\boldsymbol{c}}{\|\boldsymbol{c}\|_{2}}$ for the simulations.
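One way to read a phase transition point off such estimates is sketched below. This is an illustrative reading, not the authors' code: it takes the curve at a given $\sigma$ to be the smallest tested $m$ whose empirical probability reaches $0.9$, assumes the grids contain only powers of two, and uses made-up frequencies purely to exercise the function.

```python
def phase_transition(ms, probs, level=0.9):
    """Smallest m in `ms` whose estimated probability reaches `level`,
    or None if no tested m does."""
    for m, p in sorted(zip(ms, probs)):
        if p >= level:
            return m
    return None

# Grid of the experiments (powers of two are an assumption of this sketch):
# n = 2^6, m in {2^0, ..., 2^6}, sigma in {2^0, ..., 2^9}.
n = 2 ** 6
ms = [2 ** k for k in range(7)]
sigmas = [2 ** k for k in range(10)]

# Hypothetical success frequencies for one value of sigma
# (illustration only, not values from the paper).
toy_probs = [0.0, 0.05, 0.30, 0.75, 0.92, 1.0, 1.0]
print(phase_transition(ms, toy_probs))
```

With the toy frequencies above the function returns $m=16$, the first grid point at or above the $0.9$ level. Repeating this for every $\sigma$ on the grid would trace one curve.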

IV Conclusion

The fundamental limits of learning tasks with embeddings are being tackled in several studies; our result illustrates the requirements for exact classification after quantised random embedding of two disjoint closed convex sets. As we only developed cases in which the datasets $\mathcal{K}$ are not specified as low-complexity sets, we leave the low-complexity case to future work, e.g., for $Q$ disjoint “clusters” of sparse signals $\mathcal{C}_{i},i\in[Q]$.

V Appendix

Proof of Lemma 2.

We adapt the proof of [6, Prop. 1]. Given $\rho>0$ to be fixed later, let $\mathcal{E}_{\rho}$ be a $\rho$-covering of $\mathcal{E}$ in the $\ell_{1}$-metric, i.e., for all $\boldsymbol{a}\in\mathcal{E}$ there exists $\boldsymbol{a}_{0}\in\mathcal{E}_{\rho}$ such that $\|\boldsymbol{a}-\boldsymbol{a}_{0}\|_{1}\leqslant\rho$. Notice that since $X(\boldsymbol{a},\boldsymbol{b})\coloneqq\mathcal{D}_{\ell_{1}}({\mathsf{A}}^{\prime}(\boldsymbol{a}),{\mathsf{A}}^{\prime}(\boldsymbol{b}))={\textstyle\frac{1}{m}}\,\sum_{i}|\mathcal{Q}(a_{i}+\xi_{i})-\mathcal{Q}(b_{i}+\xi_{i})|={\textstyle\frac{1}{m}}\sum_{i}X_{i}$, with the i.i.d. sub-Gaussian r.v.’s $X_{i}$ such that $\mathbb{E}X_{i}=|a_{i}-b_{i}|$ [2, App. A], one can easily prove the concentration of $X(\boldsymbol{a},\boldsymbol{b})$ around $\mathbb{E}X(\boldsymbol{a},\boldsymbol{b})=\mathcal{D}_{\ell_{1}}(\boldsymbol{a},\boldsymbol{b})$ both on a fixed pair $\boldsymbol{a},\boldsymbol{b}\in\mathcal{E}$ and, by union bound, for all $\boldsymbol{a},\boldsymbol{b}\in\mathcal{E}_{\rho}$, since there are no more than $(e^{\mathcal{H}_{1}(\mathcal{E},\rho)})^{2}$ such pairs in $\mathcal{E}_{\rho}$. Unfortunately, the discontinuity of the mapping ${\mathsf{A}}^{\prime}$ prevents us from directly extending this over the full set $\mathcal{E}$ by a continuity argument applied to each neighbourhood of the covering. However, this situation can be overcome by softening the pseudo-distance $d(\cdot,\cdot)\coloneqq|\mathcal{Q}(\cdot)-\mathcal{Q}(\cdot)|$ composing $X$ [2, 15]. We first note that $d(a,b)=\delta\sum_{k\in\mathbb{Z}}\mathbb{I}_{\mathcal{S}}(a-k\delta,b-k\delta)$, where $\mathcal{S}=\{(a,b)\in\mathbb{R}^{2}:ab<0\}$ and $\mathbb{I}_{\mathcal{C}}(a,b)$ is the indicator of $\mathcal{C}$ evaluated in $(a,b)$, i.e., it is equal to $1$ if $(a,b)\in\mathcal{C}$ and $0$ otherwise. In fact, $d(a,b)=\delta|(\delta\mathbb{Z})\cap[a,b]|$, with $|\cdot|$ the cardinality operator, showing that $d/\delta$ counts the number of thresholds in $\delta\mathbb{Z}$ that can be inserted between $a$ and $b$.
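The threshold-counting identity can be checked on a concrete pair. The sketch below assumes the floor quantiser $\mathcal{Q}(u)=\delta\lfloor u/\delta\rfloor$ (an assumption about the quantiser's form) and hypothetical helper names; it picks $a,b$ away from the thresholds, since boundary conventions only matter on a set of measure zero once the dither is added.

```python
import math

def quantise(u, delta):
    """Uniform quantiser with cells [k*delta, (k+1)*delta) (assumed form)."""
    return delta * math.floor(u / delta)

def thresholds_between(a, b, delta):
    """Number of integers k with (a - k*delta) * (b - k*delta) < 0,
    i.e. thresholds of delta*Z lying strictly between a and b."""
    k_min = math.floor(min(a, b) / delta) - 1
    k_max = math.ceil(max(a, b) / delta) + 1
    return sum(1 for k in range(k_min, k_max + 1)
               if (a - k * delta) * (b - k * delta) < 0)

delta = 0.5
a, b = 0.25, 2.125
d = abs(quantise(a, delta) - quantise(b, delta))
print(d, delta * thresholds_between(a, b, delta))
```

Here the thresholds $0.5, 1, 1.5, 2$ lie between $a$ and $b$, so both printed values equal $4\delta=2$.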

Introducing the set $\mathcal{S}^{t}=\{(a,b)\in\mathbb{R}^{2}:a<-t,b>t\}\cup\{(a,b)\in\mathbb{R}^{2}:a>t,b<-t\}$ for $t\in\mathbb{R}$ , with $\mathcal{S}^{0}=\mathcal{S}$ , we can define a soft version of $d$ by

[TABLE]

Thanks to $\mathcal{S}^{t}$ , the value of $t$ determines a set of forbidden (or relaxed) intervals $\delta\mathbb{Z}+[-|t|,|t|]=\{\,[k\delta-|t|,k\delta+|t|]:k\in\mathbb{Z}\}$ if $t>0$ (respectively $t<0$ ) of size $2|t|$ and centred on the quantiser thresholds in $\delta\mathbb{Z}$ . For $t>0$ a threshold of $\delta\mathbb{Z}$ is not counted in $d^{t}(a,b)$ if $a$ or $b$ fall in its forbidden interval, whereas for $t<0$ a threshold that is not between $a$ and $b$ can be counted if $a$ or $b$ fall inside its relaxed interval.
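The effect of $t$ can be seen on a toy pair. The sketch below assumes that the soft pseudo-distance has the same form as $d$ with $\mathcal{S}$ replaced by $\mathcal{S}^{t}$, i.e., $d^{t}(a,b)=\delta\sum_{k\in\mathbb{Z}}\mathbb{I}_{\mathcal{S}^{t}}(a-k\delta,b-k\delta)$; this form and the function names are assumptions made for illustration.

```python
import math

def in_S_t(a, b, t):
    """Membership of (a, b) in S^t = {a < -t, b > t} U {a > t, b < -t}."""
    return (a < -t and b > t) or (a > t and b < -t)

def soft_d(a, b, delta, t):
    """Assumed soft pseudo-distance:
    delta * #{k in Z : (a - k*delta, b - k*delta) in S^t}."""
    pad = math.ceil(abs(t) / delta) + 1  # extra thresholds to scan when t < 0
    k_min = math.floor(min(a, b) / delta) - pad
    k_max = math.ceil(max(a, b) / delta) + pad
    return delta * sum(1 for k in range(k_min, k_max + 1)
                       if in_S_t(a - k * delta, b - k * delta, t))

delta, a, b = 1.0, 0.25, 2.25
for t in (0.5, 0.0, -0.5):
    print(t, soft_d(a, b, delta, t))
```

For this pair the thresholds $1$ and $2$ lie between $a$ and $b$. With $t=0.5$ the threshold $2$ is dropped because $b$ falls in its forbidden interval, giving $1$; with $t=0$ both are counted, giving $2$; with $t=-0.5$ the threshold $0$ is also counted because $a$ falls in its relaxed interval, giving $3$.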

By extension, we can also define $\mathcal{D}^{t}(\boldsymbol{a},\boldsymbol{b})\coloneqq{\textstyle\frac{1}{m}}\sum_{i}d^{t}(a_{i},b_{i})$ for $\boldsymbol{a},\boldsymbol{b}\in\mathbb{R}^{m}$, so that $\mathcal{D}^{0}(\boldsymbol{a},\boldsymbol{b})=\mathcal{D}_{\ell_{1}}\big(\mathcal{Q}(\boldsymbol{a}),\mathcal{Q}(\boldsymbol{b})\big)$. Interestingly, this distance displays the following continuity property [2, Lemma 2]. For $\boldsymbol{a},\boldsymbol{b}\in\mathcal{E}$, and $\boldsymbol{a}_{0},\boldsymbol{b}_{0}$ their respective closest points in $\mathcal{E}_{\rho}$, we have, for every $t\in\mathbb{R}$ and $P>0$ $^{5}$,

$^{5}$ In [2, Lemma 2], it is assumed $P\geqslant 1$ but nothing prevents $P>0$.

[TABLE]

Moreover, for $\boldsymbol{\xi}\sim\mathcal{U}^{m}([0,\delta])$ and $\boldsymbol{a},\boldsymbol{b}$ fixed, $\mathcal{D}^{t}(\boldsymbol{a}+\boldsymbol{\xi},\boldsymbol{b}+\boldsymbol{\xi})$ concentrates around its mean, which is close to $\mathcal{D}_{\ell_{1}}(\boldsymbol{a},\boldsymbol{b})$ [2, Lemma 3]. In fact, $|\mathbb{E}\mathcal{D}^{t}(\boldsymbol{a}+\boldsymbol{\xi},\boldsymbol{b}+\boldsymbol{\xi})-\mathcal{D}_{\ell_{1}}(\boldsymbol{a},\boldsymbol{b})|\lesssim|t|$, so that for some $c>0$,

[TABLE]

Therefore, by union bound and for some $P>0$ to be fixed soon, if $m\gtrsim\epsilon^{-2}\mathcal{H}_{1}(\mathcal{E},\rho)$ then

[TABLE]

with probability exceeding $1-Ce^{-c\epsilon^{2}m}$ for some $C,c>0$ .

Consequently, for any $\boldsymbol{a},\boldsymbol{b}\in\mathcal{E}$ and $\boldsymbol{a}_{0},\boldsymbol{b}_{0}$ their respective closest points in $\mathcal{E}_{\rho}$, using (18) combined with (19), and since the triangle inequality provides $\mathcal{D}_{\ell_{1}}(\boldsymbol{a}_{0},\boldsymbol{b}_{0})\leqslant\mathcal{D}_{\ell_{1}}(\boldsymbol{a},\boldsymbol{b})+\frac{2\rho}{m}$, we have with the same probability and for some $c>0$,

[TABLE]

where we finally set the free parameters as $P^{-1}=\epsilon$ and $\rho=m\delta\tfrac{\epsilon^{2}}{1+\epsilon}<m\delta\min(\epsilon,\epsilon^{2})$ , giving $\rho P\leqslant m\delta\epsilon$ and $\tfrac{\rho}{m}\leqslant\delta\epsilon$ . The lower bound is obtained similarly using (17) with the minus case of (19), and Prop. 2 is finally obtained with $t=0$ . ∎
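For completeness, the stated consequences of this parameter choice can be verified directly. With $P^{-1}=\epsilon$ and $\rho=m\delta\tfrac{\epsilon^{2}}{1+\epsilon}$, and for any $\epsilon>0$:

```latex
\begin{align*}
\rho P &= m\delta\,\frac{\epsilon^{2}}{1+\epsilon}\cdot\frac{1}{\epsilon}
        = m\delta\,\frac{\epsilon}{1+\epsilon}\leqslant m\delta\epsilon, \\
\frac{\rho}{m} &= \delta\,\frac{\epsilon^{2}}{1+\epsilon}
        = \delta\epsilon\cdot\frac{\epsilon}{1+\epsilon}\leqslant\delta\epsilon, \\
\frac{\epsilon^{2}}{1+\epsilon} &< \min(\epsilon,\epsilon^{2})
  \quad\text{since}\quad
  \frac{\epsilon^{2}}{1+\epsilon}<\epsilon^{2}
  \ \text{ and }\
  \frac{\epsilon^{2}}{1+\epsilon}=\epsilon\cdot\frac{\epsilon}{1+\epsilon}<\epsilon .
\end{align*}
```

All three steps only use $0<\tfrac{\epsilon}{1+\epsilon}<1$ and $1+\epsilon>1$.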

Remarks:

Note that in the small-$\sigma$ regime, Prop. 1 diverges in its requirement on $m$ (except if we keep $\delta<C\sigma$). I guess this makes sense somehow and illustrates the special quantised geometry compared to the linear REP, which doesn’t display such a divergence.

Bibliography


[1] R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2325–2383, 1998.
[2] L. Jacques, “Small width, low distortions: quasi-isometric embeddings with quantized sub-Gaussian random projections,” arXiv preprint arXiv:1504.06170, 2015.
[3] P. T. Boufounos, L. Jacques, F. Krahmer, and R. Saab, “Quantization and compressive sensing,” in Compressed Sensing and its Applications. Springer, 2015, pp. 193–237.
[4] A. Rahimi and B. Recht, “Random Features for Large-Scale Kernel Machines,” in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. Curran Associates, Inc., 2008, pp. 1177–1184.
[5] P. T. Boufounos, S. Rane, and H. Mansour, “Representation and Coding of Signal Geometry,” arXiv preprint arXiv:1512.07636, 2015.
[6] L. Jacques and V. Cambareri, “Time for dithering: fast and quantized random embeddings via the restricted isometry property,” arXiv preprint arXiv:1607.00816, 2016.
[7] A. Moshtaghpour, L. Jacques, V. Cambareri, K. Degraux, and C. De Vleeschouwer, “Consistent Basis Pursuit for Signal and Matrix Estimates in Quantized Compressed Sensing,” IEEE Signal Processing Letters, vol. 23, no. 1, pp. 25–29, 2016.
[8] A. S. Bandeira, D. G. Mixon, and B. Recht, “Compressive classification and the rare eclipse problem,” arXiv preprint arXiv:1404.3203, 2014.
