The Rare Eclipse Problem on Tiles: Quantised Embeddings of Disjoint Convex Sets
Valerio Cambareri, Chunlei Xu, Laurent Jacques

TL;DR
This paper investigates conditions under which quantised random embeddings preserve the separability of disjoint convex sets, enabling exact classification after dimensionality reduction, with theoretical results and numerical validation.
Contribution
It provides a new theoretical framework relating embedding dimension, quantiser resolution, and set separation for preserving separability in quantised embeddings.
Findings
Derived conditions linking embedding parameters and set separation.
Numerical phase transition curves for two $\ell_2$-balls.
Experimental validation of theoretical results.
ISPGroup, ICTEAM/ELEN, Université catholique de Louvain, Louvain-la-Neuve, Belgium.
E-mail: {valerio.cambareri, chunlei.xu, laurent.jacques}@uclouvain.be. The authors are partly funded by the Belgian National Fund for Scientific Research (FNRS) under the M.I.S.-FNRS project AlterSense. All authors have equally contributed to the realisation of this paper.
Abstract
Quantised random embeddings are an efficient dimensionality reduction technique which preserves the distances of low-complexity signals up to some controllable additive and multiplicative distortions. In this work, we instead focus on verifying when this technique preserves the separability of two disjoint closed convex sets, i.e., in a quantised view of the “rare eclipse problem” introduced by Bandeira et al. in 2014. This separability would ensure exact classification of signals in such sets from the signatures output by this non-linear dimensionality reduction. We here present a result relating the embedding’s dimension, its quantiser resolution and the sets’ separation, as well as some numerically testable conditions to illustrate it. Experimental evidence is then provided in the special case of two $\ell_2$-balls, tracing the phase transition curves that ensure these sets’ separability in the embedded domain.
Index Terms:
Random embeddings, dimensionality reduction, quantisation, compressive classification, phase transition.
I Introduction
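Dimensionality reduction methods are a crucial part of very large-scale machine learning frameworks, as they are in charge of mapping (with negligible losses) the information contained in high-dimensional data to a low-dimensional domain, thus minimising the computational effort of learning tasks. We here focus on a class of non-linear, non-adaptive dimensionality reduction methods, i.e., quantised random embeddings, as obtained (our notation conventions are reported at the end of this section) by applying to $\boldsymbol{x}\in\mathcal{K}$ (with $\mathcal{K}\subset\mathbb{R}^{n}$ any dataset)
$$\boldsymbol{y}=\mathsf{A}(\boldsymbol{x}):=\mathcal{Q}_{\delta}(\boldsymbol{\Phi}\boldsymbol{x}+\boldsymbol{\xi}), \tag{1}$$
where $\boldsymbol{\Phi}\in\mathbb{R}^{m\times n}$ is a Gaussian random sensing matrix, i.e., $\boldsymbol{\Phi}\sim\mathcal{N}^{m\times n}(0,1)$; $\mathcal{Q}_{\delta}(\cdot):=\delta\lfloor\cdot/\delta\rfloor$ is a uniform scalar quantiser of resolution $\delta>0$ (applied component-wise), yielding a signature $\boldsymbol{y}\in\delta\mathbb{Z}^{m}$; $\boldsymbol{\xi}\sim\mathcal{U}^{m}([0,\delta])$ is some dither drawn uniformly in $[0,\delta]$, which is fundamental to stabilise the action of the quantiser [1, 2].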
The non-linear map described by (1) produces compact signatures , either in terms of dimension , or of bits per entry (controlled by ) even if [3]. Learning tasks such as classification may then run on rather than at reduced storage, transmission, and computational costs, with accuracy depending on , . However, contrary to other non-linear maps (e.g., [4, 5]), (1) retains quasi-isometry properties [2, 6] that grant, under some requirements on (i.e., sample complexity bounds), the recovery of from using appropriate algorithms [7].
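To make the map in (1) concrete, the following is a minimal NumPy sketch. It assumes the usual form of such embeddings, i.e., a Gaussian matrix followed by a uniform dither and the floor quantiser $\delta\lfloor\cdot/\delta\rfloor$; the function name `quantised_embedding` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def quantised_embedding(X, m, delta, rng):
    """Map the rows of X (N x n) to signatures in delta * Z^m.

    Assumed form of (1): A(x) = Q_delta(Phi x + xi), with Q_delta(t) = delta * floor(t / delta).
    """
    n = X.shape[1]
    Phi = rng.standard_normal((m, n))        # Gaussian sensing matrix, i.i.d. N(0, 1) entries
    xi = rng.uniform(0.0, delta, size=m)     # dither, uniform in [0, delta], shared by all signals
    Z = X @ Phi.T + xi                       # dithered random projections
    Y = delta * np.floor(Z / delta)          # uniform scalar quantiser, applied component-wise
    return Y, Phi, xi

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 64))             # five hypothetical signals in dimension n = 64
delta = 0.5
Y, Phi, xi = quantised_embedding(X, m=8, delta=delta, rng=rng)
Z = X @ Phi.T + xi                           # unquantised values, kept to inspect the quantisation error
```

Each row of `Y` is a signature with `m` entries on the lattice of step `delta`; a coarser `delta` means fewer bits per entry, while a larger `m` means a longer signature.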
In this contribution we aim to prove that generic learning tasks can run seamlessly on by the ability of (1) to preserve the separability of different classes in . These classes are described by disjoint closed convex sets, i.e., so that . Hence, we inquire whether testing if our data is equivalent to doing so given in (1); for this to hold, it is necessary that the classes’ images are still separable, i.e., . If this is violated then no learning algorithm can perform exact classification, as the images would “eclipse” each other. This perspective builds upon that of Bandeira et al. [8], who introduced this rare eclipse problem for linear embeddings, as reviewed in Sec. II-A. Focusing on classes, in Sec. II-B we define the quantised eclipse problem and present our main result, i.e., a sample complexity bound which states the conditions on , , , and under which the images are separable with high probability (w.h.p.). In Sec. II-C this is simplified by lower bounds to the latter probability, which have the advantage of being numerically testable for disjoint convex sets by solving convex optimisation problems. Among such sets, we detail the specific case of two high-dimensional -balls in Sec. II-D; this is explored numerically in Sec. III by computing phase transition curves on the above probability bound, indicating a regime with respect to (w.r.t.) for (1) in which the sets’ separability is preserved.
Notation: Given a random variable (r.v.) (e.g., normal or uniform r.v.’s), we write (e.g., ) to denote the matrix (or vector, if ) with independent and identically distributed (i.i.d.) entries . Spheres and balls in are denoted by and . For a set , its Chebyshev radius is ; its image under a map is ; its projection by a matrix is . The cardinality of a set reads , and . We denote by constants whose value can change between lines. We also write if such that , and correspondingly for . Moreover, means that and .
Relation to Prior Work: Many contributions have discussed linear dimensionality reduction by with a random matrix having i.i.d. entries distributed as a sub-Gaussian r.v. (for a survey, see [9]), i.e., random projections. Following the work of Johnson and Lindenstrauss [10], such linear embeddings were soon recognised [11, 12] as distance-preserving, non-adaptive (i.e., not requiring any potentially large or unavailable training dataset, as opposed to, e.g., principal component analysis) dimensionality reductions for finite datasets, i.e., with . Moreover, several non-linear random embeddings are now available for more general models of [5, 13, 14, 2, 15, 16]; most results on such embeddings rely on preserving distances, rather than the separation between classes within . Regarding this last aspect, Dasgupta [17] first analysed the separability of a mixture-of-Gaussians dataset after random projections. Later, with the rise of Compressed Sensing (CS), random projections followed by classification tasks were dubbed compressive classification. Davenport et al. [18] showed that if verifies the Restricted Isometry Property (RIP) w.r.t. a dataset (i.e., a stable embedding) then exact classification can be achieved on thanks to distance preservation; was therein taken as a finite set, or the set of sparse signals. Reboredo et al. [19, 20] studied the limits of compressive classification in a Bayesian framework. Finally, Bandeira et al. [8] first explored with the tools of high-dimensional geometry the conditions for the separability of closed convex sets after random projections. We here extend their approach to quantised random embeddings given by (1) which, due to their non-linearity, is a non-trivial endeavour that is currently lacking in the literature.
II Quantised Random Embeddings and the Rare Eclipse Problem
II-A The Rare Eclipse Problem
Let us first recall the fundamental question introduced by Bandeira et al. [8] and their main result as follows.
Problem 1 (Rare Eclipse Problem, from [8]).
Let be closed convex sets, . Given , find the smallest so that
[TABLE]
Prob. 1 is equivalent to ensuring, for all , , that with . Let us define the difference set . We can then cast (2) in terms of the kernel of , i.e.,
[TABLE]
Intuitively, in (2) will increase with the “size” of , as its intersection with will be non-empty. This size is here measured by the Gaussian mean width, i.e., for any set ,
[TABLE]
Bandeira et al. then realised that (3) is found by Gordon’s Escape Theorem [21] since, by arbitrarily scaling that amounts to taking the cone , and by its intersection with the sphere , we obtain a mesh (i.e., a closed subset of ). Let us then define of width , and report their main result (its proof is in [8]).
Proposition 1 (Corollary 3.1 in [8]).
In the setup of Prob. 1, given , if then .
Hence, the sample complexity of Prob. 1 is sharply characterised for any difference set whose is given or bounded.
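As an illustration of how a Gaussian mean width can be estimated numerically, the sketch below uses plain Monte Carlo sampling on a finite set. It assumes the common convention $w(\mathcal{S})=\mathbb{E}\sup_{\boldsymbol{x}\in\mathcal{S}}\langle\boldsymbol{g},\boldsymbol{x}\rangle$ with $\boldsymbol{g}$ standard Gaussian, which may differ from the exact definition used above by an absolute value or a constant; the helper `mean_width_finite` is hypothetical.

```python
import numpy as np

def mean_width_finite(S, trials, rng):
    """Monte Carlo estimate of E sup_{x in S} <g, x> for a finite set S (one point per row)."""
    G = rng.standard_normal((trials, S.shape[1]))   # one Gaussian vector g per trial
    return float(np.mean(np.max(G @ S.T, axis=1)))  # sup over the points, then average over g

rng = np.random.default_rng(1)
n = 10
# Sanity check on S = {+e_1, -e_1}: sup <g, x> = |g_1|, whose mean is sqrt(2 / pi).
S = np.zeros((2, n))
S[0, 0], S[1, 0] = 1.0, -1.0
w_hat = mean_width_finite(S, trials=200_000, rng=rng)
w_ref = float(np.sqrt(2.0 / np.pi))
```

For a mesh on the sphere such as the one built from the difference set, the same estimator applies once the supremum over the set can be computed, e.g., in closed form or by a convex programme.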
II-B The Quantised Eclipse Problem
Extending Prop. 1 to quantised random embeddings as in (1) is not simple. To begin with, any two closed convex sets would now be mapped into two countable sets ; verifying when they “collide” is our key question below.
Problem 2 (Quantised Eclipse Problem).
Let be closed convex sets, and defined in (1) with . Given , find the smallest so that
[TABLE]
Note that, since itself uses before quantisation, ; hence, given the same . However, the converse does not hold since by itself does not suffice to ensure due to, e.g., coarse quantisation with large or some draws of in (1). Then, letting the event , we see (4) equals
[TABLE]
Hence, bounds the probability that any two , are consistent. Note that, by consistency [2], with . Thus, introducing the separation , it is expected that will decay to 0 as increases and decreases.
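The role of consistency can be checked numerically. The sketch below assumes the map in (1) uses the floor quantiser $\delta\lfloor\cdot/\delta\rfloor$, and reads consistency as follows: two points with identical signatures have all their dithered projections in the same quantiser cells, so their projected difference is smaller than $\delta$ in every component. The point clouds are hypothetical nearby samples (not two disjoint convex sets), chosen so that collisions actually occur.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, delta, N = 8, 2, 4.0, 2000                 # hypothetical sizes; a coarse delta favours collisions
Phi = rng.standard_normal((m, n))
xi = rng.uniform(0.0, delta, size=m)

def A(X):
    """Assumed form of (1), applied to each row of X."""
    return delta * np.floor((X @ Phi.T + xi) / delta)

X1 = 0.3 * rng.standard_normal((N, n))           # samples standing in for a first class
X2 = X1 + 0.1 * rng.standard_normal((N, n))      # nearby samples standing in for a second class
collide = np.all(A(X1) == A(X2), axis=1)         # pairs whose signatures coincide
gap = np.max(np.abs((X1 - X2) @ Phi.T), axis=1)  # sup-norm of Phi (x' - x'') for each pair
consistent_ok = bool(np.all(gap[collide] < delta))
```

Every colliding pair satisfies the sup-norm bound, while the converse fails: a pair may satisfy the bound and still be split by a quantiser threshold, which is where the dither and the number of measurements come into play.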
This is also sustained by the fact that is known to respect w.h.p. the Quantised Restricted Isometry Property (QRIP) [6] over some provided satisfies a -form of the RIP (see Lemma 1) and is large before the dimension of . If the QRIP holds, we would then have, for all ,
[TABLE]
for some controllable distortions and constants . With , and , this ensures that . Thus, simply follows if .
Before introducing our main result, let us present two lemmata, whose proofs are given in the Appendix. The first assesses when respects a -form of the RIP for a mesh (see, e.g., [15, Cor. 2.3],[22]).
Lemma 1.
Let and . If and , then there exist some such that, with probability exceeding and $\kappa_{0}=\sqrt{\pi/2}$,
[TABLE]
Thus, provided and defining , applying Lemma 1 to yields
[TABLE]
with the same probability and for all , since . Moreover, since , provided for some , we also have with probability exceeding ,
[TABLE]
The second lemma proves that the mapping , with , embeds (in the Gromov-Hausdorff sense [15]) w.h.p. in in the metric and up to some controlled distortions. This lemma uses the Kolmogorov entropy of a bounded subset in the -metric () defined for , with the cardinality of its smallest -covering in the same metric.
Lemma 2.
Let be a bounded set. Given , if
[TABLE]
then, for and with probability exceeding for some , we have
[TABLE]
We are finally able to state our main result, solving Prob. 2.
Proposition 2.
In the setup of Prob. 2, let , , , and defined in (1) with . Given , if
[TABLE]
then .
Proof of Prop. 2.
Let us first observe when (10) holds with , , . This will be useful later to characterise when . Let be a -covering in the -metric of for some to be specified below. If for some , we have from (9) that, with probability exceeding , the event where is a -covering of holds with . This proves that, conditionally to and for , . However, . Moreover, we have [23], so that . Setting gives and finally
[TABLE]
Consequently, conditionally to which only depends on , Lemma 2 provides that if then, with probability exceeding , we get the occurrence of a new event, , where (10) holds with and for all . Under the same conditions, since , occurs unconditionally with , for some .
Second, if for some , Lemma 1 states that the event , where (8) is respected for all and all , holds with probability exceeding .
Given and , i.e., with since , the union bound yields that and jointly hold with probability exceeding provided
[TABLE]
In this case, for all and (or vice versa), (10) (with and ) and (8) give, for some ,
[TABLE]
In order to have , the last quantity must be positive. Since , this clearly happens if , which gives
[TABLE]
Moreover, from the value of set above, , so that (12) is satisfied if (11) holds. This gives finally that under this condition. ∎
Interestingly, up to diverging factors (possibly due to proof artefacts), the requirement of Prop. 1 can be seen as a special case of (11) when , i.e., for a “vanishing” quantiser, since . Finally, the application of Prop. 2 to more than two sets is possible, and will be included in an extended version of this paper.
II-C Testable Conditions by Convex Problems
To properly verify the bound on in Prop. 2 we should test the existence of any element in , i.e., of any two consistent vectors . As expected, this search is computationally intractable, so we now deduce numerically testable, albeit less tight lower bounds for . Let us first define the consistency margin
[TABLE]
that is a function of and , and can be related to the minimal separation as defined above. Moreover, the event depends only on , so we can write (5) as
[TABLE]
where (i.e., ) since , while the converse does not hold. Note that fully accounts for the cases in which , since if . Clearly, we can now estimate as can be computed for each when the optimisation problem (13) is convex (i.e., iff is, as for disjoint convex sets).
To tighten this bound and fully leverage dithering, we build a partition formed by the cones
[TABLE]
We can now define for , where clearly . Letting , we use a shorthand for the event and bound
[TABLE]
Then, since the entries of are i.i.d.,
[TABLE]
where the second-to-last line follows since occurs whenever, given two intervals that are far apart, a quantiser threshold falls between them. Hence, this event is identical to having since . The computational complexity of estimating is similar to that of (14), while (15) is sharper, as it can be shown that . However, we expect both bounds to be somewhat loose w.r.t. the one in Prop. 2.
II-D The Case of Two Disjoint $\ell_2$-Balls
We now briefly focus on the case of two $\ell_2$-balls and , for which with and . It is then shown in [24, Prop. 4.3] that when since . We can now compare the sample complexities in Prop. 1 and Prop. 2: up to some and additive factors, we see that Prob. 2 has rate , while Prob. 1 only requires , hence showing the effect of that we will illustrate in our numerical experiments below.
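For the linear baseline of Prob. 1, the two-ball case admits an exact test that is easy to simulate. The sketch below relies on an elementary observation that is not taken from the text: the difference of two Euclidean balls with centres `c1`, `c2` and radii `r1`, `r2` is the ball of radius `r1 + r2` centred at `c1 - c2`, so it misses the kernel of the matrix exactly when the projection of `c1 - c2` onto the row space has norm larger than `r1 + r2`. It covers only the unquantised case; all names and numerical values are hypothetical.

```python
import numpy as np

def linear_eclipse_free(Phi, c1, c2, r1, r2):
    """True if Phi maps the balls c1 + r1*B and c2 + r2*B to disjoint sets (linear case only)."""
    c = c1 - c2
    # Distance from c to Ker(Phi) = norm of the projection of c onto the row space of Phi.
    proj = np.linalg.pinv(Phi) @ (Phi @ c)
    return bool(np.linalg.norm(proj) > r1 + r2)

rng = np.random.default_rng(3)
n, r1, r2 = 64, 1.0, 1.0
c1 = np.zeros(n)
c2 = np.zeros(n)
c2[0] = 2.5                                   # centres 2.5 apart, so the balls are disjoint
rates = {}
for m in (1, 4, 16, 64):
    hits = [linear_eclipse_free(rng.standard_normal((m, n)), c1, c2, r1, r2)
            for _ in range(200)]
    rates[m] = float(np.mean(hits))           # empirical probability of no eclipse at this m
```

Sweeping `m` on a finer grid traces an empirical phase transition for the linear problem; the quantised problem additionally depends on the quantiser resolution and cannot be decided by this kernel test alone.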
III Numerical Experiments
We now test the special case of Sec. II-D by generating random instances of and (by uniformity of , over the Grassmannian at the origin, it is legitimate to fix a randomly drawn direction for the simulations), and computing the quantities and for each instance, as specified in Sec. II-C. This allows us to empirically estimate respectively in (14), (15) on trials for each of the configurations and , and varying by fixing and taking . The estimated values of are then reported as heat maps in Fig. 1a,b along with the phase transition curves , , and the linear case of Prop. 1 , with being estimated as in [8]. Given for all instances, we compute in Fig. 1c the phase transition curves corresponding to for several . For each curve, the event holds with probability at least . These curves are indeed compatible with the fact that (up to factors, and as concluded in Sec. II-D). However, we suspect that is still not sufficiently tight to approach our theoretical, albeit computationally intractable, bound on , and leave this improvement to a future investigation.
IV Conclusion
The fundamental limits of learning tasks with embeddings are being tackled in several studies; our result illustrates the requirements for exact classification after quantised random embedding of two disjoint closed convex sets. As we only developed cases in which the datasets are not specified as low-complexity sets, we leave the latter to future work, e.g., the case of disjoint “clusters” of sparse signals .
V Appendix
Proof of Lemma 2.
We adapt the proof of [6, Prop. 1]. Given to be fixed later, let be a -covering of in the -metric, i.e., for all there exists such that . Notice that since , with the i.i.d. sub-Gaussian r.v.’s such that [2, App. A], one can easily prove the concentration of around both on a fixed pair and, by union bound, for all since there are no more than such pairs in . Unfortunately, the discontinuity of the mapping prevents us from directly extending this over the full set by a continuity argument applied to each neighbourhood of the covering. However, this situation can be overcome by softening the pseudo-distance composing [2, 15]. We first note that , where and is the indicator of evaluated in , i.e., it is equal to 1 if its argument belongs to that set and 0 otherwise. In fact, , with the cardinality operator, showing that counts the number of thresholds in that can be inserted between and .
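The threshold-counting argument rests on a standard property of uniform dithering: for scalars $a$, $b$ and a dither $\xi$ uniform in $[0,\delta]$, $\mathbb{E}_{\xi}|\mathcal{Q}_{\delta}(a+\xi)-\mathcal{Q}_{\delta}(b+\xi)|=|a-b|$. The following sketch checks it by Monte Carlo, assuming the floor quantiser $\delta\lfloor\cdot/\delta\rfloor$; the values of `a`, `b` and `delta` are arbitrary.

```python
import numpy as np

def Q(t, delta):
    """Assumed uniform scalar quantiser of resolution delta."""
    return delta * np.floor(t / delta)

rng = np.random.default_rng(4)
delta, a, b = 1.0, 0.37, 2.05
xi = rng.uniform(0.0, delta, size=200_000)           # independent draws of the dither
# delta times the number of quantiser thresholds falling between a + xi and b + xi
d = np.abs(Q(a + xi, delta) - Q(b + xi, delta))
mean_d = float(d.mean())                             # Monte Carlo estimate of the expectation
undithered = float(abs(Q(a, delta) - Q(b, delta)))   # distance between quantised values without dither
```

Without dither the quantised distance is biased (here it equals 2.0 while $|a-b|=1.68$), whereas averaging over the dither recovers $|a-b|$; summing over the independent components is what allows the concentration argument of the proof.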
Introducing the set for , with , we can define a soft version of by
[TABLE]
Thanks to , the value of determines a set of forbidden (or relaxed) intervals if (respectively ) of size and centred on the quantiser thresholds in . For a threshold of is not counted in if or fall in its forbidden interval, whereas for a threshold that is not between and can be counted if or fall inside its relaxed interval.
By extension, we can also define for , so that $\mathcal{D}^{0}(\boldsymbol{a},\boldsymbol{b})=\mathcal{D}_{\ell_{1}}\big(\mathcal{Q}(\boldsymbol{a}),\mathcal{Q}(\boldsymbol{b})\big)$. Interestingly, this distance displays the following continuity property [2, Lemma 2]. For , and their respective closest points in we have, for every and (in [2, Lemma 2], it is assumed but nothing prevents ),
[TABLE]
Moreover, for and fixed, concentrates around its mean which is close to [2, Lemma 3]. In fact, , so that for some ,
[TABLE]
Therefore, by union bound and for some to be fixed soon, if then
[TABLE]
with probability exceeding for some .
Consequently, for any and their respective closest point in , using (18) combined with (19), and since the triangular inequality provides , we have with the same probability and for some ,
[TABLE]
where we finally set the free parameters as and , giving and . The lower bound is obtained similarly using (17) with the minus case of (19), and Prop. 2 is finally obtained with . ∎
Remarks:
- Note that in the small sigma regime, the requirement of Prop. 1 on diverges (except if we keep ). This arguably makes sense and illustrates the special quantised geometry, compared to the linear rare eclipse problem, which does not display such a divergence.
References
[1] R. M. Gray and D. L. Neuhoff, “Quantization,” IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2325–2383, 1998.
[2] L. Jacques, “Small width, low distortions: quasi-isometric embeddings with quantized sub-Gaussian random projections,” arXiv preprint arXiv:1504.06170, 2015.
[3] P. T. Boufounos, L. Jacques, F. Krahmer, and R. Saab, “Quantization and compressive sensing,” in Compressed Sensing and its Applications. Springer, 2015, pp. 193–237.
[4] A. Rahimi and B. Recht, “Random Features for Large-Scale Kernel Machines,” in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. Curran Associates, Inc., 2008, pp. 1177–1184.
[5] P. T. Boufounos, S. Rane, and H. Mansour, “Representation and Coding of Signal Geometry,” arXiv preprint arXiv:1512.07636, 2015.
[6] L. Jacques and V. Cambareri, “Time for dithering: fast and quantized random embeddings via the restricted isometry property,” arXiv preprint arXiv:1607.00816, 2016.
[7] A. Moshtaghpour, L. Jacques, V. Cambareri, K. Degraux, and C. De Vleeschouwer, “Consistent Basis Pursuit for Signal and Matrix Estimates in Quantized Compressed Sensing,” IEEE Signal Processing Letters, vol. 23, no. 1, pp. 25–29, 2016.
[8] A. S. Bandeira, D. G. Mixon, and B. Recht, “Compressive classification and the rare eclipse problem,” arXiv preprint arXiv:1404.3203, 2014.
