Phase Transition in the One-bit Johnson-Lindenstrauss Lemma
Amadou Bah, Bryson Kagy, Emily Smith

Abstract.
The Johnson-Lindenstrauss Lemma (J-L Lemma) is a cornerstone of dimension reduction techniques. We study it in the one-bit context, namely we consider the unit sphere , with normalized geodesic metric, and map a finite set into the Hamming cube , with normalized Hamming metric. We find that for , and there is a -RIP from into . This is surprising as the value of is virtually identical to the best known bound in the linear J-L Lemma. In both the linear and one-bit cases, the maps are randomly constructed. We show that the probability of being a -RIP satisfies a phase transition. It passes from probability of nearly zero to nearly one with a very small change in . Our proof relies on delicate properties of Bernoulli random variables.
Research conducted during an REU sponsored by an NSF MCTP grant to the Georgia Institute of Technology.
1. Introduction
Compressive sensing was first introduced as a practical application of signal processing and has since proven useful in applications such as MRI scanning, cell phone imaging, and electron microscopy [lustig2007sparse, fornasier2011compressive, binev2012compressed]. Johnson and Lindenstrauss [johnson1984extensions] showed that, given a very high dimensional data set in , it is possible, with little sacrifice, to map vectors from a subset of this -dimensional space to a much lower, -dimensional space. Recently, Alon and Klartag [2016arXiv161000239A] studied the minimum number of bits required to maintain the Euclidean distance between data points. Our results differ in that we maintain the geodesic distance between points. The non-linear geodesic metric is basic to our considerations.
Dasgupta-Gupta [dasgupta1999elementary] provide the best quantitative bounds in the J-L Lemma. For any , any integer let
[TABLE]
then for any set of points in there exists a map such that for all we have:
[TABLE]
For comparison below, we remark that .
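As a numerical reference point for the linear case, the sketch below evaluates the Dasgupta-Gupta target dimension in its commonly quoted form, $k \ge 4\ln n / (\epsilon^2/2 - \epsilon^3/3)$. The function name and the symbols `n` and `eps` are illustrative choices for this sketch and are not notation taken from the displays above.

```python
import math

def dasgupta_gupta_dim(n: int, eps: float) -> int:
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    if n < 2 or not 0.0 < eps < 1.0:
        raise ValueError("need n >= 2 and 0 < eps < 1")
    return math.ceil(4.0 * math.log(n) / (eps**2 / 2.0 - eps**3 / 3.0))

# The bound grows logarithmically in the number of points n.
for n in (10**3, 10**6):
    print(n, dasgupta_gupta_dim(n, 0.1))
```

The target dimension depends on the number of points only through $\ln n$ and does not involve the ambient dimension at all.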
We study the one-bit context. Consider the unit sphere with the normalized geodesic metric. We map finite into the dimensional Hamming cube , with normalized Hamming metric. A main result is the following: for , and integer , let . For any set of cardinality , there is a -RIP from into . The counter-intuitive fact is that our bound for is virtually identical to the one that holds for the linear J-L Lemma.
We prove the One Bit J-L Lemma in §5. The simpler property of our random one-bit map being one-to-one is studied in §3. For special choices of , we make a finer analysis of the one-to-one and RIP properties. They satisfy phase transitions that depend only weakly on the number of points we are mapping; see §4 and §6. Some background information is recalled in §2.
2. Background
We formalize below several definitions we will use throughout the paper.
Hamming Cube
for all where . For all we have the normalized metric
[TABLE]
Random -dimensional One-Bit Map
Let be i.i.d. uniformly distributed random vectors in . Define a map by . Observe that
[TABLE]
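A minimal sketch of such a map, under stated assumptions: the map is taken to be the coordinate-wise sign of the inner products with the random vectors, bits are encoded as ±1, and uniform directions are obtained from Gaussian vectors (normalizing them would not change any sign, so it is skipped). The names `random_directions`, `one_bit_map`, and `hamming` are hypothetical.

```python
import random

def random_directions(m, dim, rng):
    """m i.i.d. Gaussian vectors; their directions are uniform on the sphere.
    Normalization is skipped because only the sign of <theta, x> is used."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]

def one_bit_map(thetas, x):
    """Assumed form of the map: x -> (sgn<theta_1, x>, ..., sgn<theta_m, x>)."""
    return [1 if sum(t * xi for t, xi in zip(theta, x)) >= 0 else -1
            for theta in thetas]

def hamming(u, v):
    """Normalized Hamming distance: fraction of coordinates that differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

rng = random.Random(0)
thetas = random_directions(m=64, dim=3, rng=rng)
x, y = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(hamming(one_bit_map(thetas, x), one_bit_map(thetas, y)))
```

Each output bit records only which side of a random hyperplane the point lies on, which is why the map depends on direction alone and not on length.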
Geodesic Distance
Fix , on . The geodesic distance is the shortest distance between the points and on the surface. This is given by
[TABLE]
Antipodal points are normalized to be distance one apart. Geodesic distance has this probabilistic interpretation: Let These are the which distinguish between and under the one-bit map. Selecting at random, the probability of being in is This is an instance of the Crofton formula.
For the distance in (2.0.1), we then have
[TABLE]
The right hand side is an average of Bernoulli random variables. In particular, the difference between the Hamming and geodesic metrics is
[TABLE]
Standard large deviation inequalities for Bernoulli random variables apply to the right hand side above.
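The probabilistic interpretation of the geodesic distance can be checked numerically. The sketch below assumes the normalization $\arccos\langle x, y\rangle/\pi$, which is consistent with antipodal points being at distance one, and compares it with the empirical frequency with which a random direction separates the two points. The helper names are hypothetical.

```python
import math
import random

def geodesic(x, y):
    """Normalized geodesic distance on the unit sphere: arccos(<x, y>) / pi."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot))) / math.pi

def separation_frequency(x, y, trials, rng):
    """Fraction of random Gaussian directions theta with
    sgn<theta, x> != sgn<theta, y>."""
    hits = 0
    for _ in range(trials):
        theta = [rng.gauss(0.0, 1.0) for _ in x]
        sx = sum(t * a for t, a in zip(theta, x)) >= 0
        sy = sum(t * b for t, b in zip(theta, y)) >= 0
        hits += sx != sy
    return hits / trials

rng = random.Random(1)
x = [1.0, 0.0, 0.0]
y = [math.cos(1.0), math.sin(1.0), 0.0]   # at an angle of 1 radian from x
print(geodesic(x, y), separation_frequency(x, y, 20000, rng))
```

The two printed numbers should agree up to Monte Carlo error, which is the sense in which the Hamming distance of the images is an unbiased estimate of the geodesic distance.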
The Restricted Isometry Property
has the -RIP if for all pairs :
[TABLE]
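A sketch of a checker for this property, assuming the additive form $|d_H(Bx, By) - d(x, y)| \le \delta$ for every pair, which matches the difference of metrics discussed in the Geodesic Distance paragraph; the exact displayed inequality is not reproduced here, so this form is an assumption. The function `is_delta_rip` and its arguments are hypothetical names.

```python
import math
from itertools import combinations

def geodesic(x, y):
    """Normalized geodesic distance arccos(<x, y>) / pi."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot))) / math.pi

def hamming(u, v):
    """Normalized Hamming distance between two bit vectors."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

def is_delta_rip(points, images, delta):
    """True if every pair satisfies
    |d_H(image_i, image_j) - d(x_i, x_j)| <= delta
    (additive form of the property; an assumption of this sketch)."""
    return all(
        abs(hamming(images[i], images[j]) - geodesic(points[i], points[j])) <= delta
        for i, j in combinations(range(len(points)), 2)
    )

points = [[1.0, 0.0], [0.0, 1.0]]                      # orthogonal, distance 1/2
print(is_delta_rip(points, [[1, 1, -1, -1], [1, -1, -1, 1]], 0.1))
print(is_delta_rip(points, [[1, 1, 1, 1], [1, 1, 1, -1]], 0.1))
```

The first pair of images differs in half of its coordinates and passes; the second differs in only a quarter and fails at this tolerance.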
Positively Associated Stein-Chen Approximation
Random variables are positively associated when their covariances are nonnegative, meaning they tend to increase or decrease together.
[TABLE]
where is a sum of positively associated Bernoullis with parameter , is , and is total variation distance.
General Form of Stein-Chen Approximation
[arratia1990poisson]
[TABLE]
where are Bernoullis with parameter is a sum of all , , is the set of random variables that depend on , and is total variation distance.
3. A One-to-One Mapping From the Unit Sphere to the Hamming Cube
We start with an analysis of a simpler property of being one-to-one.
Theorem 3.1.
Let , , and let be a subset of n points with , where The random -dimensional one-bit map : will be one-to-one with probability at least provided that
[TABLE]
In the special case when the points and are pairwise orthogonal, ,
[TABLE]
By the pigeonhole principle, m must be at least $\log_2 n$. Our result shows that if then the random -dimensional one-bit map is one-to-one with high probability.
Proof.
By the union bound, we know that:
[TABLE]
In this expression, means the sum over all unordered pairs where . Above, there are pairs The ith coordinates of and are equal with probability at most . The coordinates are independent, hence the inequality above. We require , which is true if
[TABLE]
This condition is sufficient for to be one-to-one with probability . In the special case when consists of pairwise orthogonal vectors, , the bound is
[TABLE]
∎
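For the pairwise orthogonal case the union bound can be made concrete. Orthogonal points are at normalized geodesic distance 1/2, so a single coordinate of the two images agrees with probability 1/2 and a given pair collides with probability $2^{-m}$. The sketch below finds the smallest m with $\binom{n}{2} 2^{-m} \le \epsilon$; the name `min_m_one_to_one_orthogonal` and the symbol `eps` for the allowed failure probability are assumptions of this sketch.

```python
import math

def min_m_one_to_one_orthogonal(n: int, eps: float) -> int:
    """Smallest m with C(n, 2) * 2**(-m) <= eps (union bound over pairs)."""
    pairs = math.comb(n, 2)
    m = 0
    while pairs > eps * 2**m:
        m += 1
    return m

for n in (100, 10_000):
    print(n, math.ceil(math.log2(n)), min_m_one_to_one_orthogonal(n, 0.01))
```

The second printed column is the pigeonhole lower bound and the third is the sufficient dimension from the union bound, which is roughly twice as large plus a term depending on the failure probability.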
4. A Phase Transition in One-to-One Property
For a special class of X, we analyze the property of being one-to-one. We show that the probability passes through a phase transition, and that the width of the phase transition is essentially independent of the cardinality of X.
Theorem 4.1.
Fix . Let be pairwise orthogonal vectors in , and let be the probability that is one-to-one. Then for , when:
[TABLE]
and when
[TABLE]
Additionally, the phase transition is bounded as follows:
[TABLE]
We will analyze this from the perspective of the birthday problem. To do this, we will count all pairs of points that map to the same point in the Hamming cube. Namely for , let
[TABLE]
All are i.i.d. with probability , and is a sum of positively associated Bernoulli random variables. By the Stein-Chen approximation, is close in total variation to a Poisson distribution, denoted below. We make this precise:
[TABLE]
where
[TABLE]
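The birthday-problem viewpoint can be illustrated directly. For pairwise orthogonal points one may rotate so that the points are coordinate vectors; the signs of the coordinates of a uniform direction are independent fair signs, so the images behave like independent uniform points of the cube. Under that reading, the one-to-one probability is the classical birthday product, which the sketch compares with the Poisson value $e^{-\lambda}$ for $\lambda = \binom{n}{2} 2^{-m}$. Taking $\lambda$ to be the mean number of colliding pairs is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def exact_one_to_one(n: int, m: int) -> float:
    """P(n independent uniform points of the m-dimensional cube are distinct)."""
    size = 2**m
    if n > size:
        return 0.0
    prob = 1.0
    for k in range(n):
        prob *= (size - k) / size
    return prob

def poisson_one_to_one(n: int, m: int) -> float:
    """Poisson approximation exp(-lambda) with lambda = C(n, 2) * 2**(-m)."""
    return math.exp(-math.comb(n, 2) / 2**m)

for m in range(8, 20, 2):
    print(m, round(exact_one_to_one(100, m), 4), round(poisson_one_to_one(100, m), 4))
```

For 100 points the two columns track each other closely and move from near zero to near one over a small range of m.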
Lemma 4.2.
We claim
[TABLE]
Proof.
If the have the property then the are pairwise independent. Assuming this, variance adds, and can be calculated:
[TABLE]
It remains to prove that . It will be sufficient to show that . The only non-trivial case is when and share exactly one point. We will write this as where y is the shared point. is equal to which means we have three distinct points on the sphere mapping to the same point on the Hamming cube. This gives us that .
∎
Lemma 4.3.
We claim that where is
Using this, we can bound in the window
[TABLE]
Proof.
We can find an expression for using the variance of :
[TABLE]
We can bound which is equal to :
[TABLE]
[TABLE]
∎
Solving For m. Fix , let X be pairwise orthogonal vectors in , and let be the probability that is one-to-one, then when:
[TABLE]
and when:
[TABLE]
In order to ensure that is very small compared to the Poisson distribution, we want If we fix and choose such that , observe the inequality
[TABLE]
It is sufficient to bound as
[TABLE]
Manipulating this statement we get:
[TABLE]
Since we already assumed that , we can rewrite this inequality as and solve for : This means that if , we have which allows us to rewrite our inequalities and gain bounds on :
[TABLE]
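To see why the window width is insensitive to the number of points, the following sketch works under the Poisson approximation alone and ignores the total variation error that the lemmas control. Setting $e^{-\lambda} = \epsilon$ and $e^{-\lambda} = 1 - \epsilon$ with $\lambda = \binom{n}{2} 2^{-m}$ gives two real-valued thresholds in m; their difference is $\log_2\big(\ln(1/\epsilon) / (-\ln(1-\epsilon))\big)$, which does not involve n. The name `poisson_window` and the symbol `eps` are assumptions, and the constants in the theorem are not reproduced.

```python
import math

def poisson_window(n: int, eps: float):
    """Real-valued m at which exp(-C(n,2) * 2**(-m)) equals eps and 1 - eps."""
    pairs = math.comb(n, 2)
    m_low = math.log2(pairs / math.log(1.0 / eps))     # probability eps
    m_high = math.log2(pairs / -math.log(1.0 - eps))   # probability 1 - eps
    return m_low, m_high

for n in (100, 10_000, 1_000_000):
    lo, hi = poisson_window(n, 0.05)
    print(n, round(lo, 2), round(hi, 2), round(hi - lo, 2))
```

The location of the window grows like $2\log_2 n$, while the last column stays fixed as n varies.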
5. A Union Bound for the Restricted Isometry Property
This is the one-bit version of the Johnson-Lindenstrauss Lemma. In particular, the quantitative bound on below is nearly identical to the best known bound in the linear Johnson-Lindenstrauss Lemma.
Theorem 5.1.
Fix and , let X be pairwise orthogonal vectors in . The random dimensional one-bit map, , satisfies the -RIP with probability at least when
[TABLE]
In the Restricted Isometry Property (RIP), we want to preserve the pairwise distances between the points so that for all pairs
[TABLE]
To ensure that we satisfy the -RIP with probability at least , we have to be certain that the probability of failure, , is small:
[TABLE]
Using the union bound,
[TABLE]
We will first analyze the probability for one pair , namely:
[TABLE]
Lemma 5.2.
We claim that for all pairs
Proof.
The difference in the metrics is the average of the random variables:
[TABLE]
These are independent centered Bernoulli random variables that satisfy a large deviation inequality which is uniform in the Bernoulli parameter [hoeffding1963probability],
[TABLE]
∎
The Union Bound for the RIP: The last expression above provides a bound for the probability that one pair fails the -RIP. Now, summing over all pairs,
[TABLE]
We can bound this probability with and solve for :
[TABLE]
For all greater than this bound, must satisfy the -RIP with probability at least
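The final step can be sketched numerically under an explicit assumption: the single-pair failure probability is bounded by the standard two-sided Hoeffding estimate $2e^{-2m\delta^2}$ for an average of m independent variables with values in [0, 1]. The constants in the displays above may differ from this form. Summing over $\binom{n}{2}$ pairs and requiring the total to be at most $\epsilon$ gives $m \ge \ln(n(n-1)/\epsilon) / (2\delta^2)$. The function name and argument names are hypothetical.

```python
import math

def min_m_rip_union(n: int, delta: float, eps: float) -> int:
    """Smallest integer m with C(n, 2) * 2 * exp(-2 m delta^2) <= eps."""
    if n < 2 or not 0.0 < eps < 1.0 or delta <= 0.0:
        raise ValueError("need n >= 2, 0 < eps < 1, delta > 0")
    return math.ceil(math.log(n * (n - 1) / eps) / (2.0 * delta**2))

for n in (100, 10_000):
    print(n, min_m_rip_union(n, 0.1, 0.01))
```

As in the linear case, the required dimension scales like $\ln n / \delta^2$ and is independent of the ambient dimension.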
6. A Phase Transition for the Restricted Isometry Property
For special X, we analyze the property that is a -RIP. Again, the dependence of the window size on is very weak. This time the dependence is in terms of .
Theorem 6.1.
Fix , fix . Let be pairwise orthogonal vectors in , and let be the probability that satisfies . If , then when:
[TABLE]
and when
[TABLE]
Additionally, the phase transition is bounded as follows:
[TABLE]
where ,
and where
The graph below shows a simulation of the RIP property with . The red line is the bound (6.1.1), the green line is (6.1.3). The jagged blue line is the simulated value of the probability of being a -RIP. The line is jagged, due to the discrete nature of the Hamming metric. The latter fact is of course a complication implicit in our proof.
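A minimal Monte Carlo sketch of this kind of experiment is given here for orientation; it is not the code behind the figure. It assumes the points are orthonormal coordinate vectors, so that each image bit is an independent fair sign, and it uses the additive form $|d_H - 1/2| \le \delta$ of the property. The parameter values are illustrative only, and `rip_success_rate` is a hypothetical name.

```python
import random
from itertools import combinations

def rip_success_rate(n, m, delta, trials, rng):
    """Estimate P(one-bit map is a delta-RIP) for n orthonormal coordinate vectors.
    For x_j = e_j, the i-th bit of the image of x_j is the sign of the j-th
    coordinate of theta_i, which is an independent fair sign."""
    successes = 0
    for _ in range(trials):
        images = [[rng.gauss(0.0, 1.0) >= 0 for _ in range(m)] for _ in range(n)]
        ok = all(
            abs(sum(a != b for a, b in zip(u, v)) / m - 0.5) <= delta
            for u, v in combinations(images, 2)
        )
        successes += ok
    return successes / trials

rng = random.Random(0)
for m in (10, 40, 100):
    print(m, rip_success_rate(8, m, 0.2, 100, rng))
```

Scanning m on a finer grid produces an estimated success curve of the kind described above.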
We will again analyze the phase transition from the perspective of the birthday problem. To do this we will count all that fail the RIP property, namely:
[TABLE]
Then is a sum of Bernoulli random variables. All are i.i.d. with probability . By the general form of the Stein-Chen approximation, is close to a Poisson distribution in total variation. We make this precise below:
[TABLE]
where
[TABLE]
Lemma 6.2.
We claim that where is .
Proof.
We know that In this special case, the geodesic distance between any two points in is which reduces to:
[TABLE]
For each , is Bernoulli with parameter The are independent, so is and we can rewrite as:
[TABLE]
∎
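In the orthogonal case the single-pair failure probability can be evaluated exactly, since m times the Hamming distance of a pair is Binomial(m, 1/2). The sketch below sums the two-sided binomial tail for the additive form of the property and compares it with the Hoeffding estimate $2e^{-2m\delta^2}$. Both function names are hypothetical, and the strict inequality in the failure event is an assumption.

```python
import math

def pair_failure_prob(m: int, delta: float) -> float:
    """P(|K/m - 1/2| > delta) for K ~ Binomial(m, 1/2), summed exactly."""
    bad = sum(math.comb(m, k) for k in range(m + 1) if abs(k / m - 0.5) > delta)
    return bad / 2**m

def hoeffding_bound(m: int, delta: float) -> float:
    """Two-sided Hoeffding bound for the same event."""
    return 2.0 * math.exp(-2.0 * m * delta**2)

for m in (5, 6, 7, 50, 200):
    print(m, pair_failure_prob(m, 0.2), hoeffding_bound(m, 0.2))
```

The exact tail is not monotone in m for small m, because the attainable values k/m move across the threshold as m changes; this is consistent with the jagged simulated curve mentioned earlier.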
Lemma 6.3.
We can bound as follows:
[TABLE]
where Additionally for , we can approximate this statement as:
[TABLE]
Proof.
As previously defined,
[TABLE]
Now we will use Stirling’s approximation,
[TABLE]
to obtain bounds for , but for the upper bound, we will use the fact that for .
Let and let us assess only the first term of the sum since it is the largest.
[TABLE]
We can rewrite these bounds in terms of :
[TABLE]
Using this inequality for the first term in , we gain a lower bound for :
[TABLE]
For the upper bound, we have at most summands in so the upper bound is:
[TABLE]
For we can simplify these two statements above using the Taylor approximation for ,
[TABLE]
∎
Lemma 6.4.
We claim that: where is given in Using this we can bound in the window
[TABLE]
where and
Proof.
We recall:
[TABLE]
In order to estimate , we need to estimate . Because the only coordinates that are dependent on are those that share exactly one coordinate with , . There are two ways this can happen: either or shares a coordinate with There are ways to choose the remaining coordinates. Assuming pairwise independence, we can estimate the size of :
[TABLE]
We can now bound which is equal to .
[TABLE]
[TABLE]
where and
∎
Lemma 6.5.
are pairwise independent for all pairs, .
Proof.
To show that are pairwise independent, it is sufficient to show that
[TABLE]
The only non-trivial case is when and share a common point. We can rewrite this probability as:
[TABLE]
where k is an element of After an orthogonal transformation, we can take to be the first three coordinate vectors The distribution of the is unchanged. The signs of the coordinates of the are independent, so the events are independent. Because all of these events are independent, the probability can be written as:
[TABLE]
Because each of these probabilities is identically distributed to There are elements in , we get that , as desired. ∎
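The single-coordinate version of this independence claim can be verified by brute force. After the rotation, one coordinate of the three images is determined by three independent fair signs, and the sketch enumerates all eight outcomes with exact fractions. The labels `a`, `b`, `c` for the three disagreement events are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product((-1, 1), repeat=3))   # equally likely sign triples

def prob(event):
    """Exact probability of an event over the eight sign triples."""
    return Fraction(sum(1 for s in outcomes if event(s)), len(outcomes))

a = lambda s: s[0] != s[1]   # bits of the first and second point differ
b = lambda s: s[0] != s[2]   # bits of the first and third point differ
c = lambda s: s[1] != s[2]   # bits of the second and third point differ

p_a, p_b = prob(a), prob(b)
p_ab = prob(lambda s: a(s) and b(s))
p_abc = prob(lambda s: a(s) and b(s) and c(s))
print(p_a, p_b, p_ab, p_abc)
```

The joint probability of `a` and `b` equals the product of their marginals, so two pairs sharing a point disagree independently in each coordinate. The three events are not mutually independent, since `a` and `b` together force `c` to fail, which is why only pairwise independence is available.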
Solving For : Using the previous bounds on ,
[TABLE]
where and Fix , let X be pairwise orthogonal vectors in , and let be the probability that is one-to-one, then when:
[TABLE]
and when
[TABLE]
In order to ensure that is very small compared to and , we want If we fix and choose such that then using the inequality it is sufficient to bound as
[TABLE]
Manipulating this statement we get:
[TABLE]
Because , we can rewrite this inequality as:
[TABLE]
Since we assumed that , we can rewrite this inequality as and solve for :
[TABLE]
This means that if , we have and which allows us to rewrite our inequalities and get bounds on :
[TABLE]
[TABLE]
and
[TABLE]
[TABLE]
Let We remark that is approximately Let then:
[TABLE]
and
[TABLE]
By inspection, this is the statement of the main theorem.
7. Acknowledgments
We would like to thank Dr. Michael Lacey and Dr. Robert Kesler for their assistance and mentorship. We would also like to thank the Georgia Institute of Technology and the NSF for their funding and support.
References
