Optimal Boolean Locality-Sensitive Hashing
Tobias Christiani

TL;DR
This paper characterizes the optimal distribution over Boolean functions for locality-sensitive hashing, showing that the distribution minimizing the LSH exponent $\rho_{\alpha,\beta}$ assigns nonzero probability only to dictator functions.
Contribution
It provides a theoretical characterization of the optimal Boolean LSH scheme, identifying dictator functions as the only functions with nonzero probability in the optimal distribution.
Findings
Optimal distribution over Boolean functions is supported only on dictator functions.
The ratio $\rho_{\alpha,\beta}$ is minimized by distributions supported on dictator functions.
Theoretical foundation for the design of optimal Boolean LSH schemes.
Abstract
For $0 \leq \beta < \alpha < 1$, the distribution $\mathcal{H}$ over Boolean functions $h \colon \{-1, 1\}^d \to \{-1, 1\}$ that minimizes the expression \begin{equation*} \rho_{\alpha, \beta} = \frac{\log(1/\Pr_{\substack{h \sim \mathcal{H} \\ (x, y) \text{ $\alpha$-corr.}}}[h(x) = h(y)])}{\log(1/\Pr_{\substack{h \sim \mathcal{H} \\ (x, y) \text{ $\beta$-corr.}}}[h(x) = h(y)])} \end{equation*} assigns nonzero probability only to members of the set of dictator functions $h(x) = \pm x_i$.
IT University of Copenhagen and BARC
1 Introduction
We will be studying Boolean functions, i.e., functions that for a positive integer $d$ can be written in the form
$$f \colon \{-1, 1\}^d \to \{-1, 1\}.$$
We are concerned with the behavior of such Boolean functions on input pairs that are randomly generated.
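Definition 1.
For $\alpha \in [-1, 1]$ and $x \in \{-1, 1\}^d$ we let $N_\alpha(x)$ denote the distribution over $y \in \{-1, 1\}^d$ where each component of $y$ is independently distributed according to
$$y_i = \begin{cases} x_i & \text{with probability } \tfrac{1}{2} + \tfrac{1}{2}\alpha, \\ -x_i & \text{with probability } \tfrac{1}{2} - \tfrac{1}{2}\alpha. \end{cases}$$
We say that $(x, y)$ is randomly $\alpha$-correlated if $x$ is uniformly distributed over $\{-1, 1\}^d$ and $y \sim N_\alpha(x)$.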
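A small simulation can make Definition 1 concrete. The sketch below samples randomly $\alpha$-correlated pairs and checks that each coordinate of $y$ agrees with $x$ with probability $\frac{1}{2} + \frac{1}{2}\alpha$. The helper names `sample_correlated_pair` and `agreement_rate` are hypothetical and chosen only for illustration.

```python
import random

def sample_correlated_pair(d, alpha, rng):
    """Sample (x, y) randomly alpha-correlated on {-1, 1}^d (Definition 1)."""
    x = [rng.choice((-1, 1)) for _ in range(d)]
    # Each y_i equals x_i with probability 1/2 + alpha/2 and -x_i otherwise.
    y = [xi if rng.random() < 0.5 + 0.5 * alpha else -xi for xi in x]
    return x, y

def agreement_rate(d, alpha, trials, seed=0):
    """Empirical fraction of coordinates with x_i == y_i over many sampled pairs."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        x, y = sample_correlated_pair(d, alpha, rng)
        agree += sum(xi == yi for xi, yi in zip(x, y))
    return agree / (d * trials)

if __name__ == "__main__":
    # For alpha = 0.6 the agreement rate should be close to 0.8.
    print(agreement_rate(d=10, alpha=0.6, trials=20000))
```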
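Let $\mathcal{H}$ denote a distribution over functions $h \colon \{-1, 1\}^d \to R$ where $R$ is a finite set and define
$$p_{\mathcal{H}}(\alpha) = \Pr_{\substack{h \sim \mathcal{H} \\ (x, y)\ \alpha\text{-corr.}}}[h(x) = h(y)].$$
For $0 \le \beta < \alpha < 1$ we wish to characterize the distributions $\mathcal{H}$ that minimize the expression
$$\rho_{\alpha,\beta} = \frac{\log(1/p_{\mathcal{H}}(\alpha))}{\log(1/p_{\mathcal{H}}(\beta))} \qquad (1)$$
when we restrict $\mathcal{H}$ to be a distribution over Boolean functions $h \colon \{-1, 1\}^d \to \{-1, 1\}$. The expression for $\rho_{\alpha,\beta}$ in equation (1) is a well-known quantity in the study of approximate near neighbor search, governing the query time (roughly $n^{\rho}$) and space usage (roughly $n^{1+\rho}$) of solutions based on locality-sensitive hashing for $n$ data points [3].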
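For small $d$ the quantity in equation (1) can be computed exactly by enumerating all pairs $(x, y)$ and weighting each by $2^{-d}\Pr[y \mid x]$. The sketch below does this for a distribution given as a list of (probability, function) pairs; the names `collision_probability` and `rho` are hypothetical, and the brute force costs $4^d$ evaluations per function, so it is only meant as an illustration.

```python
import itertools
import math

def collision_probability(dist, d, alpha):
    """Exact Pr[h(x) = h(y)] for h ~ dist and (x, y) randomly alpha-correlated.

    dist is a list of (probability, h) pairs; each h maps a tuple in {-1, 1}^d
    to a value in a finite set. Enumerates all 4^d pairs, so only for small d.
    """
    cube = list(itertools.product((-1, 1), repeat=d))
    total = 0.0
    for weight, h in dist:
        for x in cube:
            for y in cube:
                # Pr[y | x] = prod_i (1 + alpha * x_i * y_i) / 2 under N_alpha(x).
                p_y = math.prod((1 + alpha * xi * yi) / 2 for xi, yi in zip(x, y))
                if h(x) == h(y):
                    total += weight * p_y / len(cube)
    return total

def rho(dist, d, alpha, beta):
    """rho_{alpha,beta} from equation (1) for a distribution over hash functions."""
    p_alpha = collision_probability(dist, d, alpha)
    p_beta = collision_probability(dist, d, beta)
    return math.log(1 / p_alpha) / math.log(1 / p_beta)

if __name__ == "__main__":
    d = 3
    # Uniform distribution over the dictators x -> x_i (bit sampling).
    bit_sampling = [(1 / d, lambda x, i=i: x[i]) for i in range(d)]
    # A single non-dictator function for comparison: majority of three bits.
    majority = [(1.0, lambda x: 1 if sum(x) > 0 else -1)]
    print(rho(bit_sampling, d, 0.5, 0.2), rho(majority, d, 0.5, 0.2))
```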
2 Related work
Indyk and Motwani [3] introduced the uniform distribution over the set of dictator functions as a family of locality-sensitive hash functions for the Boolean hypercube. O’Donnell et al. [6] showed a lower bound on $\rho_{\alpha,\beta}$ that holds for general families; this bound matches the upper bound of Indyk and Motwani [3] when $\alpha$ and $\beta$ approach 1. Another line of work [7, 2] using hypercontractive inequalities showed a lower bound matching the upper bound of Andoni and Razenshteyn [1].
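The question of finding lower bounds for $\rho_{\alpha,\beta}$ for every choice of $0 \le \beta < \alpha < 1$ is still open. In this note we answer the question for distributions over Boolean functions, showing that the upper bound of Indyk and Motwani is optimal. The resulting $\rho$-value is given by
$$\rho_{\alpha,\beta} = \frac{\log\left(2/(1+\alpha)\right)}{\log\left(2/(1+\beta)\right)}.$$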
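The closed form above is easy to evaluate. The sketch below, with the hypothetical helper name `rho_bit_sampling`, computes it and prints it next to $(1-\alpha)/(1-\beta)$; since $\log(2/(1+\gamma)) \approx (1-\gamma)/2$ as $\gamma \to 1$, the two values approach each other when $\alpha$ and $\beta$ are close to 1.

```python
import math

def rho_bit_sampling(alpha, beta):
    """rho_{alpha,beta} = log(2/(1+alpha)) / log(2/(1+beta)) for bit sampling."""
    if not 0 <= beta < alpha < 1:
        raise ValueError("expected 0 <= beta < alpha < 1")
    return math.log(2 / (1 + alpha)) / math.log(2 / (1 + beta))

if __name__ == "__main__":
    for alpha, beta in [(0.9, 0.5), (0.5, 0.0), (0.999, 0.998)]:
        # Near alpha, beta -> 1 the value is close to (1 - alpha) / (1 - beta).
        print(alpha, beta, rho_bit_sampling(alpha, beta), (1 - alpha) / (1 - beta))
```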
3 Preliminaries
We will be using tools from the Fourier analysis of Boolean functions to find the minimum of $\rho_{\alpha,\beta}$. For a more detailed overview we refer to the book by O’Donnell [5]. We will be using the fact that Boolean functions can be uniquely expressed as multilinear polynomials:
Theorem 2.
Every function $f \colon \{-1, 1\}^d \to \mathbb{R}$ can be uniquely expressed as a multilinear polynomial
$$f(x) = \sum_{S \subseteq [d]} \hat{f}(S)\, x^S$$
where $x^S = \prod_{i \in S} x_i$ and $\hat{f}(S) \in \mathbb{R}$.
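The coefficients in Theorem 2 can be recovered by brute force using the standard inversion formula $\hat{f}(S) = \mathbb{E}_x[f(x)\,x^S]$ from [5]. The sketch below (hypothetical helper names) computes them and re-evaluates the expansion; for majority on three bits it should produce $\frac{1}{2}(x_1 + x_2 + x_3) - \frac{1}{2}x_1x_2x_3$.

```python
import itertools
import math

def fourier_coefficients(f, d):
    """Map each S (a sorted tuple of coordinates) to hat f(S) = E_x[f(x) x^S]."""
    cube = list(itertools.product((-1, 1), repeat=d))
    coeffs = {}
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            # x^S is the product of x_i over i in S (the empty product is 1).
            coeffs[S] = sum(f(x) * math.prod(x[i] for i in S) for x in cube) / len(cube)
    return coeffs

def evaluate_expansion(coeffs, x):
    """Evaluate the multilinear polynomial sum_S hat f(S) x^S at the point x."""
    return sum(c * math.prod(x[i] for i in S) for S, c in coeffs.items())

if __name__ == "__main__":
    majority = lambda x: 1 if sum(x) > 0 else -1
    print(fourier_coefficients(majority, 3))
```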
For $S \subseteq [d]$ we refer to $\hat{f}(S)$ as the Fourier coefficient of $f$ on $S$. The following theorem defines an inner product between functions on the hypercube and shows how it relates to their Fourier coefficients.
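Theorem 3 (Plancherel’s Theorem).
For any $f, g \colon \{-1, 1\}^d \to \mathbb{R}$,
$$\langle f, g \rangle = \mathbb{E}_{x \sim \{-1,1\}^d}[f(x)\,g(x)] = \sum_{S \subseteq [d]} \hat{f}(S)\,\hat{g}(S).$$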
The concept of Fourier weight will be useful when characterizing how Boolean functions behave on noisy inputs:
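Definition 4.
For $f \colon \{-1, 1\}^d \to \mathbb{R}$ and $0 \le k \le d$ define the Fourier weight of $f$ at degree $k$ by
$$\mathbf{W}^k[f] = \sum_{S \subseteq [d],\ |S| = k} \hat{f}(S)^2.$$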
Consider Plancherel’s Theorem with $g = f$, where $f$ is Boolean-valued. In this case we get that the sum of the squared Fourier coefficients of $f$ equals $\mathbb{E}_x[f(x)^2] = 1$. This result is known as Parseval’s Theorem, and we will make use of it to determine where to place the Fourier weight of $\mathcal{H}$ in order to minimize $\rho_{\alpha,\beta}$.
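Theorem 5 (Parseval’s Theorem).
For any $f \colon \{-1, 1\}^d \to \{-1, 1\}$,
$$\sum_{S \subseteq [d]} \hat{f}(S)^2 = \sum_{k=0}^{d} \mathbf{W}^k[f] = 1.$$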
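Definition 4 and Parseval’s Theorem can be checked on examples. The sketch below computes $\mathbf{W}^0[f], \dots, \mathbf{W}^d[f]$ by brute force and confirms that the weights of Boolean-valued functions sum to 1; `fourier_weights` and `random_boolean_function` are hypothetical helpers.

```python
import itertools
import math
import random

def fourier_weights(f, d):
    """Return [W^0[f], ..., W^d[f]] from brute-force Fourier coefficients."""
    cube = list(itertools.product((-1, 1), repeat=d))
    weights = [0.0] * (d + 1)
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            coeff = sum(f(x) * math.prod(x[i] for i in S) for x in cube) / len(cube)
            weights[r] += coeff ** 2  # |S| = r contributes to degree r
    return weights

def random_boolean_function(d, seed):
    """A uniformly random f: {-1,1}^d -> {-1,1}, stored as a truth table."""
    rng = random.Random(seed)
    table = {x: rng.choice((-1, 1)) for x in itertools.product((-1, 1), repeat=d)}
    return lambda x: table[tuple(x)]

if __name__ == "__main__":
    print(fourier_weights(lambda x: 1 if sum(x) > 0 else -1, 3))  # majority of 3
    print(sum(fourier_weights(random_boolean_function(4, seed=0), 4)))
```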
In order to study the behavior of Boolean functions under noise we introduce the noise operator $T_\alpha$.
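Definition 6.
For $\alpha \in [-1, 1]$ the noise operator with parameter $\alpha$ is the linear operator $T_\alpha$ on functions $f \colon \{-1, 1\}^d \to \mathbb{R}$ defined by
$$T_\alpha f(x) = \mathbb{E}_{y \sim N_\alpha(x)}[f(y)].$$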
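The Fourier expansion of $T_\alpha f$ is given by $T_\alpha f(x) = \sum_{S \subseteq [d]} \alpha^{|S|} \hat{f}(S)\, x^S$. From Plancherel’s Theorem it follows that
$$\mathbb{E}_{(x, y)\ \alpha\text{-corr.}}[f(x)\,f(y)] = \langle f, T_\alpha f \rangle = \sum_{S \subseteq [d]} \alpha^{|S|} \hat{f}(S)^2 = \sum_{k=0}^{d} \alpha^k\, \mathbf{W}^k[f].$$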
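The identity above can be verified numerically by computing the left-hand side through enumeration of all pairs $(x, y)$ and the right-hand side through the Fourier coefficients. This is a minimal sketch with hypothetical helper names, only practical for small $d$.

```python
import itertools
import math

def noise_stability_enumeration(f, d, alpha):
    """E[f(x) f(y)] for randomly alpha-correlated (x, y), summing over all pairs."""
    cube = list(itertools.product((-1, 1), repeat=d))
    total = 0.0
    for x in cube:
        for y in cube:
            p_y_given_x = math.prod((1 + alpha * xi * yi) / 2 for xi, yi in zip(x, y))
            total += f(x) * f(y) * p_y_given_x / len(cube)
    return total

def noise_stability_fourier(f, d, alpha):
    """sum_S alpha^|S| hat f(S)^2, i.e. sum_k alpha^k W^k[f]."""
    cube = list(itertools.product((-1, 1), repeat=d))
    total = 0.0
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            coeff = sum(f(x) * math.prod(x[i] for i in S) for x in cube) / len(cube)
            total += alpha ** r * coeff ** 2
    return total

if __name__ == "__main__":
    majority = lambda x: 1 if sum(x) > 0 else -1
    print(noise_stability_enumeration(majority, 3, 0.3), noise_stability_fourier(majority, 3, 0.3))
```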
In the analysis of our problem the following inequality will be used several times. For the remainder of this note we will use $\log(x)$ to denote the natural logarithm of $x$.
Lemma 7.
For $x > 0$ we have $\log(x) \le x - 1$, with equality if and only if $x = 1$.
4 Bit-sampling is optimal
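Our approach will be to minimize $\rho_{\alpha,\beta}$ subject to the constraint that members of $\mathcal{H}$ are Boolean functions $h \colon \{-1, 1\}^d \to \{-1, 1\}$. We begin by making some observations to simplify the problem. For $h \colon \{-1, 1\}^d \to \{-1, 1\}$ we can directly relate the noise sensitivity under random $\alpha$-correlated inputs to the collision probability: since $h(x)h(y) = 1$ when $h(x) = h(y)$ and $h(x)h(y) = -1$ otherwise,
$$\Pr_{(x, y)\ \alpha\text{-corr.}}[h(x) = h(y)] = \frac{1}{2} + \frac{1}{2}\,\mathbb{E}_{(x, y)\ \alpha\text{-corr.}}[h(x)\,h(y)] = \frac{1}{2} + \frac{1}{2}\sum_{k=0}^{d} \alpha^k\, \mathbf{W}^k[h]. \qquad (2)$$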
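Using Equation (2) we can write $\rho_{\alpha,\beta}$ as follows:
$$\rho_{\alpha,\beta} = \frac{\log\left(1 \Big/ \left(\frac{1}{2} + \frac{1}{2}\sum_{k=0}^{d} \alpha^k w_k\right)\right)}{\log\left(1 \Big/ \left(\frac{1}{2} + \frac{1}{2}\sum_{k=0}^{d} \beta^k w_k\right)\right)}$$
where we use $w_k$ to denote the expected Fourier weight of $h \sim \mathcal{H}$ at degree $k$, defined by $w_k = \mathbb{E}_{h \sim \mathcal{H}}[\mathbf{W}^k[h]]$ (with the convention $0^0 = 1$ when $\beta = 0$). From Parseval’s Theorem we have that $\sum_{k=0}^{d} w_k = 1$. We will now consider how to set $w_0, \dots, w_d \ge 0$ to minimize the expression
$$\rho(w) = \frac{\log\left(2 \big/ \left(1 + \sum_{k=0}^{d} \alpha^k w_k\right)\right)}{\log\left(2 \big/ \left(1 + \sum_{k=0}^{d} \beta^k w_k\right)\right)} \quad \text{subject to} \quad \sum_{k=0}^{d} w_k = 1.$$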
An optimal solution for this problem will yield an optimal solution to the original problem, provided there actually exists a Boolean-valued function satisfying the weight assignment. We will show that the assignment $w_1 = 1$ and $w_k = 0$ for $k \neq 1$ minimizes $\rho(w)$. The distribution $\mathcal{H}$ therefore only assigns positive probability to functions that have all their Fourier weight concentrated at degree 1. It turns out that a Boolean function satisfies this weight assignment if and only if it is a dictator function. Lemma 8 is well-known and is the answer to exercise 1.19 in [5]. We include the proof for completeness.
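The claim that $w_1 = 1$ minimizes $\rho(w)$ can be probed numerically before reading the proof. The sketch below evaluates $\rho(w)$ and compares the assignment $w_1 = 1$ against random points of the probability simplex; these points need not be realizable by Boolean functions, which matches the relaxed weight-assignment problem. All names are hypothetical, and a check on sampled instances is of course not a proof.

```python
import math
import random

def rho_of_weights(w, alpha, beta):
    """rho(w) for expected degree weights w = [w_0, ..., w_d] (Python has 0 ** 0 == 1)."""
    p_alpha = sum(alpha ** k * wk for k, wk in enumerate(w))
    p_beta = sum(beta ** k * wk for k, wk in enumerate(w))
    return math.log(2 / (1 + p_alpha)) / math.log(2 / (1 + p_beta))

def random_weights(n, rng):
    """A random point of the probability simplex (normalized exponential samples)."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [v / total for v in e]

def dictator_is_best_among_samples(alpha, beta, d=5, samples=2000, seed=1):
    """True if no sampled weight vector beats w_1 = 1 for these alpha, beta."""
    rng = random.Random(seed)
    best = rho_of_weights([0.0, 1.0] + [0.0] * (d - 1), alpha, beta)
    return all(rho_of_weights(random_weights(d + 1, rng), alpha, beta) >= best
               for _ in range(samples))

if __name__ == "__main__":
    print(dictator_is_best_among_samples(0.7, 0.3))
```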
Lemma 8.
Let $f \colon \{-1, 1\}^d \to \{-1, 1\}$ and suppose that $\mathbf{W}^1[f] = 1$, then $f(x) = \pm x_i$ for some $i \in [d]$.
Proof.
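From Parseval’s Theorem we know that $\sum_{S \subseteq [d]} \hat{f}(S)^2 = 1$ and it follows that $\hat{f}(S) = 0$ for $|S| \neq 1$. The function $f$ can therefore be written in the form $f(x) = \sum_{i=1}^{d} a_i x_i$ where $a_i = \hat{f}(\{i\})$ for $i \in [d]$. By the condition $\sum_{i} a_i^2 = 1$ there exists $j$ such that $a_j \neq 0$. Fix the components $x_i$ for $i \neq j$ and note that since $f$ maps to $\{-1, 1\}$ the sum $c = \sum_{i \neq j} a_i x_i$ must satisfy $a_j + c \in \{-1, 1\}$ and $-a_j + c \in \{-1, 1\}$, corresponding to $x_j = 1$ and $x_j = -1$. For $a_j \neq 0$ this is only possible when $c = 0$. Since this holds for every choice of $(x_i)_{i \neq j}$, uniqueness of the Fourier expansion implies that $a_i = 0$ for $i \neq j$, and hence $a_j = \pm 1$. It follows that $f$ must be one of the $2d$ functions of the form $f(x) = \pm x_j$. ∎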
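For small $d$, Lemma 8 can be confirmed by exhaustive search over all $2^{2^d}$ truth tables. The sketch below (hypothetical helper names) keeps the functions with $\mathbf{W}^1[f] = 1$ and checks that they are exactly the $2d$ functions $\pm x_i$.

```python
import itertools

def full_degree_one_functions(d):
    """All f: {-1,1}^d -> {-1,1} (as truth tables) with W^1[f] = 1."""
    cube = list(itertools.product((-1, 1), repeat=d))
    found = []
    for values in itertools.product((-1, 1), repeat=len(cube)):
        table = dict(zip(cube, values))
        # hat f({i}) = E_x[f(x) x_i]; W^1[f] is the sum of their squares.
        w1 = sum((sum(table[x] * x[i] for x in cube) / len(cube)) ** 2 for i in range(d))
        if abs(w1 - 1.0) < 1e-9:
            found.append(table)
    return found

def is_signed_dictator(table, d):
    """True if the truth table equals x -> s * x_i for some sign s and coordinate i."""
    return any(all(table[x] == s * x[i] for x in table)
               for i in range(d) for s in (-1, 1))

if __name__ == "__main__":
    tables = full_degree_one_functions(3)
    print(len(tables), all(is_signed_dictator(t, 3) for t in tables))
```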
4.1 Optimal Fourier weight at degree zero
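We begin by arguing that we can restrict our attention to showing that dictator functions are optimal in the case where $0 < \beta < \alpha < 1$. If $\alpha = 1$ then for $w_1 = 1$ we have that $\rho_{\alpha,\beta} = 0$, which is the best we can hope for (but this could also be achieved by other weight assignments, hence the statement of the main theorem is for $\alpha < 1$). For $\beta = 0$, the following Lemma shows that $w_0 = 0$, so that $p_{\mathcal{H}}(\beta) = 1/2$ and the denominator of $\rho_{\alpha,\beta}$ equals $\log 2$; combined with the fact that in this setting we maximize $p_{\mathcal{H}}(\alpha)$ by setting $w_1 = 1$ (as $\alpha^k < \alpha$ for $k \ge 2$), this shows that the dictator functions are optimal. We will now show that an optimal solution has no Fourier weight at degree zero.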
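Lemma 9.
If $w$ minimizes $\rho(w)$, then $w_0 = 0$.
Proof.
If $w_0 = 1$ we have $p_{\mathcal{H}}(\alpha) = p_{\mathcal{H}}(\beta) = 1$, so $\rho(w)$ is not even well defined, and it is clear that we do better if we set $w_1 = 1$, which gives $\rho(w) < 1$. Suppose that $0 < w_0 < 1$. We will show that in this case we can move some weight from $w_0$ to $w_1$ and decrease the value of $\rho(w)$. For a given weight assignment define $P(\gamma) = \sum_{k=0}^{d} \gamma^k w_k$ and $L(\gamma) = \log(2/(1 + P(\gamma)))$, and write $\rho(w) = L(\alpha)/L(\beta)$. Moving weight $\varepsilon$ from $w_0$ to $w_1$ replaces $P(\gamma)$ by $P(\gamma) - (1-\gamma)\varepsilon$. The partial derivative of $\rho$ with respect to $\varepsilon$ at $\varepsilon = 0$ is given by
$$\frac{\partial \rho}{\partial \varepsilon} = \frac{1}{L(\beta)^2}\left(\frac{(1-\alpha)\,L(\beta)}{1 + P(\alpha)} - \frac{(1-\beta)\,L(\alpha)}{1 + P(\beta)}\right).$$
By rearranging and using that $L(\alpha), L(\beta) > 0$ we find that $\partial\rho/\partial\varepsilon < 0$ is equivalent to
$$\frac{1-\alpha}{(1 + P(\alpha))\,L(\alpha)} < \frac{1-\beta}{(1 + P(\beta))\,L(\beta)}.$$
Since $\beta < \alpha$, it suffices to show that the function $\varphi(\gamma) = \frac{1-\gamma}{(1 + P(\gamma))\,L(\gamma)}$ is decreasing for $\gamma \in [0, 1)$, that is,
$$\frac{d}{d\gamma}\log \varphi(\gamma) = -\frac{1}{1-\gamma} - \frac{P'(\gamma)}{1 + P(\gamma)} + \frac{P'(\gamma)}{(1 + P(\gamma))\,L(\gamma)} < 0.$$
Rewriting, this is equivalent to showing that
$$(1-\gamma)\,P'(\gamma)\,\bigl(1 - L(\gamma)\bigr) < \bigl(1 + P(\gamma)\bigr)\,L(\gamma). \qquad (3)$$
By the assumption that $w_0 < 1$ we have that $0 \le P(\gamma) < 1$ for $\gamma \in [0, 1)$, and using Lemma 7 with $x = (1 + P(\gamma))/2$ we get that $L(\gamma) > (1 - P(\gamma))/2$. Hence $1 - L(\gamma) < (1 + P(\gamma))/2$, and the condition in equation (3) then simplifies to showing that $(1-\gamma)P'(\gamma) \le 1 - P(\gamma)$, that is, $P(\gamma) + (1-\gamma)P'(\gamma) \le 1$. The function $P$ is a weighted sum of simple monomials $\gamma^k$ where the weights $w_k$ sum to one. It therefore suffices to show that the inequality holds for every monomial $\gamma^k$ where $0 \le k \le d$, i.e., that $f_k(\gamma) = \gamma^k + k(1-\gamma)\gamma^{k-1} \le 1$. For $k = 0$ and $k = 1$ we have $f_0(\gamma) = f_1(\gamma) = 1$, satisfying the desired inequality. For $k \ge 2$ we have $f_k'(\gamma) = k(k-1)(1-\gamma)\gamma^{k-2}$. We see that $f_k(0) = 0$ and $f_k(1) = 1$, and by inspecting the derivative of $f_k$ we see that it is increasing for $\gamma \in [0, 1]$. It follows that the inequality is satisfied, completing the proof. ∎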
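Two ingredients of the proof of Lemma 9 can be spot-checked numerically: the monomial bound $f_k(\gamma) \le 1$ on a grid, and the fact that moving weight from $w_0$ to $w_1$ lowers $\rho(w)$ on random instances. The sketch below assumes the weight-vector form of $\rho(w)$; the helper names are hypothetical and the random trials only illustrate the lemma.

```python
import math
import random

def rho_of_weights(w, alpha, beta):
    """rho(w) for expected degree weights w = [w_0, ..., w_d]."""
    p_alpha = sum(alpha ** k * wk for k, wk in enumerate(w))
    p_beta = sum(beta ** k * wk for k, wk in enumerate(w))
    return math.log(2 / (1 + p_alpha)) / math.log(2 / (1 + p_beta))

def monomial_bound(k, gamma):
    """f_k(gamma) = gamma^k + k (1 - gamma) gamma^(k-1); the proof shows this is <= 1."""
    if k == 0:
        return 1.0
    return gamma ** k + k * (1 - gamma) * gamma ** (k - 1)

def shifting_degree_zero_weight_helps(trials=1000, d=4, seed=2):
    """Check that moving half of w_0 onto w_1 lowers rho(w) on random instances."""
    rng = random.Random(seed)
    for _ in range(trials):
        beta, alpha = sorted(rng.uniform(0.01, 0.99) for _ in range(2))
        if alpha - beta < 1e-3:
            continue  # skip near-degenerate draws
        e = [rng.expovariate(1.0) for _ in range(d + 1)]
        total = sum(e)
        w = [v / total for v in e]
        shifted = [w[0] / 2, w[1] + w[0] / 2] + w[2:]
        if rho_of_weights(shifted, alpha, beta) >= rho_of_weights(w, alpha, beta):
            return False
    return True

if __name__ == "__main__":
    print(shifting_degree_zero_weight_helps())
```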
4.2 A continuous optimization problem
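In order to simplify the problem of minimizing $\rho(w)$ we will optimize over a larger space. In particular we will let $\mathcal{W}$ denote a collection of pairs $(w_i, k_i)$ such that $w_i \ge 0$ and $\sum_i w_i = 1$, where we restrict the exponents to satisfy $k_i \ge 1$ but allow them to take any real value. We define $P(\gamma) = \sum_i w_i \gamma^{k_i}$ and we will now attempt to specify the function $P$ that minimizes
$$\rho = \frac{\log\left(2/(1 + P(\alpha))\right)}{\log\left(2/(1 + P(\beta))\right)}$$
subject to the constraint that $P(\beta)$ is fixed. The constraint that $0 < P(\beta) \le \beta$ follows from the restrictions on the exponents $k_i$. We can therefore write $P(\beta) = \beta^t$ for some $t \ge 1$. For fixed $t$ it is clear that we minimize $\rho$ by maximizing $P(\alpha)$.
Lemma 10.
For fixed $P(\beta) = \beta^t$ we maximize $P(\alpha)$ by setting $P(\gamma) = \gamma^t$, that is, by placing all of the weight on the exponent $t$.
Proof.
Let $w_k$ denote the weight on the exponent $k$ in the specification of $P$. We will prove that if $w_t < 1$ then we can increase $P(\alpha)$ by rearranging the weights of $P$ to put more weight onto $t$. Note that if $w_t < 1$ and we have a valid configuration of weights (in the sense that $P(\beta) = \beta^t$) there must exist exponents $k_1 < t < k_2$ such that there is positive weight on $k_1$ and $k_2$: if all the remaining weight were concentrated on one side of $t$, the condition $P(\beta) = \beta^t$ would be violated. We will now move weight $\delta > 0$ from $k_1$ to $t$ and weight $c\delta$ from $k_2$ to $t$, where $\delta$ is small enough that all weights remain nonnegative and we set $c > 0$ to ensure that $P(\beta) = \beta^t$ after the move. It turns out that this condition is satisfied for the following ratio
$$c = \frac{\beta^{k_1} - \beta^t}{\beta^t - \beta^{k_2}}.$$
The change in $P(\alpha)$ due to the rearrangement of weights is $\delta\left(c(\alpha^t - \alpha^{k_2}) - (\alpha^{k_1} - \alpha^t)\right)$, which is positive if $c > \frac{\alpha^{k_1} - \alpha^t}{\alpha^t - \alpha^{k_2}}$. Therefore, since $\beta < \alpha$, it suffices to show that $g(\gamma) = \frac{\gamma^{k_1} - \gamma^t}{\gamma^t - \gamma^{k_2}}$ is decreasing for $\gamma \in (0, 1)$ when $k_1 < t < k_2$. To simplify further, we define $a = t - k_1$ and $b = k_2 - t$, which satisfy $a, b > 0$. Rewriting and differentiating we get
$$g(\gamma) = \frac{\gamma^{-a} - 1}{1 - \gamma^{b}}, \qquad g'(\gamma) = \frac{\gamma^{-a-1}\left(b\gamma^{b}(1 - \gamma^{a}) - a(1 - \gamma^{b})\right)}{(1 - \gamma^{b})^2}.$$
It suffices to show that $b\gamma^b(1 - \gamma^a) < a(1 - \gamma^b)$. Dividing by $(1 - \gamma^a)(1 - \gamma^b) > 0$, this reads $q(b) < q(-a)$ for the function $q(x) = x/(\gamma^{-x} - 1)$, extended continuously by $q(0) = 1/\log(1/\gamma)$, so it suffices to show that $q$ is decreasing in $x$ for $x \in \mathbb{R}$. We have that $q'(x) = r(y)/(\gamma^{-x} - 1)^2$, where $y = x\log(1/\gamma)$ and $r(y) = e^{y}(1 - y) - 1$. Note that $r(0) = 0$ and that $y \neq 0$ whenever $x \neq 0$. We have that $r(y) < 0$ if and only if $1 - y < e^{-y}$, which is immediate for $y \ge 1$ and follows from Lemma 7 applied to $1 - y \neq 1$ for $y < 1$, completing the proof. ∎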
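Lemma 10 can be illustrated with two-point configurations: put weight $w$ on $k_1$ and $1 - w$ on $k_2$ so that $P(\beta) = \beta^t$, then compare $P(\alpha)$ with $\alpha^t$. As a cross-check that is not the argument used in the note, the same inequality follows from concavity: $\alpha^k = (\beta^k)^s$ with $s = \log\alpha/\log\beta \in (0, 1)$, so Jensen's inequality gives $P(\alpha) \le P(\beta)^s = \alpha^t$. The helper names below are hypothetical.

```python
import random

def two_point_check(alpha, beta, k1, t, k2):
    """Weight w on k1 and 1 - w on k2 with P(beta) = beta^t; return P(alpha) - alpha^t."""
    w = (beta ** t - beta ** k2) / (beta ** k1 - beta ** k2)
    p_alpha = w * alpha ** k1 + (1 - w) * alpha ** k2
    return p_alpha - alpha ** t

def lemma10_holds(trials=2000, seed=3):
    """True if no random two-point configuration beats the single exponent t."""
    rng = random.Random(seed)
    for _ in range(trials):
        beta, alpha = sorted(rng.uniform(0.05, 0.95) for _ in range(2))
        k1, t, k2 = sorted(rng.uniform(1.0, 8.0) for _ in range(3))
        if min(alpha - beta, t - k1, k2 - t) < 1e-3:
            continue  # skip near-degenerate draws
        if two_point_check(alpha, beta, k1, t, k2) > 1e-12:
            return False
    return True

if __name__ == "__main__":
    print(two_point_check(0.8, 0.4, 1.0, 2.0, 3.0), lemma10_holds())
```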
4.3 Univariate analysis
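According to Lemma 10 we can now restrict our attention to the problem of finding $t \ge 1$ that minimizes the function
$$\rho(t) = \frac{\log\left(2/(1 + \alpha^t)\right)}{\log\left(2/(1 + \beta^t)\right)}.$$
We will show that the derivative of $\rho(t)$ is positive, implying that it is minimized when $t = 1$.
Lemma 11.
For $t \ge 1$ we have $\rho'(t) > 0$.
Proof.
From inspecting the derivative of $\rho(t)$ with respect to $t$, and writing $x = \gamma^t$ so that $\frac{d}{dt}\log\frac{2}{1 + \gamma^t} = \frac{x\log(1/x)}{t(1 + x)}$, we see that
$$\rho'(t) > 0 \iff G(\alpha^t) < G(\beta^t), \qquad \text{where} \quad G(x) = \frac{(1 + x)\log\left(2/(1 + x)\right)}{x\log(1/x)}.$$
Therefore it suffices to show that the function $G$ is decreasing for $0 < x < 1$, since $\beta^t < \alpha^t$ for every $t \ge 1$. Writing $\ell = \log(2/(1 + x))$ and $m = \log(1/x)$, and inspecting $G'$, we see that the condition $G'(x) < 0$ is equivalent to
$$(1 + x)\,\ell\,(1 - m) < (1 - \ell)\,x\,m.$$
If $m \ge 1$, that is $x \le 1/e$, then the left-hand side is nonpositive while the right-hand side is positive (as $\ell < \log 2 < 1$), so the condition is satisfied and we are done. Otherwise $0 < m < 1$, the condition can be rewritten as $\ell(1 + x - m) < xm$ with $1 + x - m > 0$, and we can use the fact that $\ell < (1 - x)/(1 + x)$, which follows from Lemma 7, to produce the following derivation:
$$\ell\,(1 + x - m) < \frac{(1 - x)(1 + x - m)}{1 + x}, \qquad \frac{(1 - x)(1 + x - m)}{1 + x} \le xm \iff 1 - x^2 \le m(1 + x^2) \iff \tanh(m) \le m,$$
where the last step uses $x = e^{-m}$. Finally, $\tanh(m) \le m$ holds for all $m \ge 0$, since $m - \tanh(m)$ vanishes at $m = 0$ and has derivative $\tanh^2(m) \ge 0$, completing the proof. ∎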
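Lemma 11 and the auxiliary function $G$ can be checked on a grid. The sketch below evaluates $\rho(t)$ for a few $(\alpha, \beta)$ pairs and confirms that it increases with $t$, and that $G$ decreases on $(0, 1)$. The helper names are hypothetical, and a finite grid is only a sanity check.

```python
import math

def rho_of_t(t, alpha, beta):
    """rho(t) = log(2/(1+alpha^t)) / log(2/(1+beta^t))."""
    return math.log(2 / (1 + alpha ** t)) / math.log(2 / (1 + beta ** t))

def G(x):
    """G(x) = (1+x) log(2/(1+x)) / (x log(1/x)) for 0 < x < 1."""
    return (1 + x) * math.log(2 / (1 + x)) / (x * math.log(1 / x))

def strictly_increasing(values):
    """True if each value is strictly smaller than the next one."""
    return all(a < b for a, b in zip(values, values[1:]))

if __name__ == "__main__":
    ts = [1 + 0.25 * i for i in range(21)]
    print([round(rho_of_t(t, 0.9, 0.5), 4) for t in ts])
```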
4.4 Stating the result
We will now summarize how the results from the previous subsections yield the main result of this paper as stated in the abstract. To find the distribution over Boolean functions minimizing $\rho_{\alpha,\beta}$ we first considered the optimal weight assignment in the expression $\rho(w)$ subject to the constraint that $\sum_{k=0}^{d} w_k = 1$. Finding an optimal assignment does not guarantee that we have solved the problem, because there may not exist a Boolean function with a given weight assignment, but if one or more Boolean functions that satisfy the optimal assignment exist we will have solved the problem. In Lemma 9 we showed that an optimal solution must have $w_0 = 0$. Therefore the optimal solution can only have nonzero weight on exponents $k \ge 1$. Next, in Lemma 10, we argued that if we allow continuous exponents $k_i \ge 1$ in $P$, then the problem of minimizing $\rho$ becomes the problem of selecting $P(\gamma) = \gamma^t$ where $t \ge 1$. Lemma 11 showed that $\rho(t)$ is increasing, so to minimize $\rho$ we want to set $t = 1$. The conclusion from these optimization problems is that we minimize $\rho(w)$ by setting $w_1 = 1$. Since $\mathbf{W}^1[h] \le 1$ for every Boolean function $h$, the condition $w_1 = \mathbb{E}_{h \sim \mathcal{H}}[\mathbf{W}^1[h]] = 1$ forces $\mathbf{W}^1[h] = 1$ for every $h$ in the support of $\mathcal{H}$. Finally, Lemma 8 shows that the subset of the Boolean functions with $\mathbf{W}^1[h] = 1$ is exactly the set of dictator functions $h(x) = \pm x_i$. Together with the fact that $w_1 = 1$ is a unique minimum of $\rho(w)$ in the weight assignment problem we get our main result.
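The whole chain of reasoning can be sanity-checked end to end for $d = 3$. Because $p_{\mathcal{H}}$ depends on $\mathcal{H}$ only through the expected weights $w_k$, the sketch below computes the degree-weight vector of every Boolean function on three bits, evaluates $\rho$ for each non-constant function and for random finite mixtures of them, and reports the smallest value found. This is a hypothetical illustration that samples finitely many mixtures; it does not replace the proof.

```python
import itertools
import math
import random

def degree_weights(table, cube, d):
    """[W^0, ..., W^d] for the Boolean function given by a truth table."""
    weights = [0.0] * (d + 1)
    for r in range(d + 1):
        for S in itertools.combinations(range(d), r):
            coeff = sum(table[x] * math.prod(x[i] for i in S) for x in cube) / len(cube)
            weights[r] += coeff ** 2
    return weights

def rho_of_weights(w, alpha, beta):
    """rho(w) for expected degree weights w = [w_0, ..., w_d]."""
    p_alpha = sum(alpha ** k * wk for k, wk in enumerate(w))
    p_beta = sum(beta ** k * wk for k, wk in enumerate(w))
    return math.log(2 / (1 + p_alpha)) / math.log(2 / (1 + p_beta))

def all_weight_vectors(d):
    """Degree-weight vectors of every Boolean function on d bits."""
    cube = list(itertools.product((-1, 1), repeat=d))
    return [degree_weights(dict(zip(cube, values)), cube, d)
            for values in itertools.product((-1, 1), repeat=len(cube))]

def best_rho_over_mixtures(d, alpha, beta, mixtures=2000, seed=4):
    """Smallest rho over non-constant functions and random 3-function mixtures."""
    vectors = [w for w in all_weight_vectors(d) if w[0] < 1 - 1e-9]  # drop constants
    rng = random.Random(seed)
    best = min(rho_of_weights(w, alpha, beta) for w in vectors)
    for _ in range(mixtures):
        chosen = rng.sample(vectors, 3)
        lam = [rng.random() for _ in chosen]
        total = sum(lam)
        mix = [sum(l * w[k] for l, w in zip(lam, chosen)) / total for k in range(d + 1)]
        best = min(best, rho_of_weights(mix, alpha, beta))
    return best

if __name__ == "__main__":
    print(best_rho_over_mixtures(3, 0.8, 0.4), rho_of_weights([0, 1, 0, 0], 0.8, 0.4))
```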
5 Open problems
Orthogonal search.
It appears that the same techniques can be used to show that pairs of functions of the form , minimize the function
[TABLE]
Extension to negative correlation.
It seems likely that the dictator functions, i.e., bit-sampling, minimize $\rho_{\alpha,\beta}$ for the entire interval $-1 < \beta < \alpha < 1$. Unfortunately the current proof breaks down in places.
General hash functions.
Showing tight bounds for hash functions with an arbitrary range is an interesting open problem. For orthogonal search this is an open problem even in the case of . For more information see the symmetric Gaussian problem in [4].
Another interesting problem is to investigate the implications of the results in this paper for functions with an arbitrary range through the use of -bit hashing.
References
- [1] A. Andoni and I. Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proc. STOC ’15, pages 793–801, 2015.
- [2] A. Andoni and I. Razenshteyn. Tight lower bounds for data-dependent locality-sensitive hashing. In Proc. SoCG ’16, pages 9:1–9:11, 2016.
- [3] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proc. STOC ’98, pages 604–613, 1998.
- [4] R. O’Donnell. Open problems in analysis of boolean functions. CoRR, abs/1204.6447, 2012.
- [5] R. O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
- [6] R. O’Donnell, Y. Wu, and Y. Zhou. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Transactions on Computation Theory (TOCT), 6(1):5, 2014.
- [7] R. Panigrahy, K. Talwar, and U. Wieder. A geometric approach to lower bounds for approximate near-neighbor search and partial match. In Proc. FOCS ’08, pages 414–423, 2008.
