Optimal Boolean Locality-Sensitive Hashing

Tobias Christiani

arXiv:1812.01557·cs.DM·December 5, 2018

TL;DR

This paper characterizes the optimal distribution over Boolean functions for locality-sensitive hashing, showing it assigns nonzero probability only to dictator functions to minimize a specific correlation ratio.

Contribution

It provides a theoretical characterization of the optimal Boolean LSH scheme, identifying dictator functions as the only functions with nonzero probability in the optimal distribution.

Findings

01

Optimal distribution over Boolean functions is supported only on dictator functions.

02

The ratio $\rho_{\alpha,\beta}$ is minimized by dictator functions.

03

Theoretical foundation for the design of optimal Boolean LSH schemes.

Abstract

For $0 \leq \beta < \alpha < 1$ the distribution $\mathcal{H}$ over Boolean functions $h\colon\{-1,1\}^{d}\to\{-1,1\}$ that minimizes the expression \begin{equation*} \rho_{\alpha, \beta} = \frac{\log(1/\Pr_{\substack{h \sim \mathcal{H} \\ (x, y)\ \alpha\text{-corr.}}}[h(x) = h(y)])}{\log(1/\Pr_{\substack{h \sim \mathcal{H} \\ (x, y)\ \beta\text{-corr.}}}[h(x) = h(y)])} \end{equation*} assigns nonzero probability only to members of the set of dictator functions $h(x) = \pm x_{i}$.

Equations

$$\rho_{\alpha,\beta}=\frac{\log(1/\Pr_{\substack{h\sim\mathcal{H}\\(x,y)\ \alpha\text{-corr.}}}[h(x)=h(y)])}{\log(1/\Pr_{\substack{h\sim\mathcal{H}\\(x,y)\ \beta\text{-corr.}}}[h(x)=h(y)])}$$

$$h\colon\{-1,1\}^{d}\to\{-1,1\}.$$

$$y_{i}=\begin{cases}x_{i}&\text{with probability }\frac{1+\alpha}{2},\\-x_{i}&\text{with probability }\frac{1-\alpha}{2}.\end{cases}$$

$$p_{\alpha}=\Pr_{\substack{h\sim\mathcal{H}\\(x,y)\ \alpha\text{-corr.}}}[h(x)=h(y)].$$

$$\rho_{\alpha,\beta}=\frac{\log(1/p_{\alpha})}{\log(1/p_{\beta})}$$

$$\rho_{\alpha,\beta}=\frac{\log((1+\alpha)/2)}{\log((1+\beta)/2)}.$$

$$f(x)=\sum_{S\subseteq[d]}\hat{f}(S)x^{S}$$

$$\langle f,g\rangle=\operatorname*{\mathbb{E}}_{x\sim\{-1,1\}^{d}}[f(x)g(x)]=\sum_{S\subseteq[d]}\hat{f}(S)\hat{g}(S).$$

$$W^{k}[f]=\sum_{\substack{S\subseteq[d]\\|S|=k}}\hat{f}(S)^{2}.$$

$$\langle f,f\rangle=\operatorname*{\mathbb{E}}_{x\sim\{-1,1\}^{d}}[f(x)^{2}]=\sum_{S\subseteq[d]}\hat{f}(S)^{2}=\sum_{i=0}^{d}W^{i}[f]=1.$$

$$T_{\alpha}f(x)=\operatorname*{\mathbb{E}}_{y\sim N_{\alpha}(x)}[f(y)].$$

$$\langle f,T_{\alpha}g\rangle=\operatorname*{\mathbb{E}}_{(x,y)\ \alpha\text{-corr.}}[f(x)g(y)]=\sum_{S\subseteq[d]}\alpha^{|S|}\hat{f}(S)\hat{g}(S).$$

$$\operatorname*{\mathbb{E}}_{\substack{h\sim\mathcal{H}\\(x,y)\ \alpha\text{-corr.}}}[h(x)h(y)]=p_{\alpha}-(1-p_{\alpha})=2p_{\alpha}-1.$$

$$p_{\alpha}=\Big(1+\operatorname*{\mathbb{E}}_{\substack{h\sim\mathcal{H}\\(x,y)\ \alpha\text{-corr.}}}[h(x)h(y)]\Big)/2=\Big(1+\sum_{i=0}^{d}\alpha^{i}w_{i}\Big)/2$$

$$\rho_{\alpha,\beta}=\frac{\log((1+\sum_{i=0}^{d}\alpha^{i}w_{i})/2)}{\log((1+\sum_{i=0}^{d}\beta^{i}w_{i})/2)}.$$

$$\frac{\partial\rho}{\partial w_{0}}=\frac{\frac{\partial s(\alpha)/\partial w_{0}}{1+s(\alpha)}\log\frac{1+s(\beta)}{2}-\frac{\partial s(\beta)/\partial w_{0}}{1+s(\beta)}\log\frac{1+s(\alpha)}{2}}{\log^{2}\frac{1+s(\beta)}{2}}.$$

$$\frac{1+s(\beta)}{1-\beta}\log\frac{1+s(\beta)}{2}>\frac{1+s(\alpha)}{1-\alpha}\log\frac{1+s(\alpha)}{2}.$$

$$\frac{\partial g}{\partial x}=\frac{s^{\prime}(x)(1-x)+(1+s(x))}{(1-x)^{2}}\log\frac{1+s(x)}{2}+\frac{s^{\prime}(x)}{1-x}.$$

$$(s^{\prime}(x)(1-x)+1+s(x))\log\frac{1+s(x)}{2}+(1-x)s^{\prime}(x)<0.$$

$$\rho_{\alpha,\beta}=\frac{\log\frac{1+s(\alpha)}{2}}{\log\frac{1+s(\beta)}{2}}$$

$$\varphi(\beta)=\varepsilon_{1}/\varepsilon_{0}=\frac{\beta^{\gamma_{0}}-\beta^{\gamma}}{\beta^{\gamma}-\beta^{\gamma_{1}}}>0.$$

$$\frac{\partial\varphi}{\partial x}<0\iff\frac{\lambda_{0}x^{\lambda_{0}}}{1-x^{\lambda_{0}}}>\frac{\lambda_{1}x^{\lambda_{1}}}{1-x^{\lambda_{1}}}.$$

$$\rho(\gamma)=\frac{\log\frac{1+\alpha^{\gamma}}{2}}{\log\frac{1+\beta^{\gamma}}{2}}.$$

$$\begin{aligned}\frac{\partial\rho}{\partial\gamma}>0&\iff\frac{\alpha^{\gamma}\log\alpha}{1+\alpha^{\gamma}}\log\frac{1+\beta^{\gamma}}{2}-\frac{\beta^{\gamma}\log\beta}{1+\beta^{\gamma}}\log\frac{1+\alpha^{\gamma}}{2}>0\\&\iff\frac{1+\beta^{\gamma}}{\beta^{\gamma}\log\beta}\log\frac{1+\beta^{\gamma}}{2}>\frac{1+\alpha^{\gamma}}{\alpha^{\gamma}\log\alpha}\log\frac{1+\alpha^{\gamma}}{2}.\end{aligned}$$

$$-(1+x^{\gamma}+\gamma\log x)\log\frac{1+x^{\gamma}}{2}+\gamma x^{\gamma}\log x<0$$

$$\begin{aligned}-(1+x^{\gamma}+\gamma\log x)\log\frac{1+x^{\gamma}}{2}+\gamma x^{\gamma}\log x&<-(1+x^{\gamma}+\log x^{\gamma})\frac{x^{\gamma}-1}{2}+x^{\gamma}\log x^{\gamma}\\&=\frac{1}{2}\left(1-x^{2\gamma}+(1+x^{\gamma})\log x^{\gamma}\right)\end{aligned}$$

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Complexity and Algorithms in Graphs · Algorithms and Data Compression

Full text

Optimal Boolean Locality-Sensitive Hashing

Tobias Christiani

IT University of Copenhagen and BARC

Abstract

For $0\leq\beta<\alpha<1$ the distribution $\mathcal{H}$ over Boolean functions $h\colon\{-1,1\}^{d}\to\{-1,1\}$ that minimizes the expression

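$$\rho_{\alpha,\beta}=\frac{\log(1/\Pr_{\substack{h\sim\mathcal{H}\\(x,y)\ \alpha\text{-corr.}}}[h(x)=h(y)])}{\log(1/\Pr_{\substack{h\sim\mathcal{H}\\(x,y)\ \beta\text{-corr.}}}[h(x)=h(y)])}$$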

assigns nonzero probability only to members of the set of dictator functions $h(x)=\pm x_{i}$ .

1 Introduction

We will be studying Boolean functions, i.e., functions that for a positive integer $d$ can be written in the form

$$h\colon\{-1,1\}^{d}\to\{-1,1\}.$$

We are concerned with the behavior of such Boolean functions on input pairs $x,y\in\{-1,1\}^{d}$ that are randomly generated.

Definition 1.

For $-1\leq\alpha\leq 1$ and $x\in\{-1,1\}^{d}$ we let $N_{\alpha}(x)$ denote the distribution over $\{-1,1\}^{d}$ where each component of $y\sim N_{\alpha}(x)$ is independently distributed according to

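$$y_{i}=\begin{cases}x_{i}&\text{with probability }\frac{1+\alpha}{2},\\-x_{i}&\text{with probability }\frac{1-\alpha}{2}.\end{cases}$$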

We say that $(x,y)$ is randomly $\alpha$ -correlated if $x$ is uniformly distributed over $\{-1,1\}^{d}$ and $y\sim N_{\alpha}(x)$ .
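Definition 1 can be illustrated with a small sampler. This is a minimal sketch, not part of the original note: the function names and the seeded generator are our own choices, and the check relies on the fact that $\operatorname*{\mathbb{E}}[x_{i}y_{i}]=\alpha$ for each coordinate of a randomly $\alpha$-correlated pair.

```python
import random

def sample_correlated_pair(d, alpha, rng):
    """Draw (x, y) with x uniform on {-1,1}^d and y ~ N_alpha(x)."""
    x = [rng.choice((-1, 1)) for _ in range(d)]
    keep = (1 + alpha) / 2  # probability that y_i equals x_i
    y = [xi if rng.random() < keep else -xi for xi in x]
    return x, y

def empirical_correlation(d, alpha, trials, seed=0):
    """Average of x_i * y_i over all coordinates and trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, y = sample_correlated_pair(d, alpha, rng)
        total += sum(xi * yi for xi, yi in zip(x, y))
    return total / (d * trials)

# The estimate should be close to alpha for a large number of trials.
print(empirical_correlation(d=8, alpha=0.6, trials=20000))
```

For $\alpha=1$ the sampler returns $y=x$ and for $\alpha=-1$ it returns $y=-x$, matching the two extremes of the definition.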

Let $\mathcal{H}$ denote a distribution over functions $h\colon\{-1,1\}^{d}\to R$ where $R$ is a finite set and define

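$$p_{\alpha}=\Pr_{\substack{h\sim\mathcal{H}\\(x,y)\ \alpha\text{-corr.}}}[h(x)=h(y)].$$

For $0\leq\beta<\alpha\leq 1$ we wish to characterize the distributions that minimize the expression

$$\rho_{\alpha,\beta}=\frac{\log(1/p_{\alpha})}{\log(1/p_{\beta})}\tag{1}$$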

when we restrict $\mathcal{H}$ to be a distribution over Boolean functions $h\colon\{-1,1\}^{d}\to\{-1,1\}$ . The expression for $\rho_{\alpha,\beta}$ in equation (1) is a well-known quantity in the study of approximate near neighbor search governing the query time and space usage of solutions based on locality-sensitive hashing [3].

2 Related work

Indyk and Motwani [3] introduced the uniform distribution over the set of dictator functions as a family of locality-sensitive hash functions for the Boolean hypercube. O’Donnell et al. [6] showed that for general families $\mathcal{H}$ it must hold that $\rho_{\alpha,\beta}\geq\log(1/\alpha)/\log(1/\beta)$. This matches the upper bound of Indyk and Motwani [3] when $\alpha,\beta$ approach $1$. Another line of work [7, 2] using hypercontractive inequalities showed that $\rho_{\alpha,0}\geq(1-\alpha)/(1+\alpha)$, matching the upper bound of Andoni et al. [1].

The question of finding lower bounds for $\rho_{\alpha,\beta}$ for every choice of $0\leq\beta<\alpha\leq 1$ is still open. In this note we answer the question for distributions over Boolean functions, showing that the upper bound of Indyk and Motwani is optimal. The resulting $\rho$ -value is given by

$$\rho_{\alpha,\beta}=\frac{\log((1+\alpha)/2)}{\log((1+\beta)/2)}.$$
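To get a feel for the numbers, the bit-sampling value $\log((1+\alpha)/2)/\log((1+\beta)/2)$ can be compared with the general lower bound $\log(1/\alpha)/\log(1/\beta)$ quoted from [6]. The following is a small illustrative sketch; the function names and the sample parameter pairs are our own.

```python
import math

def rho_bit_sampling(alpha, beta):
    """rho achieved by dictator functions: log((1+alpha)/2) / log((1+beta)/2)."""
    return math.log((1 + alpha) / 2) / math.log((1 + beta) / 2)

def rho_lower_bound_general(alpha, beta):
    """General lower bound log(1/alpha)/log(1/beta); requires 0 < beta < alpha."""
    return math.log(1 / alpha) / math.log(1 / beta)

# Hypothetical sample parameters; the gap shrinks as alpha and beta approach 1.
for alpha, beta in [(0.5, 0.25), (0.9, 0.5), (0.99, 0.9)]:
    print(alpha, beta, rho_bit_sampling(alpha, beta), rho_lower_bound_general(alpha, beta))
```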

3 Preliminaries

We will be using tools from the Fourier analysis of Boolean functions to find the minimum of $\rho_{\alpha,\beta}$ . For a more detailed overview we refer to the book by O’Donnell [5]. We will be using the fact that Boolean functions can be uniquely expressed as multilinear polynomials:

Theorem 2.

Every function $f\colon\{-1,1\}^{d}\to\mathbb{R}$ can be uniquely expressed as a multilinear polynomial

$$f(x)=\sum_{S\subseteq[d]}\hat{f}(S)x^{S}$$

where $\hat{f}(S)\in\mathbb{R}$ and $x^{S}=\prod_{i\in S}x_{i}$ .
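For small $d$ the expansion can be computed by brute force. The sketch below uses the standard formula $\hat{f}(S)=\operatorname*{\mathbb{E}}_{x}[f(x)x^{S}]$ (see [5]), which is not stated explicitly in this note; coordinates are indexed from $0$ and the example function `majority3` is our own choice.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, d):
    """Return {S: f_hat(S)} with f_hat(S) = E_x[f(x) * x^S], x uniform on {-1,1}^d."""
    points = list(product((-1, 1), repeat=d))
    coeffs = {}
    for k in range(d + 1):
        for S in combinations(range(d), k):
            total = sum(f(x) * prod(x[i] for i in S) for x in points)
            coeffs[S] = total / len(points)
    return coeffs

def majority3(x):
    """Majority vote of three +-1 bits."""
    return 1 if sum(x) > 0 else -1

coeffs = fourier_coefficients(majority3, 3)
for S, c in coeffs.items():
    print(S, c)
```

For a Boolean-valued function the squared coefficients sum to one, which gives a quick sanity check on the output.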

For $S\subseteq[d]$ we refer to $\hat{f}(S)$ as the Fourier coefficient of $f$ on $S$. The following theorem defines an inner product between Boolean functions and shows how it relates to their Fourier coefficients.

Theorem 3 (Plancherel’s Theorem).

For any $f,g\colon\{-1,1\}^{d}\to\mathbb{R}$

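$$\langle f,g\rangle=\operatorname*{\mathbb{E}}_{x\sim\{-1,1\}^{d}}[f(x)g(x)]=\sum_{S\subseteq[d]}\hat{f}(S)\hat{g}(S).$$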

The concept of Fourier weight will be useful when characterizing how Boolean functions behave on noisy inputs:

Definition 4.

For $f\colon\{-1,1\}^{d}\to\mathbb{R}$ define the Fourier weight of $f$ at degree $k\in\{0,1,\dots,d\}$ by

$$W^{k}[f]=\sum_{\substack{S\subseteq[d]\\|S|=k}}\hat{f}(S)^{2}.$$

Consider Plancherel’s Theorem with $f=g$ and where $f$ is Boolean-valued. In this case we get that the sum of the squared Fourier coefficients of $f$ equals 1. This result is known as Parseval’s Theorem and we will make use of it to determine where to place the Fourier weight of $f$ in order to minimize $\rho$.

Theorem 5 (Parseval’s Theorem).

For any $f\colon\{-1,1\}^{d}\to\{-1,1\}$

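$$\langle f,f\rangle=\operatorname*{\mathbb{E}}_{x\sim\{-1,1\}^{d}}[f(x)^{2}]=\sum_{S\subseteq[d]}\hat{f}(S)^{2}=\sum_{i=0}^{d}W^{i}[f]=1.$$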

In order to study the behavior of Boolean functions under noise we introduce the noise operator $T_{\alpha}$ .

Definition 6.

For $\alpha\in[-1,1]$ the noise operator with parameter $\alpha$ is the linear operator $T_{\alpha}$ on functions $f\colon\{-1,1\}^{d}\to\mathbb{R}$ defined by

$$T_{\alpha}f(x)=\operatorname*{\mathbb{E}}_{y\sim N_{\alpha}(x)}[f(y)].$$

The Fourier expansion of $T_{\alpha}f(x)$ is given by $\sum_{S\subseteq[d]}\alpha^{|S|}\hat{f}(S)x^{S}$ . From Plancherel’s Theorem it follows that

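$$\langle f,T_{\alpha}g\rangle=\operatorname*{\mathbb{E}}_{(x,y)\ \alpha\text{-corr.}}[f(x)g(y)]=\sum_{S\subseteq[d]}\alpha^{|S|}\hat{f}(S)\hat{g}(S).\tag{2}$$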
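The identity $\operatorname*{\mathbb{E}}_{(x,y)\ \alpha\text{-corr.}}[f(x)g(y)]=\sum_{S\subseteq[d]}\alpha^{|S|}\hat{f}(S)\hat{g}(S)$ can be checked exactly for small $d$ by enumerating all pairs $(x,y)$. The code below is a sketch under our own naming; it recomputes Fourier coefficients through the standard formula $\hat{f}(S)=\operatorname*{\mathbb{E}}_{x}[f(x)x^{S}]$ and uses 3-bit majority as a hypothetical test function.

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, d):
    """Return {S: f_hat(S)} with f_hat(S) = E_x[f(x) * x^S]."""
    points = list(product((-1, 1), repeat=d))
    return {
        S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
        for k in range(d + 1)
        for S in combinations(range(d), k)
    }

def correlated_expectation(f, g, d, alpha):
    """Exact E[f(x) g(y)] over randomly alpha-correlated (x, y), by enumeration."""
    points = list(product((-1, 1), repeat=d))
    total = 0.0
    for x in points:
        for y in points:
            agree = sum(xi == yi for xi, yi in zip(x, y))
            # Probability of y under N_alpha(x): independent per coordinate.
            prob_y = ((1 + alpha) / 2) ** agree * ((1 - alpha) / 2) ** (d - agree)
            total += f(x) * g(y) * prob_y
    return total / len(points)

def fourier_side(f, g, d, alpha):
    """Sum over S of alpha^|S| * f_hat(S) * g_hat(S)."""
    fh, gh = fourier_coefficients(f, d), fourier_coefficients(g, d)
    return sum(alpha ** len(S) * fh[S] * gh[S] for S in fh)

def majority3(x):
    return 1 if sum(x) > 0 else -1

lhs = correlated_expectation(majority3, majority3, 3, 0.5)
rhs = fourier_side(majority3, majority3, 3, 0.5)
print(lhs, rhs)
```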

In the analysis of our problem the following inequality will be used several times. For the remainder of this Chapter we will use $\log x$ to denote the natural logarithm of $x$ .

Lemma 7.

For $x>0$ we have $\log x\leq x-1$ with equality if and only if $x=1$ .

4 Bit-sampling is optimal

Our approach will be to minimize $\rho_{\alpha,\beta}$ subject to the constraint that members of $\mathcal{H}$ are Boolean functions $h\colon\{-1,1\}^{d}\to\{-1,1\}$ . We begin by making some observations to simplify the problem. For $h\sim\mathcal{H}$ we can directly relate the noise-sensitivity under random $\alpha$ -correlated inputs to the collision probability.

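$$\operatorname*{\mathbb{E}}_{\substack{h\sim\mathcal{H}\\(x,y)\ \alpha\text{-corr.}}}[h(x)h(y)]=p_{\alpha}-(1-p_{\alpha})=2p_{\alpha}-1.$$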

Using Equation (2) we can write $p_{\alpha}$ as follows:

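$$p_{\alpha}=\Big(1+\operatorname*{\mathbb{E}}_{\substack{h\sim\mathcal{H}\\(x,y)\ \alpha\text{-corr.}}}[h(x)h(y)]\Big)/2=\Big(1+\sum_{i=0}^{d}\alpha^{i}w_{i}\Big)/2$$

where we use $w_{i}$ to denote the expected Fourier weight of $h\sim\mathcal{H}$ at degree $i$ defined by $w_{i}=\operatorname*{\mathbb{E}}_{h\sim\mathcal{H}}W^{i}[h]$. From Parseval’s Theorem we have that $\sum_{i=0}^{d}w_{i}=1$. We will now consider how to set $w_{0},w_{1},\dots,w_{d}$ to minimize the expression

$$\rho_{\alpha,\beta}=\frac{\log((1+\sum_{i=0}^{d}\alpha^{i}w_{i})/2)}{\log((1+\sum_{i=0}^{d}\beta^{i}w_{i})/2)}.$$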

An optimal solution $w^{*}_{0},\dots,w^{*}_{d}$ for this problem will yield an optimal solution to the original problem, provided there actually exists a Boolean-valued function satisfying the weight assignment. We will show that the assignment $w^{*}_{1}=1$ and $w^{*}_{i}=0$ for $i\neq 1$ minimizes $\rho_{\alpha,\beta}$ . The distribution $\mathcal{H}$ therefore only assigns positive probability to functions $h$ that have all their Fourier weight concentrated at degree $1$ . It turns out that a Boolean function satisfies this weight assignment if and only if it is a dictator function. Lemma 8 is well-known and is the answer to exercise 1.19 in [5]. We include the proof for completeness.
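The weight formulation is easy to evaluate numerically. As an illustrative sketch (names and parameter values are ours), the code below computes $p_{\alpha}=(1+\sum_{i}\alpha^{i}w_{i})/2$ and $\rho_{\alpha,\beta}$ for three weight vectors with $d=3$: a dictator, the 3-bit majority function with weights $(0,3/4,0,1/4)$, and the 3-bit parity function.

```python
import math

def collision_probability(weights, alpha):
    """p_alpha = (1 + sum_i alpha^i * w_i) / 2 for expected weights w_0..w_d."""
    return (1 + sum(w * alpha ** i for i, w in enumerate(weights))) / 2

def rho_from_weights(weights, alpha, beta):
    """rho = log(1/p_alpha) / log(1/p_beta)."""
    p_a = collision_probability(weights, alpha)
    p_b = collision_probability(weights, beta)
    return math.log(1 / p_a) / math.log(1 / p_b)

dictator = [0, 1, 0, 0]          # all weight at degree 1
majority3 = [0, 0.75, 0, 0.25]   # weights of 3-bit majority
parity3 = [0, 0, 0, 1]           # all weight at degree 3

alpha, beta = 0.5, 0.25  # hypothetical sample parameters
for name, w in [("dictator", dictator), ("majority3", majority3), ("parity3", parity3)]:
    print(name, rho_from_weights(w, alpha, beta))
```

For these parameters the dictator weights give the smallest value of the three, in line with the claim that $w^{*}_{1}=1$ is optimal.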

Lemma 8.

Let $f\colon\{-1,1\}^{d}\to\{-1,1\}$ and suppose that $W^{1}[f]=1$ , then $f(x)=\pm x_{i}$ .

Proof.

From Parseval’s Theorem we know that $\sum_{i}W^{i}[f]=1$ and it follows that $\hat{f}(S)=0$ for $|S|\neq 1$ . The function $f$ can therefore be written in the form $f(x)=\sum_{i=1}^{d}\hat{f}_{i}x_{i}$ where $\hat{f}_{i}=\hat{f}(S)$ for $S=\{i\}$ . By the condition $W^{1}[f]=1$ there exists $j\in[d]$ such that $\hat{f}_{j}\neq 0$ . Fix the $d-1$ components $x_{i\neq j}$ of $x$ and note that since $f$ maps to $\{-1,1\}$ the sum $f(x)=\hat{f}_{j}x_{j}+\sum_{i\neq j}\hat{f}_{i}x_{i}$ must satisfy $f(x)=\pm 1$ when $x_{j}=\pm 1$ . For $\hat{f}_{j}\neq 0$ this is only possible when $\hat{f}_{j}=\pm 1$ which implies that $\hat{f}_{i}=0$ for $i\neq j$ . It follows that $f$ must be one of the $2d$ functions of the form $f(x)=\pm x_{i}$ . ∎
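Lemma 8 can be confirmed exhaustively for a small dimension. The following sketch (our own code, with $d=3$ as an arbitrary choice) enumerates all $2^{8}$ Boolean functions on three bits, keeps those with $W^{1}[f]=1$, and checks that each is of the form $\pm x_{i}$.

```python
from itertools import product

def degree_one_weight(table, points, d):
    """W^1[f] = sum_i f_hat({i})^2 for f given as a truth table over `points`."""
    weight = 0.0
    for i in range(d):
        coeff = sum(v * x[i] for v, x in zip(table, points)) / len(points)
        weight += coeff ** 2
    return weight

def functions_with_full_degree_one_weight(d):
    """Enumerate all Boolean functions on d bits and keep those with W^1 = 1."""
    points = list(product((-1, 1), repeat=d))
    found = [
        table
        for table in product((-1, 1), repeat=len(points))
        if abs(degree_one_weight(table, points, d) - 1.0) < 1e-12
    ]
    return points, found

def is_dictator(table, points, d):
    """True if the truth table equals x -> sign * x_i for some i and sign."""
    return any(
        all(v == sign * x[i] for v, x in zip(table, points))
        for i in range(d)
        for sign in (-1, 1)
    )

points, found = functions_with_full_degree_one_weight(3)
print(len(found), all(is_dictator(t, points, 3) for t in found))
```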

4.1 Optimal Fourier weight at degree zero

We begin by arguing that we can restrict our attention to showing that dictator functions are optimal in the case where $0<\beta<\alpha<1$. If $\alpha=1$ then for $w_{1}=1$ we have that $\rho=0$, which is the best we can hope for (but this could also be achieved by other weight assignments, hence the statement of the main theorem is for $\alpha<1$). For $\beta=0$ the following Lemma showing that $w_{0}^{*}=0$, combined with the fact that for this setting we maximize $p_{\alpha}$ by setting $w_{1}=1$, shows that the dictator functions are optimal. We will now show that an optimal solution has no Fourier weight at degree zero.

Lemma 9.

$w_{0}^{*}=0$ .

Proof.

If $w_{0}=1$ we have $\rho=1$ and it is clear that $\rho<1$ if we set $w_{1}=1$. Suppose that $0<w_{0}^{*}<1$. We will show that in this case we can move some weight from $w_{0}$ to $w_{1}$ and decrease the value of $\rho$. For a given weight assignment define $s(\alpha)=\sum_{i}\alpha^{i}w_{i}$ and write $w_{1}$ as $w_{1}=1-\sum_{j\neq 1}w_{j}$. The partial derivative of $\rho=\log((1+s(\alpha))/2)/\log((1+s(\beta))/2)$ with respect to $w_{0}$ is given by

$$\frac{\partial\rho}{\partial w_{0}}=\frac{\frac{\partial s(\alpha)/\partial w_{0}}{1+s(\alpha)}\log\frac{1+s(\beta)}{2}-\frac{\partial s(\beta)/\partial w_{0}}{1+s(\beta)}\log\frac{1+s(\alpha)}{2}}{\log^{2}\frac{1+s(\beta)}{2}}.$$

By rearranging and using that $\partial s(\alpha)/\partial w_{0}=1-\alpha$ we find that $\frac{\partial\rho}{\partial w_{0}}>0$ is equivalent to

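$$\frac{1+s(\beta)}{1-\beta}\log\frac{1+s(\beta)}{2}>\frac{1+s(\alpha)}{1-\alpha}\log\frac{1+s(\alpha)}{2}.$$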

It suffices to show that the function $g(x)=\frac{1+s(x)}{1-x}\log\frac{1+s(x)}{2}$ is decreasing for $0<x<1$ .

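$$\frac{\partial g}{\partial x}=\frac{s^{\prime}(x)(1-x)+(1+s(x))}{(1-x)^{2}}\log\frac{1+s(x)}{2}+\frac{s^{\prime}(x)}{1-x}.$$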

Rewriting, this is equivalent to showing that

$$(s^{\prime}(x)(1-x)+1+s(x))\log\frac{1+s(x)}{2}+(1-x)s^{\prime}(x)<0.\tag{3}$$

By the assumption that $0<w_{0}<1$ we have that $0<s(x)<1$ and using Lemma 7 we get that $\log((1+s(x))/2)<(s(x)-1)/2$. The condition in equation (3) then simplifies to showing that $s^{\prime}(x)(1-x)+s(x)\leq 1$. The function $s(x)=\sum_{i}w_{i}x^{i}$ is a weighted sum of simple monomials where the weights sum to one. It therefore suffices to show that the inequality holds for every monomial $s_{k}(x)=x^{k}$ where $k\in\{0,1,\dots,d\}$. For $k=0$ and $k=1$ we have $s_{k}^{\prime}(x)(1-x)+s_{k}(x)=1$, satisfying the desired inequality. For $k\geq 2$ we have $s_{k}^{\prime}(x)(1-x)+s_{k}(x)=kx^{k-1}-(k-1)x^{k}$. This expression equals $0$ at $x=0$ and $1$ at $x=1$, and its derivative $k(k-1)x^{k-2}(1-x)$ is nonnegative, so it is increasing for $x\in(0,1)$ and hence at most $1$ there. It follows that the inequality is satisfied, completing the proof. ∎
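The step from the bound on the logarithm to the condition $s^{\prime}(x)(1-x)+s(x)\leq 1$ can be spelled out as follows. Here $s=s(x)$ and $B=s^{\prime}(x)(1-x)$ are shorthands introduced only for this derivation; note that $B\geq 0$ on $(0,1)$, so the factor $B+1+s$ multiplying the logarithm is positive.

```latex
\begin{align*}
(B + 1 + s)\log\frac{1+s}{2} + B
  &< (B + 1 + s)\,\frac{s-1}{2} + B \\
  &= \frac{B(s-1) + (s+1)(s-1) + 2B}{2} \\
  &= \frac{(1+s)\,(B + s - 1)}{2}.
\end{align*}
```

Since $1+s>0$, the last expression is at most zero exactly when $B+s\leq 1$, which is the stated condition.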

4.2 A continuous optimization problem

In order to simplify the problem of minimizing $\rho$ we will optimize over a larger space. In particular we will let $W$ denote a collection of pairs $(w,\kappa)$ such that $\sum_{(w,\kappa)\in W}w=1$ where we restrict $\kappa\in\mathbb{R}$ to satisfy $\kappa\geq 1$. We define $s(x)=\sum_{(w,\kappa)\in W}wx^{\kappa}$ and we will now attempt to specify the function $s$ that minimizes

$$\rho_{\alpha,\beta}=\frac{\log\frac{1+s(\alpha)}{2}}{\log\frac{1+s(\beta)}{2}}$$

subject to the constraint that $s(\beta)=b\leq\beta$ is fixed. The constraint that $s(\beta)\leq\beta$ follows from the restrictions on $s$ . We can therefore write $b=\beta^{\gamma}$ for some $\gamma\geq 1$ . For fixed $s(\beta)$ it is clear that we minimize $\rho$ by maximizing $s(\alpha)$ .

Lemma 10.

For fixed $s(\beta)=\beta^{\gamma}$ we maximize $s(\alpha)$ by setting $s(x)=x^{\gamma}$ .

Proof.

Let $w$ denote the weight on the exponent $\gamma$ in the specification $W$ of $s$ . We will prove that if $w<1$ then we can increase $s(\alpha)$ by rearranging the weights of $s$ to put more weight onto $(w,\gamma)$ . Note that if $w<1$ and we have a valid configuration of weights (in the sense that $s(\beta)=\beta^{\gamma}$ ) there must exist exponents $\gamma_{0}<\gamma<\gamma_{1}$ such that there is positive weight on $\gamma_{0}$ and $\gamma_{1}$ . If all the remaining weight was concentrated to either side of $\gamma$ the condition $s(\beta)=\beta^{\gamma}$ would be violated. We will now move $\varepsilon_{0}$ weight from $w_{0}$ to $w$ and $\varepsilon_{1}$ weight from $w_{1}$ to $w$ where we set $\varepsilon_{0},\varepsilon_{1}$ to ensure that $s(\beta)=\beta^{\gamma}$ after the move. It turns out that this condition is satisfied for the following ratio

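$$\varphi(\beta)=\varepsilon_{1}/\varepsilon_{0}=\frac{\beta^{\gamma_{0}}-\beta^{\gamma}}{\beta^{\gamma}-\beta^{\gamma_{1}}}>0.$$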

The change in $s(\alpha)$ due to the rearrangement of weights can be shown to be positive if $\varphi(\alpha)<\varphi(\beta)$. Therefore, it suffices to show that $\varphi(x)$ is decreasing for $0<x<1$ when $\gamma_{0}<\gamma<\gamma_{1}$. To simplify further, we define $\lambda_{0}=\gamma_{0}-\gamma$ and $\lambda_{1}=\gamma_{1}-\gamma$ which satisfy $\lambda_{0}<0<\lambda_{1}$. Rewriting $\varphi(x)=-(1-x^{\lambda_{0}})/(1-x^{\lambda_{1}})$ and differentiating we get

$$\frac{\partial\varphi}{\partial x}<0\iff\frac{\lambda_{0}x^{\lambda_{0}}}{1-x^{\lambda_{0}}}>\frac{\lambda_{1}x^{\lambda_{1}}}{1-x^{\lambda_{1}}}.$$

It suffices to show that $\psi(x)=\frac{xa^{x}}{1-a^{x}}$ is decreasing in $x$ for $a\in(0,1)$. Differentiating, $\psi^{\prime}(x)$ has the same sign as $a^{x}(1-a^{x})+a^{x}\log a^{x}$, since the remaining factor $1/(1-a^{x})^{2}$ is positive. Define $z=a^{x}$ and note that $z>0$ and $z\neq 1$. We have that $z(1-z)+z\log z<0\iff(1-z)+\log z<0$ and by Lemma 7 we see that $\log z<z-1$, completing the proof. ∎
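A quick numerical instance of Lemma 10: take a mixture $s$ of two exponents, compute the $\gamma$ with $\beta^{\gamma}=s(\beta)$, and compare $\alpha^{\gamma}$ with $s(\alpha)$. This is only a sketch; the mixture and the parameter values below are hypothetical choices of ours.

```python
import math

def mixture_value(pairs, x):
    """s(x) = sum of w * x**kappa over (w, kappa) pairs whose weights sum to one."""
    return sum(w * x ** kappa for w, kappa in pairs)

def matching_exponent(pairs, beta):
    """The gamma satisfying beta**gamma == s(beta)."""
    return math.log(mixture_value(pairs, beta)) / math.log(beta)

alpha, beta = 0.5, 0.25
pairs = [(0.5, 1.0), (0.5, 3.0)]  # half the weight on exponent 1, half on exponent 3

gamma = matching_exponent(pairs, beta)
# Positive gain means the pure monomial x**gamma beats the mixture at alpha.
gain = alpha ** gamma - mixture_value(pairs, alpha)
print(gamma, gain)
```

The single monomial $x^{\gamma}$ agrees with the mixture at $\beta$ by construction and is larger at $\alpha$, as the lemma predicts.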

4.3 Univariate analysis

According to Lemma 10 we can now restrict our attention to the problem of finding $\gamma\geq 1$ that minimizes the function

$$\rho(\gamma)=\frac{\log\frac{1+\alpha^{\gamma}}{2}}{\log\frac{1+\beta^{\gamma}}{2}}.$$

We will show the derivative of $\rho$ is positive, implying that it is minimized when $\gamma=1$ .

Lemma 11.

$\rho^{\prime}(\gamma)>0$ .

Proof.

From inspecting the derivative of $\rho$ with respect to $\gamma$ we see that

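$$\begin{aligned}\frac{\partial\rho}{\partial\gamma}>0&\iff\frac{\alpha^{\gamma}\log\alpha}{1+\alpha^{\gamma}}\log\frac{1+\beta^{\gamma}}{2}-\frac{\beta^{\gamma}\log\beta}{1+\beta^{\gamma}}\log\frac{1+\alpha^{\gamma}}{2}>0\\&\iff\frac{1+\beta^{\gamma}}{\beta^{\gamma}\log\beta}\log\frac{1+\beta^{\gamma}}{2}>\frac{1+\alpha^{\gamma}}{\alpha^{\gamma}\log\alpha}\log\frac{1+\alpha^{\gamma}}{2}.\end{aligned}$$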

Therefore it suffices to show that the function $g(x)=\frac{1+x^{\gamma}}{x^{\gamma}\log x}\log\frac{1+x^{\gamma}}{2}$ is decreasing for $0<x<1$ and $\gamma\geq 1$ . From inspecting $g^{\prime}(x)$ we see that the condition that $g^{\prime}(x)<0$ is equivalent to

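$$-(1+x^{\gamma}+\gamma\log x)\log\frac{1+x^{\gamma}}{2}+\gamma x^{\gamma}\log x<0$$

If $1+x^{\gamma}+\gamma\log x\geq 0$ then the condition is satisfied and we are done. Otherwise we can use the fact that $-(1+x^{\gamma}+\gamma\log x)>0$ together with Lemma 7 to produce the following derivation:

$$\begin{aligned}-(1+x^{\gamma}+\gamma\log x)\log\frac{1+x^{\gamma}}{2}+\gamma x^{\gamma}\log x&<-(1+x^{\gamma}+\log x^{\gamma})\frac{x^{\gamma}-1}{2}+x^{\gamma}\log x^{\gamma}\\&=\frac{1}{2}\left(1-x^{2\gamma}+(1+x^{\gamma})\log x^{\gamma}\right)\end{aligned}$$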

Reapplying Lemma 7 we see that $(1+x^{\gamma})\log x^{\gamma}<(1+x^{\gamma})(x^{\gamma}-1)=-(1-x^{2\gamma})$ completing the proof. ∎
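The monotonicity of $\rho(\gamma)=\log\frac{1+\alpha^{\gamma}}{2}/\log\frac{1+\beta^{\gamma}}{2}$ claimed in Lemma 11 can also be observed numerically. The sketch below evaluates $\rho$ on a grid of $\gamma\geq 1$; the grid and the sample values of $\alpha,\beta$ are our own assumptions, and a grid check is of course no substitute for the proof.

```python
import math

def rho_of_gamma(gamma, alpha, beta):
    """rho(gamma) = log((1 + alpha^gamma)/2) / log((1 + beta^gamma)/2)."""
    return math.log((1 + alpha ** gamma) / 2) / math.log((1 + beta ** gamma) / 2)

def is_increasing_on_grid(alpha, beta, gammas):
    """Check that rho(gamma) strictly increases along the given grid."""
    values = [rho_of_gamma(g, alpha, beta) for g in gammas]
    return all(a < b for a, b in zip(values, values[1:]))

gammas = [1 + 0.25 * k for k in range(37)]  # gamma from 1.0 to 10.0

for alpha, beta in [(0.5, 0.25), (0.9, 0.1), (0.99, 0.9)]:
    print(alpha, beta, is_increasing_on_grid(alpha, beta, gammas))
```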

4.4 Stating the result

We will now summarize how the results from the previous subsections yield the main result of this paper as stated in the abstract. To find the distribution over Boolean functions minimizing $\rho$ we first considered the optimal weight assignment in the expression $s(x)=\sum_{i}w_{i}x^{i}$ subject to the constraint that $\sum_{i}w_{i}=1$. Finding an optimal assignment does not by itself guarantee that we have solved the problem, because there may not exist a Boolean function with a given weight assignment. If, however, one or more Boolean functions satisfying the optimal assignment exist, then we have solved the problem. In Lemma 9 we showed that an optimal solution $w_{0}^{*},w_{1}^{*},\dots,w_{d}^{*}$ must have $w_{0}^{*}=0$. Therefore the optimal solution can only have non-zero weight on exponents $k\geq 1$. Next, in Lemma 10, we argued that if we allow continuous exponents $k\in\mathbb{R}$ with $k\geq 1$ in $s(x)$ then the problem of minimizing $\rho$ becomes the problem of selecting $\gamma\geq 1$ where $s(x)=x^{\gamma}$. Lemma 11 showed that $\rho(\gamma)$ is increasing, so to minimize $\rho$ we want to set $\gamma=1$. The conclusion from these optimization problems is that we minimize $\rho$ by setting $w_{1}^{*}=1$. Finally, Lemma 8 shows that the subset of the Boolean functions with $w_{1}=1$ is exactly the set of dictator functions $f(x)=\pm x_{i}$. Together with the fact that $w_{1}^{*}=1$ is a unique minimum of $\rho$ in the weight assignment problem we get our main result.

5 Open problems

Orthogonal search.

It appears that the same techniques can be used to show that pairs of functions of the form $f(x)=x_{i}x_{j}$, $g(y)=-y_{i}y_{j}$ minimize the function

[TABLE]

Extension to negative correlation.

It seems likely that the dictator functions or bit-sampling minimizes $\rho$ for the entire interval $-1\leq\beta<\alpha\leq 1$ . Unfortunately the current proof breaks down in places.

General hash functions.

Showing tight bounds for hash functions with an arbitrary range is an interesting open problem. For orthogonal search this is an open problem even in the case of $\rho_{\alpha,0}$. For more information see the symmetric Gaussian problem in [4].

Investigating the implications of the results in this paper for functions with an arbitrary range, through the use of $1$-bit hashing, is an interesting problem.

Bibliography

[1] A. Andoni and I. Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. In Proc. STOC ’15, pages 793–801, 2015.
[2] A. Andoni and I. Razenshteyn. Tight lower bounds for data-dependent locality-sensitive hashing. In Proc. SoCG ’16, pages 9:1–9:11, 2016.
[3] P. Indyk and R. Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proc. STOC ’98, pages 604–613, 1998.
[4] R. O’Donnell. Open problems in analysis of boolean functions. CoRR, abs/1204.6447, 2012.
[5] R. O’Donnell. Analysis of Boolean Functions. Cambridge University Press, 2014.
[6] R. O’Donnell, Y. Wu, and Y. Zhou. Optimal lower bounds for locality-sensitive hashing (except when q is tiny). ACM Transactions on Computation Theory (TOCT), 6(1):5, 2014.
[7] R. Panigrahy, K. Talwar, and U. Wieder. A geometric approach to lower bounds for approximate near-neighbor search and partial match. In Proc. FOCS ’08, pages 414–423, 2008.
