Near-optimal linear decision trees for k-SUM and related problems
Daniel M. Kane, Shachar Lovett, Shay Moran

TL;DR
This paper develops near-optimal linear decision trees for problems like k-SUM, SUBSET-SUM, and sumset sorting, using comparison-based queries with query complexity close to theoretical limits.
Contribution
It introduces constructions of linear decision trees for combinatorial problems based on inference dimension, connecting machine learning concepts with discrete geometry.
Findings
Constructed linear decision trees for k-SUM with O(n log^2 n) queries.
Achieved near-optimal query complexity for SUBSET-SUM and sumset sorting.
Utilized comparison queries with sparse coefficients for efficient decision trees.
Near-optimal linear decision trees for k-SUM and related problems
Daniel M. Kane Department of Computer Science and Engineering/Department of Mathematics, University of California, San Diego. Supported by NSF CAREER Award ID 1553288 and a Sloan fellowship.
Shachar Lovett Department of Computer Science and Engineering, University of California, San Diego. Research supported by NSF CAREER award 1350481, CCF award 1614023 and a Sloan fellowship.
Shay Moran Department of Computer Science and Engineering, University of California, San Diego, Simons Institute for the Theory of Computing, Berkeley, and Max Planck Institute for Informatics, Saarbrücken, Germany.
Abstract
We construct near-optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant $k$, we construct linear decision trees that solve the $k$-SUM problem on $n$ elements using $O(n\log^2 n)$ linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two $k$-subsets; when viewed as linear queries, comparison queries are $2k$-sparse and have only $\{-1,0,1\}$ coefficients. We give similar constructions for sorting sumsets and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms.
Our constructions are based on the notion of “inference dimension”, recently introduced by the authors in the context of active classification with comparison queries. This can be viewed as another contribution to the fruitful link between machine learning and discrete geometry, which goes back to the discovery of the VC dimension.
1 Introduction
This paper studies the linear decision tree complexity of several combinatorial problems, such as $k$-SUM, SUBSET-SUM, KNAPSACK, sorting sumsets, and more. A common feature of these problems is that they are all instances of the following fundamental problem in computational geometry.
The point-location problem.
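Let $H \subset \mathbb{R}^n$ be a finite set. Consider the problem in which, given $x \in \mathbb{R}^n$ as an input, the goal is to compute the function $\mathcal{A}_H(x) \in \{-,0,+\}^H$ defined by
$$\mathcal{A}_H(x)(h) = \operatorname{sign}(\langle h, x\rangle) \qquad \text{for } h \in H,$$
where $\operatorname{sign}: \mathbb{R} \to \{-,0,+\}$ is the sign function and $\langle \cdot,\cdot\rangle$ is the standard inner product in $\mathbb{R}^n$.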
In discrete geometry this is known as the point-location in a hyperplane-arrangement problem, in which each $h \in H$ is identified with the hyperplane orthogonal to $h$, and $\mathcal{A}_H(x)$ corresponds to the cell, in the partition induced by the hyperplanes in $H$, to which the input point $x$ belongs.
A dual formulation of this problem has been considered in learning theory, specifically within the context of active learning: here, each $h \in H$ is thought of as a point, $x$ is thought of as the learned half-space, and computing $\mathcal{A}_H(x)$ corresponds to learning how each point $h$ is classified by $x$. In this work it will often be more intuitive to consider this dual formulation. See Figure 1 for a planar illustration of both interpretations.
Linear decision tree.
A linear decision tree for the point-location problem is an adaptive deterministic algorithm $T$. The set $H$ is known in advance, and the input is $x \in \mathbb{R}^n$. The algorithm does not have direct access to $x$. Instead, at each iteration the algorithm chooses some $q \in \mathbb{R}^n$ and queries "$\operatorname{sign}(\langle q, x\rangle)$" (note that $q$ is not necessarily in $H$). At the end, the algorithm should be able to compute $\mathcal{A}_H(x)$ correctly. The query complexity is the maximum over $x \in \mathbb{R}^n$ of the number of queries performed. Equivalently, such an algorithm can be described by a ternary decision tree which computes the sign of a linear query at each inner node. A query $q$ is $s$-sparse if it has at most $s$ nonzero coefficients. A linear decision tree is $s$-sparse if all its queries are $s$-sparse.
Comparison decision tree.
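A comparison decision tree for the point-location problem is a special type of linear decision tree, where the only queries used are either of the form $\operatorname{sign}(\langle h, x\rangle)$ for $h \in H$ (label queries), or $\operatorname{sign}(\langle h - h', x\rangle)$ for $h, h' \in H$ (comparison queries). Note that $\langle h - h', x\rangle > 0$ if and only if $\langle h, x\rangle > \langle h', x\rangle$, which is why we call these comparison queries. In the dual version (in which we view $H$ as a set of points), comparison queries have a natural geometric interpretation: assuming that $\langle h, x\rangle, \langle h', x\rangle > 0$, a comparison query $\operatorname{sign}(\langle h - h', x\rangle)$ corresponds to querying which one of $h, h'$ is further from the hyperplane defined by $x$. Observe that if all elements $h \in H$ are $s$-sparse then a comparison decision tree is $2s$-sparse.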
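The two query types can be made concrete with a small Python sketch. It assumes integer or rational inputs, so that zero tests are exact; the helper names (`point_location`, `QueryOracle`) are ours and purely illustrative. `point_location` evaluates $\mathcal{A}_H(x)$ by brute force, while `QueryOracle` hides $x$ and only answers label and comparison queries, counting them as a decision tree would.

```python
from typing import Dict, Sequence, Tuple


def sign(v) -> str:
    """Sign function with values in {'-', '0', '+'}."""
    return '+' if v > 0 else '-' if v < 0 else '0'


def inner(h: Sequence, x: Sequence):
    """Standard inner product <h, x>."""
    return sum(a * b for a, b in zip(h, x))


def point_location(H: Sequence[Tuple], x: Sequence) -> Dict[Tuple, str]:
    """Brute-force A_H(x): the sign of <h, x> for every h in H."""
    return {tuple(h): sign(inner(h, x)) for h in H}


class QueryOracle:
    """Hides x and answers only label and comparison queries, counting them."""

    def __init__(self, x: Sequence):
        self._x = list(x)
        self.queries = 0

    def label(self, h: Sequence) -> str:
        # Label query: sign(<h, x>).
        self.queries += 1
        return sign(inner(h, self._x))

    def compare(self, h: Sequence, g: Sequence) -> str:
        # Comparison query: sign(<h - g, x>), i.e. is <h, x> larger than <g, x>?
        self.queries += 1
        return sign(inner([a - b for a, b in zip(h, g)], self._x))
```

For example, with $H = \{(1,0), (0,1), (1,-1)\}$ and $x = (2,3)$, `point_location` labels the three vectors `+`, `+`, `-`, and `QueryOracle((2, 3)).compare((0, 1), (1, 0))` answers `'+'` because $3 > 2$.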
1.1 Results
Our main result is a method that produces near-optimal decision trees, in fact comparison decision trees, for many natural and well-studied combinatorial instances of the point-location problem. We first describe a few concrete instances, and then the general framework.
1.1.1 $k$-SUM
In the $k$-SUM problem an input array of $n$ numbers is given, and the goal is to decide whether the sum of some $k$ distinct numbers is $0$. This problem (in particular 3-SUM) has been extensively studied since the 1990s, as it embeds into many problems in computational geometry, see for example [GO95]. More recently, it has also been studied in the context of fine-grained complexity, see for example the survey [VW15].
The $k$-SUM problem corresponds to the following point-location problem. Let $H \subset \{0,1\}^n$ denote all vectors of Hamming weight $k$. Thus, $x \in \mathbb{R}^n$ contains $k$ numbers whose sum is $0$ if and only if $\mathcal{A}_H(x)$ contains at least one $0$ entry.
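In this context, comparison decision trees allow for two types of linear queries: label queries of the form "$\operatorname{sign}(\sum_{i \in A} x_i)$" where $A \subset [n]$ has size $k$, and comparison queries of the form "$\operatorname{sign}(\sum_{i \in A} x_i - \sum_{i \in B} x_i)$" where $A, B \subset [n]$ have size $k$.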
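As a sanity check of this reduction, the following Python sketch enumerates the weight-$k$ vectors $h \in H$ and reports whether some entry of $\mathcal{A}_H(x)$ is $0$. It is a brute-force reference that makes $\binom{n}{k}$ label queries, not the decision tree of Theorem 1.1; the function names are illustrative.

```python
from itertools import combinations
from typing import Sequence


def ksum_via_point_location(x: Sequence[int], k: int) -> bool:
    """Return True iff A_H(x) has a 0 entry, where H = 0/1 vectors of weight k.

    For the indicator vector h of a k-subset A, <h, x> = sum_{i in A} x_i,
    so a 0 entry is exactly a set of k distinct positions summing to 0.
    """
    for subset in combinations(range(len(x)), k):
        if sum(x[i] for i in subset) == 0:  # label query sign(<h, x>) == 0
            return True
    return False


def comparison_query(x: Sequence[int], A: Sequence[int], B: Sequence[int]) -> int:
    """Comparison query sign(sum_{i in A} x_i - sum_{i in B} x_i), as -1/0/+1."""
    diff = sum(x[i] for i in A) - sum(x[i] for i in B)
    return (diff > 0) - (diff < 0)
```

For instance, `ksum_via_point_location([3, -1, -2, 5], 3)` is `True` because $3 - 1 - 2 = 0$.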
Theorem 1.1**.**
The $k$-SUM problem on $n$ elements can be computed by a comparison decision tree of depth $O(kn\log^2 n)$. In particular, all the queries are $2k$-sparse and have only $\{-1,0,1\}$ coefficients.
This improves a series of works. There is a simple algorithm based on hashing that solves $k$-SUM in time $O(n^{\lceil k/2\rceil})$. It can be transformed into a linear decision tree with the same number of queries, which in our language are all label queries. Erickson [Eri95] showed that $\Omega(n^{\lceil k/2\rceil})$ queries are indeed necessary to solve $k$-SUM if only label queries are allowed (or more generally, if only $k$-sparse linear queries are allowed). Ailon and Chazelle [AC05] extended the lower bound, and showed that if the linear queries have sparsity less than $2k$, then a super-linear lower bound holds for the number of queries (note that indeed the near-linear comparison decision tree given by Theorem 1.1 is $2k$-sparse).
In a breakthrough work, Grønlund and Pettie [GP14] were the first to break the bound. They constructed a randomized -linear decision tree for -SUM which makes queries. This was improved to by Gold and Sharir [GS15].
In the general linear decision tree model, without any sparsity assumptions, a series of works in discrete geometry have designed linear decision trees for the general point-location problem. In the context of $k$-SUM, the best result is of Ezra and Sharir [ES16], who constructed a linear decision tree of depth for any constant . This improves on previous results of Meyer auf der Heide [MadH84], Meiser [Mei93] and Cardinal et al. [CIO15].
1.1.2 Sorting
Let $A, B \subset \mathbb{R}$ be sets of size $n$. Their sumset, denoted by $A + B$, is the set $\{a + b : a \in A, b \in B\}$. Consider the goal of sorting $A + B$ while minimizing the number of comparisons (here, by comparisons we mean the usual notion in sorting, that is, comparing two elements of $A + B$). While it is possible that $|A + B| = n^2$, it is well known that the number of possible orderings of $A + B$ is only $n^{O(n)}$ [Fre76]. Thus, from an information-theoretic perspective it is conceivable that $A + B$ can be sorted using only $O(n\log n)$ comparisons. However, Fredman [Fre76] gave a tight bound of $\Theta(n^2)$ on the number of comparisons needed to sort $A + B$.
It is natural to ask whether giving the algorithm more access to the data, in the form of simple local queries, can achieve $O(n\log n)$ query complexity. We show that if the algorithm can use differences-comparisons then an almost optimal query complexity of $O(n\log^2 n)$ suffices to sort $A + B$. A differences-comparison on an array $z$ is a query of the form
$$\operatorname{sign}\big((z_i - z_j) - (z_k - z_l)\big),$$
in words: "is $z_i$ greater than $z_j$ by more than $z_k$ is greater than $z_l$?".
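The problem of sorting $A + B$ corresponds to the following point-location problem. Let $A = \{a_1, \dots, a_n\}$, $B = \{b_1, \dots, b_n\}$ and identify the pair $(A, B)$ with the vector $x = (a_1, \dots, a_n, b_1, \dots, b_n) \in \mathbb{R}^{2n}$. Let $H \subset \{-1,0,1\}^{2n}$ consist of the vectors with exactly one $1$ and one $-1$ in the first $n$ coordinates, and exactly one $1$ and one $-1$ in the last $n$ coordinates. Then computing $\mathcal{A}_H(x)$ corresponds to answering all queries of the form "$\operatorname{sign}((a_i + b_k) - (a_j + b_l))$" for $i, j, k, l \in [n]$, which amounts to sorting $A + B$. In this context, the two types of queries used by comparison decision trees are comparison queries in $A + B$, namely "$a + b \geq a' + b'$?" where $a, a' \in A$ and $b, b' \in B$ (which correspond to the label queries in the point-location problem), and differences-comparison queries in $A + B$, namely "$(a + b) - (a' + b') \geq (c + d) - (c' + d')$?" where $a, a', c, c' \in A$ and $b, b', d, d' \in B$ (which correspond to comparison queries in the point-location problem).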
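The correspondence can be checked with a short Python sketch that sorts $A + B$ using only the signs of $\langle h, x\rangle$ for $h = e_i - e_j + e_{n+k} - e_{n+l}$. As a simplification made here (not in the text), the sketch also allows $i = j$ or $k = l$, in which case one half of $h$ is zero. Each comparison costs one label query, so this is the naive $O(n^2\log n)$-query baseline rather than the tree of Theorem 1.2; the function name is illustrative.

```python
from functools import cmp_to_key
from itertools import product
from typing import List, Sequence


def sort_sumset(A: Sequence[int], B: Sequence[int]) -> List[int]:
    """Sort A + B using only signs of <h, x>, where x = (a_1..a_n, b_1..b_n)."""
    n = len(A)
    assert len(B) == n, "this sketch assumes |A| = |B| = n"
    x = list(A) + list(B)

    def h_vector(i: int, k: int, j: int, l: int) -> List[int]:
        # h = e_i - e_j + e_{n+k} - e_{n+l}, so <h, x> = (a_i + b_k) - (a_j + b_l).
        h = [0] * (2 * n)
        h[i] += 1
        h[j] -= 1
        h[n + k] += 1
        h[n + l] -= 1
        return h

    def compare(p, q) -> int:
        # One label query of the point-location problem, returned as -1/0/+1.
        (i, k), (j, l) = p, q
        value = sum(hc * xc for hc, xc in zip(h_vector(i, k, j, l), x))
        return (value > 0) - (value < 0)

    pairs = list(product(range(n), repeat=2))  # (i, k) stands for a_i + b_k
    return [A[i] + B[k] for i, k in sorted(pairs, key=cmp_to_key(compare))]
```

For example, `sort_sumset([0, 10], [1, 2])` returns `[1, 2, 11, 12]`.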
Theorem 1.2**.**
Given $A, B \subset \mathbb{R}$ of size $n$, their sumset $A + B$ can be sorted by a comparison decision tree of depth $O(n\log^2 n)$. In particular, all queries are $8$-sparse with $\{-2,-1,0,1,2\}$ coefficients.
The problem of sorting sumsets has been considered by Fredman [Fre76], who showed that if only comparison queries are allowed, then $\Theta(n^2)$ queries are sufficient and necessary to sort $A + B$. Grønlund and Pettie [GP14] use it in their work, and specifically ask for a better linear decision tree for sorting sumsets.
1.1.3 NP-hard problems
Several NP-hard problems can be phrased as point-location problems. For example, the SUBSET-SUM problem is to decide, given a set $A$ of $n$ real numbers, whether there exists a nonempty subset of $A$ whose sum is $0$. The KNAPSACK problem is to decide whether there exists a subset of $A$ whose sum is $1$. We focus here on SUBSET-SUM for concreteness.
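The SUBSET-SUM problem corresponds to the following point-location problem. Let $A = \{a_1, \dots, a_n\}$ and take $x = (a_1, \dots, a_n) \in \mathbb{R}^n$. Let $H = \{0,1\}^n \setminus \{0^n\}$. Then $A$ has a nonempty subset whose sum is $0$ if and only if $\mathcal{A}_H(x)$ contains at least one $0$.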
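In this context, comparison decision trees have two types of queries: label queries of the form "$\operatorname{sign}(\sum_{i \in I} a_i)$" for some $I \subseteq [n]$, and comparison queries of the form "$\operatorname{sign}(\sum_{i \in I} a_i - \sum_{i \in J} a_i)$" for some $I, J \subseteq [n]$.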
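A brute-force Python sketch of this reduction follows. It assumes that the all-zero vector (the empty subset) is excluded from $H$, since it would always produce a $0$ entry. It uses all $2^n - 1$ label queries, in contrast to the $O(n^2\log n)$ comparison decision tree of Theorem 1.3; the function name is illustrative.

```python
from itertools import product
from typing import Sequence


def subset_sum_via_point_location(a: Sequence[int]) -> bool:
    """Return True iff A_H(x) has a 0 entry for H = {0,1}^n minus the zero vector."""
    for h in product((0, 1), repeat=len(a)):
        if not any(h):
            continue  # skip the empty subset
        if sum(hi * ai for hi, ai in zip(h, a)) == 0:  # label query on h
            return True
    return False
```

For example, `[3, -7, 4, 10]` is a positive instance because $3 + 4 - 7 = 0$.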
Theorem 1.3**.**
The SUBSET-SUM problem can be solved using a comparison decision tree of depth $O(n^2\log n)$, where $n$ is the size of the input set. In particular, all the queries are linear with $\{-1,0,1\}$ coefficients.
Note that the bound is tight up to the log factor: indeed, in the corresponding point-location problem, $H = \{0,1\}^n \setminus \{0^n\}$, and thus $\mathcal{A}_H$ corresponds to the family of threshold functions on the Boolean cube. It is well known that the number of such functions is $2^{\Theta(n^2)}$ [GT62], and thus any decision tree (even one that uses arbitrary queries, each with a constant number of possible answers) that computes $\mathcal{A}_H$ must use at least $\Omega(n^2)$ queries.
The surprising fact that SUBSET-SUM, an NP-hard problem, has a polynomial time algorithm in a nonuniform model (namely, linear decision trees) was first discovered by Meyer auf der Heide [MadH84], answering an open problem posed by Dobkin and Lipton [DL74] and Yao [Yao81]. It originally required linear queries. It was generalized by Meiser [Mei93] to the general point-location problem, and later improved by Cardinal et al. [CIO15] and Ezra and Sharir [ES16]. This last work, although it does not address SUBSET-SUM directly, seems to improve the number of queries to . Observe that our construction gives a near-optimal number of linear queries, namely $O(n^2\log n)$. Moreover, the queries are simple, in the sense that they involve only $\{-1,0,1\}$ coefficients, and natural from a computational perspective, as they only compare the sums of subsets. This is unlike the previous works mentioned, which require arbitrary coefficients due to the geometric nature of their techniques.
1.1.4 Other applications
Our framework (see Corollary 1.9) is pretty generic, and as such gives near optimal linear decision trees for a host of problems considered in the literature. For example, the following problems were considered in [GP14]. We discuss each one briefly, and refer the interested reader to [GP14] for a deeper discussion.
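$k$-LDT.
Given a fixed linear equation $a_0 + \sum_{i=1}^{k} a_i y_i = 0$ and a set $X = \{x_1, \dots, x_n\} \subset \mathbb{R}$ of size $n$, the goal is to decide if there exist distinct $y_1, \dots, y_k \in X$ which satisfy it. This problem is a variant of the $k$-SUM problem, and can be embedded as a point-location problem in $\mathbb{R}^{1+kn}$ as follows. Let $z = (a_0, a_1 x_1, \dots, a_1 x_n, \dots, a_k x_1, \dots, a_k x_n) \in \mathbb{R}^{1+kn}$, and let $H$ consist of the vectors $h$ which have a "$1$" in their first coordinate, a single "$1$" in each of the $k$ blocks of size $n$, and $0$ elsewhere. Corollary 1.9 implies a comparison decision tree with queries which are $2k$-sparse and have $\{-1,0,1\}$ coefficients. For constant $k$ this gives $O(n\log^2 n)$, which improves upon the previous best bound of of [ES16].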
Zero triangles.
Let $G = (V, E)$ be a graph on $n$ vertices and $m$ edges, which is known in advance (it is not part of the input). The inputs are edge weights $x \in \mathbb{R}^E$. The goal is to decide if there is a triangle in $G$ whose edge weights sum to zero. This problem clearly embeds as a point-location problem in $\mathbb{R}^m$. Corollary 1.9 gives a comparison decision tree which solves this problem with $O(m\log^2 n)$ queries. All the queries are $6$-sparse and have $\{-1,0,1\}$ coefficients. This improves upon the previous bound of of [GP14].
1.2 General framework
Our results are based on the notion of “inference dimension”, which was recently introduced by the authors [KLMZ17] in the context of active learning.
Definition 1.4** (Inference).**
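Let $S \subset \mathbb{R}^n$ and $x \in \mathbb{R}^n$. We say that $S$ infers $h \in \mathbb{R}^n$ at $x$ if "$\operatorname{sign}(\langle h, x\rangle)$" is determined by the answers to the label and comparison queries on $S$. That is, if we set
$$S(x) = \{y \in \mathbb{R}^n : \operatorname{sign}(\langle s, y\rangle) = \operatorname{sign}(\langle s, x\rangle) \text{ and } \operatorname{sign}(\langle s - s', y\rangle) = \operatorname{sign}(\langle s - s', x\rangle) \text{ for all } s, s' \in S\},$$
then $\operatorname{sign}(\langle h, y\rangle) = \operatorname{sign}(\langle h, x\rangle)$ for all $y \in S(x)$. We further define the inference set of $S$ at $x$ to be
$$\operatorname{infer}(S, x) = \{h \in \mathbb{R}^n : S \text{ infers } h \text{ at } x\}.$$
For each $h \in \operatorname{infer}(S, x)$, we refer to $\operatorname{sign}(\langle h, x\rangle)$ as the inferred value of $h$ at $x$.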
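An equivalent geometric condition to "$S$ infers $h$ at $x$" is that the hyperplane defined by $h$ is either disjoint from, or contains, the set of points $y$ that answer all label and comparison queries on $S$ exactly as $x$ does.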
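For example, if $h_1, \dots, h_k \in S$ are such that $\operatorname{sign}(\langle h_i, x\rangle) = 0$ for all $i$, and $h$ is in the linear space spanned by $h_1, \dots, h_k$, then the label queries on $h_1, \dots, h_k$ force $\operatorname{sign}(\langle h, x\rangle) = 0$, and so $S$ infers $h$ at $x$. Similarly, if $\operatorname{sign}(\langle h_i, x\rangle) = +$ for all $i$, and $h$ is in the cone spanned by $h_1, \dots, h_k$ (i.e. $h = \sum_i \alpha_i h_i$ with $\alpha_i \geq 0$, not all zero), then the label queries force $\operatorname{sign}(\langle h, x\rangle) = +$, and so $S$ infers $h$ at $x$.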
Definition 1.5** (Inference dimension).**
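Let $H \subset \mathbb{R}^n$. The inference dimension of $H$ is the minimal $d$ for which the following holds. For any subset $S \subset H$ of size $d$, and for any $x \in \mathbb{R}^n$, there exists $h \in S$ such that $S \setminus \{h\}$ infers $h$ at $x$.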
We refer the reader to [KLMZ17] for some simple examples and further discussion regarding the inference dimension.
The first step in the proof of Theorem 1.1, Theorem 1.2 and Theorem 1.3 is to show that the sets $H$ in the corresponding point-location problems have low inference dimension. The following general theorem provides a uniform treatment for this.
For $h \in \mathbb{R}^n$ define its $\ell_1$ norm as $\|h\|_1 = \sum_{i=1}^{n} |h_i|$.
Theorem 1.6**.**
The inference dimension of is .
Next, we show that sets of low inference dimension have efficient comparison decision trees. As a first step, we show this for zero-error randomized comparison decision trees. A zero-error randomized comparison decision tree is a distribution over (deterministic) comparison decision trees $T$, each of which computes $\mathcal{A}_H$ correctly on all inputs. The expected query complexity is the maximum over $x \in \mathbb{R}^n$ of the expected number of queries performed by $T$ to compute $\mathcal{A}_H(x)$.
Theorem 1.7**.**
Let $H \subset \mathbb{R}^n$ be a finite set with inference dimension $d$. Then there exists a zero-error randomized comparison decision tree which computes $\mathcal{A}_H$, whose expected query complexity is $O\big((d + n\log d)\log|H|\big)$.
A slightly weaker version of Theorem 1.7 appears in [KLMZ17] (see Theorem 4.1 there). The next step is to de-randomize Theorem 1.7 and obtain a deterministic comparison decision tree.
Theorem 1.8**.**
Let $H \subset \mathbb{R}^n$ be a finite set with inference dimension $d$. Then there exists a comparison decision tree which computes $\mathcal{A}_H$, whose query complexity is .
The proof of Theorem 1.8 uses a double-sampling argument, a technique that originated in the study of uniform convergence bounds in statistical learning theory [VC71]. The following corollary summarizes the above theorems concisely. For define .
Corollary 1.9**.**
Let $H \subset \mathbb{Z}^n$ be such that $\|h\|_1 \leq w$ for all $h \in H$. Then there exists a comparison decision tree computing $\mathcal{A}_H$ whose query complexity is $O\big(n\log(nw)\log|H|\big)$.
Proof.
Observe that . By Theorem 1.6, the inference dimension of is . The corollary now follows from Theorem 1.8. ∎
One can now verify that Theorem 1.1, Theorem 1.2 and Theorem 1.3 follow from Corollary 1.9 by setting $w = k$, $w = 4$ and $w = n$, respectively.
Paper organization.
We begin with some preliminaries in Section 2. We prove Theorem 1.6 in Section 3. We prove Theorem 1.7 in Section 4. We prove Theorem 1.8 in Section 5. We discuss further research and open problems in Section 6.
An acknowledgement.
We thank the Simons Institute at Berkeley, where this work was performed, for their hospitality.
2 Preliminaries
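Let $H \subset \mathbb{R}^n$ be a finite set. For every $x \in \mathbb{R}^n$, $\mathcal{A}_H(x) \in \{-,0,+\}^H$ denotes the function
$$\mathcal{A}_H(x)(h) = \operatorname{sign}(\langle h, x\rangle) \qquad \text{for } h \in H,$$
where $\operatorname{sign}$ is the sign function and $\langle \cdot,\cdot\rangle$ is the standard inner product in $\mathbb{R}^n$. The following lemma is a variant of standard bounds on the number of cells in a hyperplane arrangement.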
Lemma 2.1**.**
Let $H \subset \mathbb{R}^n$ be a set of size $m$. Then $\big|\{\mathcal{A}_H(x) : x \in \mathbb{R}^n\}\big| \leq (2em)^n$.
Proof.
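It is well known that a set of $m$ hyperplanes partitions $\mathbb{R}^n$ into at most $\sum_{i=0}^{n}\binom{m}{i}$ open cells. The lemma follows by first choosing a maximal set of $k \leq n$ linearly independent hyperplanes among those to which $x$ belongs (at most $\binom{m}{k}$ choices), and then applying the above bound to the remaining ones (restricted to a subspace of dimension $n - k$). Thus
$$\big|\{\mathcal{A}_H(x) : x \in \mathbb{R}^n\}\big| \leq \sum_{k=0}^{n}\binom{m}{k}\sum_{i=0}^{n-k}\binom{m}{i} = \sum_{j=0}^{n}\sum_{k=0}^{j}\binom{m}{k}\binom{m}{j-k} = \sum_{j=0}^{n}\binom{2m}{j} \leq (2em)^n,$$
where the second equality follows from Vandermonde's identity $\sum_{k=0}^{j}\binom{m}{k}\binom{m}{j-k} = \binom{2m}{j}$, and the last inequality follows from the well-known upper bound $\sum_{j=0}^{n}\binom{N}{j} \leq (eN/n)^n \leq (eN)^n$ for $N \geq n$ (if $2m < n$, the sum equals $2^{2m} \leq (2em)^n$ directly). ∎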
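The counting chain behind Lemma 2.1, namely $\sum_{k=0}^{n}\binom{m}{k}\sum_{i=0}^{n-k}\binom{m}{i} = \sum_{j=0}^{n}\binom{2m}{j} \leq (2em)^n$, can be checked numerically for small $m$ and $n$ with the Python sketch below. It only verifies the arithmetic identity and the bound, not the geometric cell count itself; the function name is ours.

```python
from math import comb, e


def lemma_2_1_chain(m: int, n: int):
    """Return (lhs, vandermonde_sum, bound) for the counting chain of Lemma 2.1."""
    # Choose k <= n independent hyperplanes containing x, then count the open
    # cells of the remaining hyperplanes inside an (n - k)-dimensional subspace.
    lhs = sum(comb(m, k) * sum(comb(m, i) for i in range(n - k + 1))
              for k in range(n + 1))
    # Vandermonde: sum over k + i = j of C(m, k) C(m, i) equals C(2m, j).
    vandermonde_sum = sum(comb(2 * m, j) for j in range(n + 1))
    bound = (2 * e * m) ** n
    return lhs, vandermonde_sum, bound


for m in range(1, 9):
    for n in range(1, 6):
        lhs, middle, bound = lemma_2_1_chain(m, n)
        assert lhs == middle <= bound, (m, n)
```

For $m = 3$ and $n = 2$ both sums equal $22$, well below $(6e)^2 \approx 266$.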
3 Bounding the inference dimension
We prove Theorem 1.6 in this section.
Theorem 1.6 (restated). The inference dimension of is .
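Let $H \subset \mathbb{Z}^n$ be such that $\|h\|_1 \leq w$ for all $h \in H$. Let $S \subset H$ with $|S| = d$, where $d$ is large enough, to be determined later. Fix $x \in \mathbb{R}^n$. We will show that there exists $h \in S$ such that $S \setminus \{h\}$ infers $h$ at $x$.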
Partition $S$ into $\{S_b : b \in \{-,0,+\}\}$, where
$$S_b = \{h \in S : \operatorname{sign}(\langle h, x\rangle) = b\}.$$
We will show that if $d$ is sufficiently large then $S_b \setminus \{h\}$ infers $h$ at $x$ for some $b \in \{-,0,+\}$ and $h \in S_b$. The simplest case is when $S_0$ is large:
Claim 3.1**.**
If $|S_0| > n$ then there exists $h \in S_0$ such that $S_0 \setminus \{h\}$ infers $h$ at $x$. In particular, $S \setminus \{h\}$ infers $h$ at $x$.
Proof.
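Let $h, h_1, \dots, h_n \in S_0$ be distinct elements such that $h$ belongs to the linear span of $h_1, \dots, h_n$; such elements exist since any $n + 1$ vectors in $\mathbb{R}^n$ are linearly dependent. We claim that $h_1, \dots, h_n$ infer $h$ at $x$. More specifically, we claim that having
- (i)
$\langle h_i, y\rangle = 0$ for $i = 1, \dots, n$, and
- (ii)
$h \in \operatorname{span}\{h_1, \dots, h_n\}$
imply that $\langle h, y\rangle = 0$. Indeed, by (ii) there exist coefficients $\alpha_i$ such that $h = \sum_{i=1}^{n}\alpha_i h_i$, and therefore, using (i), it follows that $\langle h, y\rangle = \sum_{i=1}^{n}\alpha_i\langle h_i, y\rangle = 0$. ∎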
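The inference step of Claim 3.1 is purely linear-algebraic, so it can be tested exactly. The Python sketch below checks whether $h$ lies in the span of vectors already known to satisfy $\langle h_i, x\rangle = 0$, using exact rational Gaussian elimination. It also finds such an $h$ among more than $n$ vectors of $S_0$. The function names are ours and illustrative.

```python
from fractions import Fraction
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[int, ...]


def rank(vectors: Sequence[Sequence[int]]) -> int:
    """Rank of integer/rational vectors by exact Gaussian elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def zero_inferred_by_span(zero_set: Sequence[Vec], h: Vec) -> bool:
    """True if h is in span(zero_set): <g, y> = 0 for all g then forces <h, y> = 0."""
    if not zero_set:
        return all(c == 0 for c in h)
    return rank(list(zero_set) + [h]) == rank(zero_set)


def find_inferred_zero(S0: List[Vec]) -> Optional[Vec]:
    """Return some h in S0 lying in the span of the other elements of S0.

    When |S0| > n such an h always exists, as in Claim 3.1."""
    for idx, h in enumerate(S0):
        if zero_inferred_by_span(S0[:idx] + S0[idx + 1:], h):
            return h
    return None
```

Exact `Fraction` arithmetic is used so that the rank test is not affected by floating-point round-off.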
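Thus, we assume from now on that $|S_0| \leq n$. We assume without loss of generality that $|S_+| \geq |S_-|$, and show that there is some $h \in S_+$ such that $S_+ \setminus \{h\}$ infers $h$ at $x$. The other case is analogous. Set $m = |S_+| \geq (d - n)/2$ and let $S_+ = \{h_1, \dots, h_m\}$ be sorted by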
[TABLE]
The idea is to show that some satisfies that is in the cone spanned by the where . Then, a simple argument shows that infers at . The existence of such an is derived by a counting argument that boils down to the following lemma.
Claim 3.2**.**
Assume that $2^m > \big(2e(mw + n)/n\big)^n$. Then there exist $\beta_1, \dots, \beta_m \in \{-1,0,1\}$, not all zero, such that
$$\sum_{i=1}^{m}\beta_i h_i = 0.$$
In particular, this holds for with a large enough constant.
Proof.
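For any $\beta \in \{0,1\}^m$ define $h_\beta = \sum_{i=1}^{m}\beta_i h_i$. Note that $h_\beta \in \mathbb{Z}^n$, and since $\|h_i\|_1 \leq w$ for all $i$, it follows that $\|h_\beta\|_1 \leq mw$ by the triangle inequality. Let $T = \{h_\beta : \beta \in \{0,1\}^m\}$. Next, we bound $|T|$. We claim that
$$|T| \leq 2^n\binom{mw + n}{n}.$$
To see that, note that each $v \in T$ has $2$ possible signs for each coordinate (zero coordinates may be counted with either sign). The number of patterns for the absolute values $(|v_1|, \dots, |v_n|)$ is at most the number of ways to express $mw$ as the sum of $n + 1$ nonnegative integers (the last one absorbing the slack $mw - \|v\|_1$). Equivalently, it is the number of ways of placing $mw$ balls in $n + 1$ bins, which is $\binom{mw + n}{n}$. We further simplify
$$|T| \leq 2^n\binom{mw + n}{n} \leq 2^n\left(\frac{e(mw + n)}{n}\right)^n = \left(\frac{2e(mw + n)}{n}\right)^n.$$
By our assumption $2^m > |T|$. Thus by the pigeonhole principle there exist distinct $\beta', \beta'' \in \{0,1\}^m$ for which $h_{\beta'} = h_{\beta''}$. The claim follows for $\beta = \beta' - \beta''$. ∎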
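The pigeonhole argument of Claim 3.2 is constructive for small $m$: enumerate $\beta \in \{0,1\}^m$, record the subset sums $h_\beta$, and stop at the first collision. The Python sketch below does exactly this. It is exponential in $m$, so it is only an illustration, and `signed_relation` is a name we introduce.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple


def signed_relation(vectors: Sequence[Tuple[int, ...]]) -> Optional[Tuple[int, ...]]:
    """Find beta in {-1,0,1}^m, not all zero, with sum_i beta_i h_i = 0.

    Mirrors the pigeonhole proof of Claim 3.2: two distinct beta', beta'' in
    {0,1}^m with equal subset sums h_beta' = h_beta'' give beta = beta' - beta''.
    """
    if not vectors:
        return None
    dim = len(vectors[0])
    seen: Dict[Tuple[int, ...], Tuple[int, ...]] = {}
    for beta in product((0, 1), repeat=len(vectors)):
        h_beta = tuple(sum(b * v[c] for b, v in zip(beta, vectors)) for c in range(dim))
        if h_beta in seen:
            return tuple(b1 - b2 for b1, b2 in zip(beta, seen[h_beta]))
        seen[h_beta] = beta
    return None  # no collision: all 2^m subset sums are distinct
```

For $h_1 = (1,0)$, $h_2 = (0,1)$, $h_3 = (1,1)$ it returns $\beta = (1, 1, -1)$, i.e. $h_1 + h_2 - h_3 = 0$.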
We assume that with a large enough constant, so that the conditions of Claim 3.2 hold. Let , not all zero, be such that . Let be maximal such that . We may assume that , as otherwise we can negate all of .
Adding to , we obtain that
[TABLE]
where the first equality holds as if , and the second equality holds as .
We claim that infers at , which completes the proof. More specifically, we claim that having
- (i)
,
- (ii)
, where the coefficients for all ,
imply that . Indeed, item (i) implies that , for every , and item (ii) implies that is in the cone spanned by for . Thus, also , which implies, by the left-most inequality of item (ii), that , as required.
4 Zero-error randomized comparison decision tree
We prove Theorem 1.7 in this section.
Theorem 1.7 (restated). Let $H \subset \mathbb{R}^n$ be a finite set with inference dimension $d$. Then there exists a zero-error randomized comparison decision tree which computes $\mathcal{A}_H$, whose expected query complexity is $O\big((d + n\log d)\log|H|\big)$.
We begin with the following claim. Recall that $\operatorname{infer}(S, x)$ is the set of vectors $h$ which can be inferred from $S$ at $x$.
Claim 4.1**.**
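Let $H \subset \mathbb{R}^n$ with inference dimension $d$, and let $S \subset H$ with $|S| = k \geq d$. Then for every $x \in \mathbb{R}^n$, there exist distinct $h_1, \dots, h_{k-d+1} \in S$ such that
$$h_i \in \operatorname{infer}(S \setminus \{h_i\}, x) \qquad \text{for all } i = 1, \dots, k - d + 1.$$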
Proof.
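We apply the definition of inference dimension iteratively. Fix $x \in \mathbb{R}^n$. Assume that we constructed $h_1, \dots, h_{i-1}$ so far, for some $i \leq k - d + 1$. Let $S_i = S \setminus \{h_1, \dots, h_{i-1}\}$. As $|S_i| = k - i + 1 \geq d$, applying the definition to a $d$-element subset $S_i' \subseteq S_i$ gives some $h_i \in S_i'$ such that $S_i' \setminus \{h_i\}$ infers $h_i$ at $x$. Since adding elements to the inferring set only adds queries, also $S_i \setminus \{h_i\}$ infers $h_i$ at $x$; that is, $h_i \in \operatorname{infer}(S_i \setminus \{h_i\}, x)$. But as $S_i \setminus \{h_i\} \subseteq S \setminus \{h_i\}$, then also $h_i \in \operatorname{infer}(S \setminus \{h_i\}, x)$. ∎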
Lemma 4.2**.**
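Let $H \subset \mathbb{R}^n$ be a finite set with inference dimension $d$. Let $S \subset H$ be a uniformly chosen subset of size $k$, where $d \leq k < |H|$. Then for every $x \in \mathbb{R}^n$,
$$\mathbb{E}_S\big[|H \setminus \operatorname{infer}(S, x)|\big] \leq \frac{d - 1}{k + 1}\,|H|.$$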
Proof.
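Fix $x \in \mathbb{R}^n$. We have
$$\mathbb{E}_S\big[|H \setminus \operatorname{infer}(S, x)|\big] = \sum_{h \in H}\Pr_S[h \notin \operatorname{infer}(S, x)] \leq \sum_{h \in H}\Pr_S[h \notin \operatorname{infer}(S, x) \mid h \notin S] = |H| \cdot \Pr[h_{k+1} \notin \operatorname{infer}(\{h_1, \dots, h_k\}, x)],$$
where $h_1, \dots, h_{k+1} \in H$ are uniformly chosen distinct elements. The inequality "$\leq$" follows as $h \in \operatorname{infer}(S, x)$ for any $h \in S$.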
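Let $T = \{h_1, \dots, h_{k+1}\}$. By symmetry it holds that
$$\Pr[h_{k+1} \notin \operatorname{infer}(T \setminus \{h_{k+1}\}, x)] = \frac{1}{k+1}\,\mathbb{E}_T\big[|\{h \in T : h \notin \operatorname{infer}(T \setminus \{h\}, x)\}|\big].$$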
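By Claim 4.1, applied to a set $T$ of size $k + 1 \geq d$, for any such $T$ it holds that $|\{h \in T : h \in \operatorname{infer}(T \setminus \{h\}, x)\}| \geq k + 2 - d$. Thus,
$$\mathbb{E}_S\big[|H \setminus \operatorname{infer}(S, x)|\big] \leq |H| \cdot \frac{d - 1}{k + 1}.$$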
∎
We are now in position to describe the algorithm which establishes Theorem 1.7.
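**Zero-error randomized comparison decision tree for $\mathcal{A}_H$**
Input: $x \in \mathbb{R}^n$ (accessible only through label and comparison queries)
Output: $\mathcal{A}_H(x)$
(1) Initialize: $t = 0$, $H_0 = H$, and $v(h) = \bot$ for all $h \in H$.
(2) Repeat while :
(2.1) Sample $S_t \subset H_t$ uniformly of size .
(2.2) Query $\operatorname{sign}(\langle h, x\rangle)$ for $h \in S_t$ and sort the values $\langle h, x\rangle$, $h \in S_t$, using comparison queries.
(2.3) Compute $\operatorname{infer}(S_t, x)$.
(2.4) For all $h \in H_t \cap \operatorname{infer}(S_t, x)$, set $v(h)$ to be the inferred value of $h$ at $x$.
(2.5) Set $H_{t+1} = H_t \setminus \operatorname{infer}(S_t, x)$.
(2.6) Set $t = t + 1$.
(3) Query $\operatorname{sign}(\langle h, x\rangle)$ for all $h \in H_t$, and set $v(h)$ accordingly.
(4) Return $v$ as the value of $\mathcal{A}_H(x)$.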
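The control flow of this algorithm can be sketched in Python under stated assumptions; this is not the paper's implementation. The sample size and stopping threshold $2d$ are our own choice, and `toy_infer` is a hypothetical, deliberately weak stand-in for the inference set. It only returns signs that are certainly implied: the queried elements themselves, and elements in the span of the zero-labelled ones. Because it never reports a wrong sign, the output is always correct (zero error). It does not, however, achieve the query bound of Theorem 1.7, which relies on the full inference set; computing that set in general would need, e.g., a linear-programming feasibility check.

```python
import random
from fractions import Fraction
from functools import cmp_to_key

SIGN_VALUE = {'-': -1, '0': 0, '+': 1}


def sign(v):
    """Sign function with values in {'-', '0', '+'}."""
    return '+' if v > 0 else '-' if v < 0 else '0'


class QueryOracle:
    """Hides x; answers label and comparison queries and counts them."""

    def __init__(self, x):
        self._x = list(x)
        self.queries = 0

    def _ip(self, h):
        return sum(a * b for a, b in zip(h, self._x))

    def label(self, h):          # label query sign(<h, x>)
        self.queries += 1
        return sign(self._ip(h))

    def compare(self, g, h):     # comparison query sign(<g - h, x>)
        self.queries += 1
        return sign(self._ip(g) - self._ip(h))


def rank(vectors):
    """Exact rank of rational vectors via Gaussian elimination."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def toy_infer(labels, order, remaining):
    """Hypothetical, weak but sound stand-in for infer(S_t, x).

    `order` (the sorted sample) is unused here; a full implementation would
    also exploit it, e.g. for cone-type inferences."""
    inferred = dict(labels)                      # every queried h is known
    zeros = [h for h, s in labels.items() if s == '0']
    if zeros:
        base = rank(zeros)
        for h in remaining:
            if h not in inferred and rank(zeros + [h]) == base:
                inferred[h] = '0'                # h in span of zeros => <h, x> = 0
    return inferred


def zero_error_point_location(H, oracle, d, infer=toy_infer, rng=random):
    """Compute A_H(x) through `oracle`; 2d is an assumed sample size/threshold."""
    remaining, result = list(H), {}
    while len(remaining) > 2 * d:                                # step (2)
        sample = rng.sample(remaining, 2 * d)                    # (2.1)
        labels = {h: oracle.label(h) for h in sample}            # (2.2) labels
        order = sorted(sample, key=cmp_to_key(                   # (2.2) sort
            lambda g, h: SIGN_VALUE[oracle.compare(g, h)]))
        inferred = infer(labels, order, remaining)               # (2.3)
        result.update(inferred)                                  # (2.4)
        remaining = [h for h in remaining if h not in inferred]  # (2.5)
    for h in remaining:                                          # step (3)
        result[h] = oracle.label(h)
    return result
```

Passing the oracle rather than $x$ keeps the sketch honest about access: every piece of information about $x$ goes through a counted label or comparison query.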
Analysis.
In order to establish Theorem 1.7, we first show that for every , the algorithm terminates after iterations in expectation. This follows as , which we show by induction on . It clearly holds for . For by Lemma 4.2, if we condition on then
[TABLE]
and hence
[TABLE]
Thus, it remains to bound the number of queries in every round. Observe that the only queries to are in steps (2.2) and (3). In step (3) the algorithm makes at most label queries. In step (2.2), we need to compute for all , which requires label queries; and to compute for all . This can be done in comparison queries by sorting the elements giving some bound on the expected total number of queries.
This bound can be improved using Fredman’s sorting algorithm [Fre76].
Theorem 4.3** ([Fre76]).**
Let $\Pi$ be a family of orderings over a set of $m$ elements. Then there exists a comparison decision tree that sorts every $\pi \in \Pi$ using at most
$$\log_2|\Pi| + 2m$$
comparisons.
To use Fredman’s algorithm, observe that the ordering "$\leq_t$" on the set $S_t$ sampled in the $t$’th round is defined by the inner product with $x$,
$$h \leq_t h' \iff \langle h, x\rangle \leq \langle h', x\rangle \qquad (h, h' \in S_t).$$
The following claim bounds the number of such orderings.
Claim 4.4**.**
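Let $S \subset \mathbb{R}^n$ with $|S| = m$. For $x \in \mathbb{R}^n$, let $\pi_x$ be the ordering on $S$ defined by the inner product with $x$. Then
$$\big|\{\pi_x : x \in \mathbb{R}^n\}\big| \leq (2em^2)^n.$$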
Proof.
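Observe that $\pi_x \neq \pi_y$ if and only if there are $h, h' \in S$ such that $\operatorname{sign}(\langle h - h', x\rangle) \neq \operatorname{sign}(\langle h - h', y\rangle)$. Thus, the number of different orderings is at most the size of $\{\mathcal{A}_{H'}(x) : x \in \mathbb{R}^n\}$, where $H' = \{h - h' : h, h' \in S\}$. Since $|H'| \leq m^2$, Lemma 2.1 implies an upper bound of $(2em^2)^n$, as claimed. ∎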
Thus, by using Fredman’s algorithm we can sort $S_t$ with just $O(|S_t| + n\log|S_t|)$ comparisons in each round. Since the sample size is $|S_t| = O(d)$, this gives a total of
$$O\big((d + n\log d)\log|H|\big)$$
queries in expectation.
5 Deterministic comparison decision tree
We prove Theorem 1.8 in this section, which is a de-randomization of Theorem 1.7.
Theorem 1.8 (restated). Let $H \subset \mathbb{R}^n$ be a finite set with inference dimension $d$. Then there exists a deterministic comparison decision tree which computes $\mathcal{A}_H$, whose query complexity is .
First, note the following straightforward Corollary of Lemma 4.2.
Corollary 5.1**.**
Let be a finite set with inference dimension . Let be uniformly chosen of size . Then
[TABLE]
Theorem 1.8 follows by establishing a universal set which is good for all $x \in \mathbb{R}^n$.
Lemma 5.2**.**
Let $H \subset \mathbb{R}^n$ be a finite set with inference dimension $d$. Then there exists of size such that:
[TABLE]
We first argue that Theorem 1.8 follows directly from the existence of such a set. The algorithm is a straightforward adaptation of the zero-error randomized comparison algorithm, except that now we use this set, which works for all $x$ in parallel.
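**Deterministic comparison decision tree for $\mathcal{A}_H$**
Input: $x \in \mathbb{R}^n$ (accessible only through label and comparison queries)
Output: $\mathcal{A}_H(x)$
(1) Initialize: $t = 0$, $H_0 = H$, and $v(h) = \bot$ for all $h \in H$. Let be as in Lemma 5.2.
(2) Repeat while :
(2.1) Pick $S_t$ of size such that
(2.2) Query $\operatorname{sign}(\langle h, x\rangle)$ for $h \in S_t$ and sort the values $\langle h, x\rangle$, $h \in S_t$, using comparison queries.
(2.3) Compute $\operatorname{infer}(S_t, x)$.
(2.4) For all $h \in H_t \cap \operatorname{infer}(S_t, x)$, set $v(h)$ to be the inferred value of $h$ at $x$.
(2.5) Set $H_{t+1} = H_t \setminus \operatorname{infer}(S_t, x)$.
(2.6) Set $t = t + 1$.
(3) Query $\operatorname{sign}(\langle h, x\rangle)$ for all $h \in H_t$, and set $v(h)$ accordingly.
(4) Return $v$ as the value of $\mathcal{A}_H(x)$.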
Analysis.
Lemma 5.2 ensures that a suitable set $S_t$ always exists. Thus, for any $x$, the algorithm terminates after rounds. Observe that the only queries to $x$ are in steps (2.2) and (3). In step (3) the algorithm makes at most label queries. In step (2.2), we need to compute $\operatorname{sign}(\langle h, x\rangle)$ for all $h \in S_t$, and $\operatorname{sign}(\langle h - h', x\rangle)$ for all $h, h' \in S_t$, which can be done by sorting the elements of $S_t$ according to $\langle h, x\rangle$. Using Fredman’s algorithm, this requires many comparisons in each round, which gives a total number of
[TABLE]
queries.
5.1 Proof of Lemma 5.2
Let be a uniform subset of size where . Define the event
[TABLE]
It suffices to prove that to prove the existence of . In fact, as we will see, by choosing sufficiently large constants in the choice of , the probability can be made (say), so a random set would also work.
In order to establish that we use a variant of the double sampling method [VC71] (see also [VC15]). Let be a uniformly chosen subset of size . Define the event
[TABLE]
We bound in two steps. We first show that (i) , and then that (ii) .
Claim 5.3**.**
.
Proof.
For each for which holds fix such that . Then
[TABLE]
The first condition holds with probability one, since and hence . For the second condition, as is a uniformly chosen subset of size , Corollary 5.1 gives
[TABLE]
Thus
[TABLE]
As this holds for every for which holds, we have , which implies the claim. ∎
We next bound the probability of . We will prove that for every fixed ,
[TABLE]
which will conclude the proof. So, fix of size . Let denote the set , and let . Recall that is defined by
[TABLE]
Observe that the set depends only on ; that is, if then . Let be a set that contains one representative from each equivalence class of the relation . Thus we can rephrase the event as
[TABLE]
The advantage of considering is that now we can bound the probability of using a union bound that depends on the (finite) set . More specifically, let
[TABLE]
We thus established the following claim.
Claim 5.4**.**
For every ,
[TABLE]
To conclude, it suffices to upper bound and the probability that for . Lemma 2.1 gives an upper bound on which also bounds ,
[TABLE]
We next bound the probability (over ) that for .
Claim 5.5**.**
Fix of size and fix . Assume that , and let be a uniformly sampled set of size such that . Then
[TABLE]
Proof.
Let . It suffices to bound the probability of the event that . Indeed, if then
[TABLE]
where in the last inequality we used the assumption that .
The set is a uniform subset of of size . By assumption, at most of the elements in are in . By the Chernoff bound, the probability that at least of the sampled elements belong to is thus exponentially small in . This finishes the proof as . ∎
We now conclude the proof.
[TABLE]
as we choose with a large enough hidden constant. Then we also have and
[TABLE]
6 Further research
We prove that many combinatorial point-location problems have near-optimal linear decision trees. Moreover, these are comparison decision trees, in which the linear queries are particularly simple: both sparse (in many cases) and with only small integer coefficients. This raises the possibility of improved algorithms for these problems in other models of computation. To be concrete, we focus on 3-SUM below, but the same questions can be asked for any other problem of a similar flavor.
Uniform computation.
The most obvious question is whether the existence of a near optimal linear decision tree implies anything about uniform computation. As showed in [GP14], this can lead to log-factor savings. It is very interesting whether greater savings can be achieved. We do not discuss this further here, as this question has been extensively discussed in the literature (see e.g. [VW15]).
Nonuniform computation.
Let $A \subset \mathbb{R}$ be a set of size $n$. It is very easy to "prove" that $A$ is a positive instance of 3-SUM, by demonstrating three elements whose sum is zero. However, it is much less obvious how to prove that $A$ is a negative instance of 3-SUM. This problem was explicitly studied in [CGI+16] in the context of nondeterministic ETH. They constructed such a proof which can be verified in time . It seems plausible that our current approach may lead to improved bounds. Thus, we propose the following problem.
Open problem 6.1**.**
Given a set of $n$ real numbers, no three of which sum to $0$, is there a proof of that fact which can be verified in near-linear time?
3-SUM with preprocessing.
Let $A \subset \mathbb{R}$ be a set of size $n$. The 3-SUM with preprocessing problem allows one to preprocess the set $A$ in quadratic time. Then, given any subset $A' \subseteq A$, the goal is to solve the 3-SUM problem on $A'$ in time significantly faster than $n^2$. Chan and Lewenstein [CL15] designed such an algorithm, which solves the 3-SUM problem on any subset $A'$ in time $O(n^{2-\varepsilon})$ for some small constant $\varepsilon > 0$. It is interesting whether our techniques can help improve this to near-linear time.
Open problem 6.2**.**
Given a set of $n$ real numbers, can they be preprocessed in quadratic time, such that later on, for every subset of the numbers, the 3-SUM problem can be solved in near-linear time?
General point-location problem.
It is natural to ask whether the techniques used in this paper, and in particular the inference dimension, can be used to improve the state-of-the-art upper bounds for general point-location problems. Unfortunately, unless the set of hyperplanes $H$ has some combinatorial structure, its inference dimension may be unbounded: in [KLMZ17] we construct examples of such sets whose inference dimension is unbounded. Nevertheless, we conjecture that generalizing comparison queries (which are linear combinations of two elements in $H$) to arbitrary linear combinations of two elements of $H$ might solve the problem.
Conjecture 6.3**.**
Let . There exists a linear decision tree which computes of depth . Moreover, all the linear queries are in .
Optimal bounds.
We suspect that our analysis can be sharpened to improve the log-factors that separate it from the information theoretical lower bounds. For concreteness, we pose the following conjecture.
Conjecture 6.4**.**
For any there exists a comparison decision tree which computes with many queries. In particular,
- •
-SUM on real numbers can be solved by a -sparse linear decision tree which makes queries.
- •
Sorting , where are sets of real numbers, can be solved by a -sparse linear decision tree which makes queries.
- •
SUBSET-SUM on real numbers can be solved by a linear decision tree which makes queries.
Note that Corollary 1.9 gives a bound of for this problem. So, the goal is to shave the factor.
References
- [AC05] Nir Ailon and Bernard Chazelle. Lower bounds for linear degeneracy testing. Journal of the ACM (JACM), 52(2):157–171, 2005.
- [CGI+16] Marco L. Carmosino, Jiawei Gao, Russell Impagliazzo, Ivan Mihajlin, Ramamohan Paturi, and Stefan Schneider. Nondeterministic extensions of the strong exponential time hypothesis and consequences for non-reducibility. In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, pages 261–270. ACM, 2016.
- [CIO15] Jean Cardinal, John Iacono, and Aurélien Ooms. Solving k-SUM using few linear queries. arXiv preprint arXiv:1512.06678, 2015.
- [CL15] Timothy M. Chan and Moshe Lewenstein. Clustered integer 3SUM via additive combinatorics. In Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, pages 31–40. ACM, 2015.
- [DL74] David Dobkin and Richard J. Lipton. On some generalizations of binary search. In Proceedings of the Sixth Annual ACM Symposium on Theory of Computing, pages 310–316. ACM, 1974.
- [Eri95] Jeff Erickson. Lower bounds for linear satisfiability problems. In SODA, pages 388–395, 1995.
- [ES16] Esther Ezra and Micha Sharir. The decision tree complexity for k-SUM is at most nearly quadratic. arXiv preprint arXiv:1607.04336, 2016.
- [Fre76] Michael L. Fredman. How good is the information theory bound in sorting? Theoretical Computer Science, 1(4):355–361, 1976.
