This paper introduces a new family of hypergraphs with high peelability at near-maximal densities, enabling more efficient data structures with reduced memory usage and maintaining fast performance.
Contribution
We construct a novel class of hypergraphs with linear geometry that achieve peelability thresholds beyond standard models, analyzed via an operator on functions and numerical methods.
Findings
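01 The new hypergraphs are peelable with high probability at edge densities close to 1 (f3≈0.918, f4≈0.977, f5≈0.992).
02 As a drop-in replacement in the retrieval data structure of Botelho et al., the 3-uniform construction reduces memory usage from 1.23m bits to 1.12m bits.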
Table 1: The erosion thresholds er_k and peelability thresholds f_k for k-ary fuse graphs satisfy b_k ≤ er_k ≤ f_k ≤ c_k^*. The values B_k play a role in Section 5.

| k | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|
| b_k | 0.9179352469 | 0.9767692112 | 0.9924345766 | 0.9973757381 | 0.9990561294 |
| c_k^* | 0.9179352767 | 0.9767701649 | 0.9924383913 | 0.9973795528 | 0.9990637588 |
| B_k | 0.9179353065 | 0.9767711186 | 0.9924422067 | 0.9973833675 | 0.9990713882 |
| f_k ≈ | 0.917935 | 0.97677 | 0.99243 | 0.99738 | 0.99906 |
Table 2: Overheads and average running times per key of various practical retrieval data structures.
\[
f^{r}(x) \xrightarrow{r\to\infty}
\begin{cases}
0 & \text{if } c<c_{k} \wedge x\in[0,1],\\
0 & \text{if } c>c_{k} \wedge x\in[0,\xi_1),\\
\xi_2 & \text{if } c>c_{k} \wedge x\in(\xi_1,1].
\end{cases}
\]
[Figure: sketches of the iteration of f (figure-f-iteration, pages 1 and 2)]
\[
(P^{r}q^{(0)})(i) \overset{\text{Cor. 2.5}}{=} q^{(r)}(i) \pm o(1)
\quad\text{and}\quad
P^{r}q^{(0)} \le \hat{P}^{r}q^{(0)} \le \hat{P}^{r}\mathrm{const}_1 = \mathrm{const}_{f^{r}(1)} \xrightarrow{r\to\infty} \mathrm{const}_0.
\]
Technische Universität Ilmenau, https://orcid.org/0000-0001-5484-3474
Technische Universität Ilmenau, https://orcid.org/0000-0002-6477-0106
Copyright: Martin Dietzfelbinger and Stefan Walzer
Subject classification: Theory of computation → Data structures design and analysis
Related version: This paper will be presented at the European Symposium on Algorithms 2019.
Dense Peelable Random Uniform Hypergraphs
Martin Dietzfelbinger
Stefan Walzer
Abstract
We describe a new family of k-uniform hypergraphs with independent random edges. The hypergraphs have a high probability of being peelable, i.e. to admit no sub-hypergraph of minimum degree 2, even when the edge density (number of edges over vertices) is close to 1.
In our construction, the vertex set is partitioned into linearly arranged segments and each edge is incident to random vertices of k consecutive segments. Quite surprisingly, the linear geometry allows our graphs to be peeled “from the outside in”. The density thresholds fk for peelability of our hypergraphs (f3≈0.918, f4≈0.977, f5≈0.992, …) are well beyond the corresponding thresholds (c3≈0.818, c4≈0.772, c5≈0.702, …) of standard k-uniform random hypergraphs.
To get a grip on fk, we analyse an idealised peeling process on the random weak limit of our hypergraph family. The process can be described in terms of an operator on [0,1]^Z and fk can be linked to thresholds relating to the operator. These thresholds are then tractable with numerical methods.
Random hypergraphs underlie the construction of various data structures based on hashing, for instance invertible Bloom filters, perfect hash functions, retrieval data structures, error correcting codes and cuckoo hash tables, where inputs are mapped to edges using hash functions.
These data structures frequently rely on peelability of the hypergraph, or peelability allows for simple linear-time algorithms. Memory efficiency is closely tied to edge density, while worst-case and average-case query times are tied to maximum and average edge size.
To demonstrate the usefulness of our construction, we used our 3-uniform hypergraphs as a drop-in replacement for the standard 3-uniform hypergraphs in a retrieval data structure by Botelho et al. [8]. This reduces memory usage from 1.23m bits to 1.12m bits (m being the input size) with almost no change in running time. Using k>3 attains, at small sacrifices in running time, further improvements to memory usage.
keywords:
Random Hypergraphs, Peeling Threshold, 2-Core, Hashing, Retrieval, Succinct Data Structure, Linear Time Algorithm.
1 Introduction
The core of a hypergraph H=(V,E) is the largest sub-hypergraph of H with minimum degree at least 2. The core can be obtained by peeling, which means repeatedly choosing a vertex of degree 0 or 1 and removing it (and the incident edge if present) from the hypergraph, until no such vertex exists. If the core of H is empty, then H is called peelable.
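The peeling procedure just described can be sketched in a few lines of Python. This is only an illustration under assumed conventions that are not taken from the paper: vertices are numbered 0,…,n−1, edges are tuples of vertices, and the helper names `core_edges` and `is_peelable` are hypothetical. Vertices of degree 0 are ignored because removing them does not affect any edge.

```python
from collections import defaultdict

def core_edges(num_vertices, edges):
    """Return the indices of the edges that survive peeling (the core)."""
    incident = defaultdict(set)          # vertex -> indices of alive incident edges
    for idx, edge in enumerate(edges):
        for v in set(edge):
            incident[v].add(idx)
    alive = [True] * len(edges)
    # Every vertex of degree exactly 1 is a candidate for peeling.
    stack = [v for v in range(num_vertices) if len(incident[v]) == 1]
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:        # degree changed since v was pushed
            continue
        (idx,) = incident[v]
        alive[idx] = False               # remove v together with its only edge
        for w in set(edges[idx]):
            incident[w].discard(idx)
            if len(incident[w]) == 1:
                stack.append(w)
    return [idx for idx, is_alive in enumerate(alive) if is_alive]

def is_peelable(num_vertices, edges):
    """A hypergraph is peelable if its core is empty."""
    return not core_edges(num_vertices, edges)
```

For instance, two 3-edges sharing a single vertex are peelable, whereas a repeated edge leaves every vertex with degree 2 and is therefore its own core.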
The significance of peelability.
Hypergraphs underlie many hashing based data structures and peelability is often necessary for proper operation or allows for simple linear time algorithms. We list a few examples.
• Invertible Bloom Lookup Tables. IBLTs [23] are based on Bloomier filters [10], which are in turn based on Bloom filters [4]. Each element is inserted in several random positions in a hash table. Each cell stores the xor of all elements that have been inserted into it. A List-Entries query on an IBLT can recover all elements of the table precisely if the underlying hypergraph is peelable. Among other things, IBLTs have been used to construct error correcting codes [35] and to solve the set reconciliation and straggler identification problems [17].
• Erasure Correcting Codes. To construct capacity achieving erasure codes, the authors of [29] consider a hypergraph where V corresponds to parity check bits and E to message bits that were lost during transmission. A message bit is incident to precisely those check bits to which it contributed. Correct decoding hinges on peelability of the hypergraph.
• Cuckoo Hashing and XORSAT. In the context of cuckoo hash tables [15, 32, 37] and solving random xorsat formulas [16, 20, 38], (partial) peelability of the underlying hypergraph makes placing all (some) keys or solving the linear system (eliminating some variables) particularly simple.
• Retrieval and Perfect Hashing. The retrieval problem (considered later in Section 7) occurs in the context of constructing perfect hash functions [3, 6, 7, 8, 31]. The known approaches involve finding a solution z: V→R for a system (∑_{v∈e} z(v) = f(e))_{e∈E} of equations, where H=(V,E) is a hypergraph, f: E→R a function and R a small set. If R is a field, then the incidence matrix of H needs to have full rank over R to guarantee the existence of a solution. If H is peelable, however, then the existence of a solution is guaranteed even if R only has a group structure. Moreover, it can be computed in linear time.
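To make the last item concrete, the following sketch solves such a system by peeling and back-substitution. It is an illustration under stated assumptions, not the implementation of any cited work: the group R is taken to be non-negative integers under xor, every edge is assumed to consist of distinct vertices numbered from 0, and the names `peel_order` and `solve_retrieval` are hypothetical.

```python
def peel_order(num_vertices, edges):
    """Return [(edge_index, free_vertex), ...] in peeling order, or None if a core remains."""
    incident = [set() for _ in range(num_vertices)]
    for idx, edge in enumerate(edges):
        for v in edge:
            incident[v].add(idx)
    stack = [v for v in range(num_vertices) if len(incident[v]) == 1]
    order = []
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:        # stale entry, degree has changed
            continue
        (idx,) = incident[v]
        order.append((idx, v))           # v is the vertex that "frees" edge idx
        for w in edges[idx]:
            incident[w].discard(idx)
            if len(incident[w]) == 1:
                stack.append(w)
    return order if len(order) == len(edges) else None

def solve_retrieval(num_vertices, edges, values):
    """Find z such that the xor of z[v] over v in edges[i] equals values[i]."""
    order = peel_order(num_vertices, edges)
    if order is None:
        return None                      # not peelable, this sketch gives up
    z = [0] * num_vertices
    # Process edges in reverse peeling order: when an edge is handled, its free
    # vertex occurs in no edge handled before it, so earlier equations stay intact.
    for idx, free in reversed(order):
        acc = values[idx]
        for v in edges[idx]:
            if v != free:
                acc ^= z[v]
        z[free] = acc
    return z
```

Each edge and each incidence is touched a constant number of times, which is where the linear running time comes from.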
In these contexts, the hypergraph typically has vertex set [n]={1,…,n} and for each element x of an input set S, an edge e_x⊂[n] is created with incidences chosen via hash functions. For theoretical considerations, the edges (e_x)_{x∈S} are often assumed to be independent random variables. This has proven to be a good model for practical settings, even though perfect independence is not achieved by most practical hash functions. An important choice left to the algorithm designer is the distribution of e_x.
Previous work.
If the distribution is such that O(n) edges have degree 2 or less (in particular if H is a graph with O(n) edges), then – due to the well-known “birthday paradox” – there is a constant probability that an edge is repeated. In that case, H is clearly not peelable. The simplest workable candidate for the distribution of e_x is therefore to pick a constant k≥3 and let e_x contain k vertices chosen independently and uniformly at random. We refer to these standard hypergraphs as k-uniform Erdős-Rényi hypergraphs H^k_{n,cn}, where c is the edge density, i.e. the number of edges over the number of vertices. Corresponding peelability thresholds c_k have been determined in [36], meaning that if c<c_k then H^k_{n,cn} is peelable with high probability (whp), i.e. with probability approaching 1 as n→∞, and if c>c_k then H^k_{n,cn} is not peelable whp. The largest threshold is c_3≈0.818. Since the edge density is often tightly linked to a performance metric (e.g. memory efficiency of a dictionary, rate of a code), a density closer to 1 would be desirable, but we know of only two alternative constructions.
To obtain erasure codes with high rates the authors of [29] construct for any D∈N hypergraphs with edge sizes in {5,…,D+4}, average edge size ≈lnD+3 and edge density 1−1/D that are peelable whp. In particular, this yields peelable hypergraphs with edge densities arbitrarily close to 1. A downside is that the high maximum edge size can lead to worst case query times of Θ(D) in certain contexts. Motivated by this, the author of [40] looked into non-uniform hypergraphs with constant maximum edge size. Focusing on hypergraphs with two admissible edge sizes, he found for example that mixing edges of size 3 and size 21 yields a family of hypergraphs with peelability threshold ≈0.92.
Our construction.
In this paper we introduce and analyse a new distribution on edges that yields k-uniform hypergraphs with high peelability thresholds that perform well in practical algorithms.
We call our hypergraphs fuse graphs (as in the cord attached to a firecracker). There is an underlying linear geometry and, similar to how fire proceeds linearly through a lit fuse, the peeling process proceeds linearly through our hypergraphs, in the sense that vertices on the inside of the line tend to only become peelable after vertices closer to the end of the line have already been removed.
Formally, for k≥3, ℓ∈N and c∈R+ we define the family (F(n,k,c,ℓ))_{n∈N} of k-uniform fuse graphs as follows. The vertex set is V={1,…,n(ℓ+k−1)}, where for i∈I:={0,…,ℓ+k−2} the vertices {in+1,…,(i+1)n} form the i-th segment. (Denoting the segment size by n instead of the number of vertices is more convenient. Note that |V|=Θ(n) still holds.) The edge set E has size cnℓ. Each edge e∈E is independently determined by one uniformly random variable j∈J:={0,…,ℓ−1} denoting the type of e and k independent random variables o_0,…,o_{k−1} uniformly distributed in [n], yielding e={(j+t)n+o_t ∣ t∈{0,…,k−1}}. In other words, e contains one uniformly random vertex from each of the segments j,j+1,…,j+k−1. There may be repeated edges, but the probability that this happens is O(1/n). The edge density cℓ/(ℓ+k−1) approaches c for ℓ≫k.
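The definition translates directly into a sampler. The sketch below is a minimal illustration with assumptions of our own: vertices are numbered from 0 rather than 1 (so segment i holds the vertices i·n,…,(i+1)·n−1), the number of edges c·n·ℓ is rounded to an integer, and the function name `sample_fuse_graph` is hypothetical. A practical data structure would derive j and the offsets from hash functions instead of a random number generator.

```python
import random

def sample_fuse_graph(n, k, c, ell, seed=None):
    """Sample a k-uniform fuse graph with segment size n and ell edge types."""
    rng = random.Random(seed)
    num_vertices = n * (ell + k - 1)     # ell + k - 1 segments of size n
    num_edges = round(c * n * ell)
    edges = []
    for _ in range(num_edges):
        j = rng.randrange(ell)           # type of the edge, uniform in J
        # One uniformly random vertex from each of the segments j, ..., j+k-1.
        edge = tuple((j + t) * n + rng.randrange(n) for t in range(k))
        edges.append(edge)
    return num_vertices, edges
```

With n=50, k=3, c=0.9 and ℓ=20 this yields 900 edges on 1100 vertices, i.e. an edge density of cℓ/(ℓ+k−1) ≈ 0.818, which illustrates why ℓ should be large compared to k.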
Results.
Let the peelability threshold for k-ary fuse graphs be defined as
[TABLE]
Our Main Theorem relates f_k to the orientability threshold c_k^* of k-ary Erdős-Rényi hypergraphs and the erosion threshold er_k defined in the technical part of our paper.
Theorem 1.1.
For any k≥3 we have er_k ≤ f_k ≤ c_k^*.
The orientability thresholds c_k^* are known exactly [11, 20, 21] and we determine lower bounds on the erosion thresholds er_k. As shown in Table 1, this makes it possible to narrow down f_k to an interval of size 10^−5 for all k∈{3,…,7}.
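The claimed interval size can be checked by simple arithmetic on the numbers listed for Table 1. The sketch below assumes that the first row of values holds the lower bounds b_k and the second row the orientability thresholds c_k^*; this reading is an assumption based on the inequality b_k ≤ er_k ≤ f_k ≤ c_k^* in the caption. The dictionary names are hypothetical.

```python
from decimal import Decimal

# Assumed reading of Table 1: lower bounds b_k and upper bounds c_k^*.
lower = {3: "0.9179352469", 4: "0.9767692112", 5: "0.9924345766",
         6: "0.9973757381", 7: "0.9990561294"}
upper = {3: "0.9179352767", 4: "0.9767701649", 5: "0.9924383913",
         6: "0.9973795528", 7: "0.9990637588"}

# Width of the interval [b_k, c_k^*] that is known to contain f_k.
widths = {k: Decimal(upper[k]) - Decimal(lower[k]) for k in lower}

for k in sorted(widths):
    print(k, widths[k])
```

Under this reading the widest interval occurs for k=7, with width 0.0000076294, which is below 10^−5.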
Outline.
The paper is organised as follows. In Section 2 we idealise the peeling process by switching to the random weak limit of our hypergraphs, and capture the essential behaviour of the process in terms of an operator P^ acting on functions q: Z→[0,1]. For this operator, we identify the properties of being eroding and consolidating as well as corresponding thresholds er_k and co_k in Section 3. We then prove the “er_k ≤ f_k” part of our theorem in Section 4 and give numerical approximations of er_k and co_k in Section 5. The comparatively simple “f_k ≤ c_k^*” part of our theorem is independent of these considerations and is proved in Section 6. Finally, in Section 7 we demonstrate how using our hypergraphs can improve the performance of practical retrieval data structures.
2 The Peeling Process and Idealised Peeling Operators
In this section we consider how the probabilities for vertices to “survive” r∈N rounds of peeling change from one round to the next. In the classical setting this could be described by a function mapping the old probability to the new one [36]. In our case, however, there are distinct probabilities for each segment of the graph. Thus we need a corresponding operator P^ that acts on sequences of probabilities. Conveniently, it will be independent of n and ℓ.
We almost always suppress n,k,c,ℓ in notation outside of definitions, assuming n to be large. Big-O notation refers to n→∞ while k,c,ℓ are constant.
Consider the parallel peeling process peel(F) on F=F(n,k,c,ℓ). In each round of peel(F), all vertices of degree 0 or 1 are determined and then deleted simultaneously. Deleting a vertex implicitly deletes incident edges. We also define the rooted peeling process peel_v(F) for any vertex v∈V, which behaves exactly like peel(F) except that the special vertex v may only be deleted if it has degree 0, not if it has degree 1. For any i∈I and r∈N0 we let q^(r)(i)=q^(r)(i,n,k,c,ℓ) be the probability that a vertex v of segment i survives r rounds of peel_v(F), i.e. is not deleted. Note that the probability is well-defined as vertices of the same segment are symmetric.
By definition, q^(0)(i)=1 for all i∈I. Whether a vertex v of segment i∈I survives r>0 rounds is a function of its r-neighbourhood N(n,v,r), i.e. the set of vertices and edges of F that can be reached from v by traversing at most r hyperedges.
It is standard to consider the random weak limit of F to get a grip on the distribution of N(n,v,r) and thus on q^(r)(i). Intuitively, we identify a (possibly infinite) random tree that captures the local characteristics of F for n→∞. See [1] for a good survey with examples and details on how to formally define the underlying topology and metric space.
In the limit, the binomially distributed vertex degrees (e.g. Bin(cnℓ, 1/(nℓ)) for vertices of segment 0) become Poisson distributed (Po(c) for segment 0). Short cycles are not only rare but non-existent and certain weakly correlated random variables become perfectly independent.
Definition 2.1 (Limiting Tree).
Let k,ℓ∈N, c∈R+ and i∈I. The random (possibly infinite) hypertree T_i=T_i(k,c,ℓ) is distributed as follows.
T_i has a root vertex root(T_i) of segment i which, for each j∈{i−k+1,…,i}∩J, has d_j∼Po(c) child edges of type j. (In the current context, the segment of a vertex is an abstract label. There can be an unbounded number of vertices of each segment.) Each child edge of type j is incident to k−1 (fresh) child vertices of its own, one for each segment i′∈{j,…,j+k−1}∖{i}. The sub-hypertree at such a child vertex of segment i′ is distributed recursively (and independently of its sibling-subtrees) according to T_i′.
Since all arguments are standard in contexts where local weak convergence plays a role, we state the following lemma without proof. For instance, a full argument to show a similar convergence is given in [26]. See also [25] for the related technique of Poissonisation.
Lemma 2.2.
Let r∈N be constant. Let further N(n,v,r) be the r-neighbourhood of a vertex v of segment i in F and T_i^(r) the r-neighbourhood of root(T_i), both viewed as undirected and unlabelled hypergraphs. Then N(n,v,r) converges in distribution to T_i^(r) as n→∞.
We now direct our attention to survival probabilities in the idealised peeling processes (peel_{root(T_i)}(T_i))_{i∈I}, which are easier to analyse than those of peel_v(F).
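Lemma 2.3.
Let r∈N0 be constant and q_T^(r)(i)=q_T^(r)(i,k,c,ℓ) be the probability that root(T_i) survives r rounds of peel_{root(T_i)}(T_i) for i∈I. Then
\[
q_T^{(r+1)}(i) = 1 - \exp\Big(-c \sum_{j\in\{i-k+1,\dots,i\}\cap J}\ \prod_{\substack{j\le i'<j+k\\ i'\ne i}} q_T^{(r)}(i')\Big).
\]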
Proof 2.4.
Let i∈I and v=root(T_i). Assume j∈{i−k+1,…,i}∩J is the type of some edge e incident to v. Edge e survives r rounds of peel_v(T_i) if and only if all of its incident vertices survive these r rounds. Since v itself may not be deleted by peel_v(T_i) as long as e exists, the relevant vertices are the k−1 child vertices, one for each segment i′∈{j,…,j+k−1}∖{i}. Call these w_1,…,w_{k−1} and denote the subtrees rooted at those vertices by W_1,…,W_{k−1}. Now consider the peeling processes peel_{w_1}(W_1),…,peel_{w_{k−1}}(W_{k−1}). Assume one of them, say peel_{w_s}(W_s), deletes w_s in round r′≤r, meaning w_s has degree 0 before round r′. It follows that w_s has degree at most 1 before round r′ in peel_v(T_i), meaning peel_v(T_i) deletes e in round r′ (or earlier). Conversely, if none of peel_{w_1}(W_1),…,peel_{w_{k−1}}(W_{k−1}) delete their root vertex within r rounds, then w_1,…,w_{k−1} have degree at least 2 after round r of peel_v(T_i) and e survives round r of peel_v(T_i). This makes the probability for e to survive r rounds of peel_v(T_i) equal to p_ij := ∏_{j≤i′<j+k, i′≠i} q_T^(r)(i′). Since the number m_ij of edges of type j incident to v has distribution m_ij∼Po(c), the number m′_ij of edges of type j incident to v surviving r rounds of peel_v(T_i) is a correspondingly thinned out variable, namely m′_ij∼Bin(m_ij,p_ij), which means m′_ij∼Po(c·p_ij).
The claim now follows by observing that v survives r+1 rounds of peel_v(T_i) if and only if at least one of its child edges survives r rounds of peel_v(T_i):
\[
q_T^{(r+1)}(i) = \Pr\big[\exists j: m'_{ij}>0\big] = 1-\prod_{j}\Pr\big[m'_{ij}=0\big] = 1-\prod_{j} e^{-c\,p_{ij}} = 1-\exp\Big(-c\sum_{j} p_{ij}\Big),
\]
where j ranges over {i−k+1,…,i}∩J.
Replacing p_ij with its definition completes the proof.
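For convenience we define, for k≥3, ℓ∈N and c∈R+, the operator P=P(k,c,ℓ), which maps any q: I→[0,1] to Pq: I→[0,1] with
\[
(Pq)(i) = 1 - \exp\Big(-c \sum_{j\in\{i-k+1,\dots,i\}\cap J}\ \prod_{\substack{j\le i'<j+k\\ i'\ne i}} q(i')\Big) \quad\text{for } i\in I.
\]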
Together Lemmas 2.2 and 2.3 imply that P can be used to approximate survival probabilities.
Corollary 2.5.
Let r∈N0 be constant. Then for all i∈I
\[
q^{(r)}(i) = (P^{r}q^{(0)})(i) \pm o(1).
\]
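To obtain upper bounds on survival probabilities, we may remove the awkward restriction “∩J” in the definition of P. We define P^=P^(k,c) as mapping q: Z→[0,1] to P^q: Z→[0,1] with
\[
(\hat{P}q)(i) = 1 - \exp\Big(-c \sum_{j=i-k+1}^{i}\ \prod_{\substack{j\le i'<j+k\\ i'\ne i}} q(i')\Big) \quad\text{for } i\in\mathbb{Z}.
\]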
Note that P^ does not depend on ℓ or n. To simplify notation, we assume that the old operator P also acts on functions q: Z→[0,1], ignoring q(i) for i∉I, and producing Pq: Z→[0,1] with Pq(i)=0 for i∉I. We also extend q^(0) to be 𝟙_I: Z→[0,1], i.e. the characteristic function on I, essentially introducing vertices of segments i∉I which are, however, already deleted with probability 1 before the first round begins.
Note that while q^(r)(i) and q_T^(r)(i) are by definition non-increasing in r, this is not the case for (P^^r q^(0))(i). For instance, P^^r q^(0) has support {0−r,…,ℓ+k−2+r}, which grows with r. (It is still possible to interpret P^^r q^(0)(i) as survival probabilities in more symmetric extended versions T^_i of the tree T_i, but we will not pursue this.)
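Both operators are easy to evaluate numerically. The sketch below assumes the form suggested by the proof of Lemma 2.3, namely that the new value at i is 1 − exp(−c · Σ_j Π_{i′} q(i′)), where j runs over the edge types covering segment i and i′ over the other k−1 segments of such an edge. Functions are represented as dictionaries with value 0 outside their keys; this representation and the name `apply_operator` are our own choices.

```python
import math

def apply_operator(q, k, c, ell=None):
    """Apply P_hat (ell is None) or P (ell given) to q = {index: value}.

    Indices missing from q are treated as having value 0.
    """
    if ell is None:
        # P_hat: every edge type is allowed; the support may grow by one per side.
        indices = range(min(q) - 1, max(q) + 2)
        type_allowed = lambda j: True
    else:
        # P: only types in J = {0, ..., ell-1}; output restricted to I.
        indices = range(0, ell + k - 1)
        type_allowed = lambda j: 0 <= j < ell
    out = {}
    for i in indices:
        total = 0.0
        for j in range(i - k + 1, i + 1):        # edge types covering segment i
            if not type_allowed(j):
                continue
            prod = 1.0
            for i2 in range(j, j + k):           # the other segments of the edge
                if i2 != i:
                    prod *= q.get(i2, 0.0)
            total += prod
        out[i] = 1.0 - math.exp(-c * total)
    return out
```

Starting from the characteristic function of I and applying the unrestricted variant r times gives positive values exactly on {−r,…,ℓ+k−2+r}, matching the growing support noted above.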
The following lemma lists a few easily verified properties of P^. All inequalities between functions should be interpreted point-wise.
Lemma 2.6.
• ∀q: Z→[0,1]: Pq ≤ P^q.
• P^ commutes with the shift operators ≪ and ≫ defined via (≪q)(i)=q(i+1) and (≫q)(i)=q(i−1). In other words, we have ∀q: Z→[0,1]: P^(≪q)=≪(P^q) ∧ P^(≫q)=≫(P^q).
• P^ is monotonic, i.e. ∀q,q′: Z→[0,1]: q≤q′ ⇒ P^q≤P^q′.
• P^ respects monotonicity, i.e. if q(i) is (strictly) increasing in i, then so is (P^q)(i).
3 Two Fixed Points Battling for Territory
In this section we define the erosion and consolidation thresholds at which the behaviour of P^ changes in crucial ways.
First, we require a few facts about the function f: [0,1]→[0,1] mapping x ↦ 1−e^{−c·k·x^{k−1}}. It appears in the analysis of cores in k-ary Erdős-Rényi hypergraphs H^k_{n,cn}, essentially mapping the probability ρ_r for a vertex to survive r rounds of peeling to the probability ρ_{r+1}=f(ρ_r) to survive r+1 rounds of peeling, see [36, page 5]. (Our setting corresponds to the choices (r_Molloy, k_Molloy, c_Molloy)=(k, 2, c⋅(k−1)!).)
The threshold c_k for the appearance of a core in H^k_{n,cn} turns out to be the threshold for the appearance of a non-zero fixed point of f. The following is implicit in the analysis.
• For c<c_k, f has only the fixed point f(0)=0, with f′(0)<1.
• For c>c_k, there are exactly three fixed points 0, ξ1=ξ1(k,c) and ξ2=ξ2(k,c) where f′(ξ1)>1 while f′(0),f′(ξ2)<1.
This implies the following behaviour of applying f repeatedly to a starting value x. This should be immediately clear from the sketches we give on the right.
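\[
f^{r}(x) \xrightarrow{r\to\infty}
\begin{cases}
0 & \text{if } c<c_{k} \wedge x\in[0,1],\\
0 & \text{if } c>c_{k} \wedge x\in[0,\xi_1),\\
\xi_2 & \text{if } c>c_{k} \wedge x\in(\xi_1,1].
\end{cases}
\tag{1}
\]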
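The three regimes can be observed by iterating f numerically. The sketch below uses k=3 with the illustrative densities c=0.7 (below c_3≈0.818) and c=0.9 (above c_3); the starting values 0.5 and 0.6 are chosen by us to lie on either side of the unstable fixed point ξ1, which for c=0.9 is roughly 0.53. The function names are hypothetical.

```python
import math

def f(x, k, c):
    """One step of the survival-probability recursion x -> 1 - exp(-c*k*x^(k-1))."""
    return 1.0 - math.exp(-c * k * x ** (k - 1))

def iterate_f(x, k, c, rounds=200):
    """Apply f repeatedly to the starting value x."""
    for _ in range(rounds):
        x = f(x, k, c)
    return x

below = iterate_f(1.0, k=3, c=0.7)    # c < c_3: collapses to 0 from any start
low = iterate_f(0.5, k=3, c=0.9)      # c > c_3, start below xi_1: collapses to 0
high = iterate_f(0.6, k=3, c=0.9)     # c > c_3, start above xi_1: tends to xi_2
print(below, low, high)
```

For c=0.9 the upper fixed point ξ2 is approximately 0.87, so the third value settles near that number while the first two are numerically zero.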
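Note that f captures the behaviour of P^ on constant functions const_x(i):=x, in the sense that P^ const_x = const_{f(x)}. For c<c_k we therefore have for all i∈I
\[
(P^{r}q^{(0)})(i) \overset{\text{Cor. 2.5}}{=} q^{(r)}(i) \pm o(1)
\quad\text{and}\quad
P^{r}q^{(0)} \le \hat{P}^{r}q^{(0)} \le \hat{P}^{r}\mathrm{const}_1 = \mathrm{const}_{f^{r}(1)} \xrightarrow{r\to\infty} \mathrm{const}_0.
\]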
In conjunction with a later lemma, this is sufficient to show that F is peelable whp in this case. A similar argument for c=c_k is possible as well. Our focus from now on is therefore on the interesting case c>c_k where the three distinct fixed points 0, ξ1, ξ2 of f exist.
We give an intuitive account of the phenomenon underlying the following steps before continuing formally. Due to (1) we have
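\[
\hat{P}^{r}\mathrm{const}_x = \mathrm{const}_{f^{r}(x)} \xrightarrow{r\to\infty}
\begin{cases}
\mathrm{const}_0 & \text{if } x\in[0,\xi_1),\\
\mathrm{const}_{\xi_2} & \text{if } x\in(\xi_1,1].
\end{cases}
\]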
Now consider what happens if we iterate P^ on a function that is “torn” between these two cases. Concretely, let us consider the function step_0^1 where we define step_x^y: Z→[0,1] to have value y on N0 and value x on negative inputs.
Should we expect P^^r step_0^1 to converge to const_0 or const_ξ2 as r increases? It turns out both are possible, depending on c.
Speaking more generally, let q: Z→[0,1] be any function. If N(i):={i−k+1,…,i+k−1}∖{i} then P^q(i) depends (monotonically) on (q(i′))_{i′∈N(i)}. It is clear that if q(i′)<ξ1 for all i′∈N(i), then P^q(i)<ξ1 as well. Similarly, if q(i′)>ξ1 for all i′∈N(i) then P^q(i)>ξ1. If, however, there are indices i′_1,i′_2∈N(i) with q(i′_1)<ξ1<q(i′_2) then P^q(i) could be above or below ξ1; in this case we call the index i contested.
The contested area of step_0^1 is [−k+1,k−2]. Iterating P^ we obtain P^^r step_0^1 for r∈N0. For all r∈N0 the contested area is an interval of size 2k−2 with all values to the left of it (towards −∞) less than ξ1 and all values to the right of it (towards ∞) bigger than ξ1. However, the contested area may shift. If the domain of values bigger than ξ1 is shrinking (“eroding”), then we see convergence to const_0. If conversely it is growing (“consolidating”), then we see convergence to const_ξ2. In Figure 1 we visualise these effects. There is only a small range of values c where both fixed points seem equally “strong” and the same area remains perpetually contested.
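The battle between the two fixed points can be explored with a small simulation. The sketch below is an illustration under our own assumptions rather than the numerical method of Section 5: it takes the new value at i to be 1 − exp(−c · Σ_j Π_{i′} q(i′)), with j running over {i−k+1,…,i} and i′ over {j,…,j+k−1}∖{i}, and it evaluates the iteration on a finite window [−radius, radius]. Outside the window it uses the values 0 (far left) and f^r(1) (far right), which are exact as long as radius ≥ (k−1)·rounds. The name `iterate_on_step` is hypothetical.

```python
import math

def f(x, k, c):
    return 1.0 - math.exp(-c * k * x ** (k - 1))

def iterate_on_step(k, c, rounds, radius):
    """Values of P_hat^rounds applied to step_0^1 on the window [-radius, radius]."""
    assert radius >= (k - 1) * rounds, "window must cover the zone influenced by the step"
    q = [0.0 if i < 0 else 1.0 for i in range(-radius, radius + 1)]
    right = 1.0                           # value far to the right, equal to f^r(1)

    def value(pos):                       # pos is an offset into the window
        if pos < 0:
            return 0.0                    # far left stays exactly 0
        if pos >= len(q):
            return right
        return q[pos]

    for _ in range(rounds):
        new_q = []
        for pos in range(len(q)):
            total = 0.0
            for j in range(pos - k + 1, pos + 1):
                prod = 1.0
                for p2 in range(j, j + k):
                    if p2 != pos:
                        prod *= value(p2)
                total += prod
            new_q.append(1.0 - math.exp(-c * total))
        q = new_q                         # value() sees the update from now on
        right = f(right, k, c)
    return q
```

Calling, say, `iterate_on_step(3, 0.85, 40, 80)` and `iterate_on_step(3, 0.95, 40, 80)` and locating the index where the values cross ξ1 gives a rough picture of the front position. Going by the thresholds in Table 1, one would expect the front to drift right in the first case and left in the second, though a finite number of rounds can only hint at the limiting behaviour.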
With this in mind, we make the following definitions. For a compact formulation in the coarse terms of shifts (“≪”, “≫”) and point-wise inequalities (“<”, “>”) we use slightly different step functions.
Definition 3.1.
Let k≥3, c∈R+ and P^=P^(k,c) as above. We say
[TABLE]
We define the corresponding erosion and consolidation thresholds as
[TABLE]
Note that c<erk implies that P^(k,c) is eroding and c>cok implies P^(k,c) is consolidating as would be expected. This uses that the definition of P^ is monotonic in c.
The following lemma states that erosion (consolidation) is a sufficient condition for const0 (constξ2) to “win the battle” when iterating P^ on step01.
Lemma 3.2.
Let k≥3.
• If c<erk and i∈Z, then P^rstep01(i)→0 as r→∞.
• If c>cok and i∈Z, then P^rstep01(i)→ξ2 as r→∞.
• erk≤cok.
Proof 3.3.
• Let R∈N be the witness to the fact that P^(k,c) is eroding and i∈Z arbitrary.
[TABLE]
When replacing stepξ1/21 by constξ1/2 we exploited that (P^q)(i) depends only on the values q(i′) for i′∈{i−k+1,…,i+k−1} and thus (P^rq)(i) depends only on the values q(i′) for i′∈[i−(k−1)r,i+(k−1)r].
• The proof is analogous to the proof of the first item.
• This is clear, since the conclusions of the first two items are mutually exclusive.
4 Erosion is Sufficient for Peeling
We now connect the phenomenon of erosion to the survival probabilities q(R)(i) we were originally interested in. For c<erk and any ℓ∈N, they can be made smaller than any δ>0 in R=R(δ,ℓ) rounds. For c>cok and ℓ sufficiently large, no constant number of rounds suffices to reduce all survival probabilities below ξ1.
Lemma 4.1.
Let k≥3.
• If c<erk then ∀ℓ∈N,δ>0: ∃R,N∈N: ∀n≥N,i∈I: q(R)(i)<δ.
• If c>cok then ∃L=L(k,c): ∀ℓ≥L: ∃i∈I: $\lim_{r\to\infty}\lim_{n\to\infty} q^{(r)}(i) > \xi_1$.
Proof 4.2.
• Let ℓ∈N and δ>0 be arbitrary constants. By the first item of Lemma 3.2, there exists a constant R such that P^Rstep01(i)≤δ/2 for all i∈I. Therefore for i∈I:
[TABLE]
which implies the existence of an appropriate N∈N.
• Let R∈N be the witness to the fact that P^(k,c) is consolidating and let ℓ≥L(k,c):=4d for d=(k−1)R. Consider the function q∗:Z→[0,1] defined as q∗=𝟙{d,…,ℓ−d−1}⋅(ξ1+ξ2)/2, i.e. the function with value (ξ1+ξ2)/2 on its support {d,…,ℓ−d−1}. For any d≤i<ℓ−2d we have
[TABLE]
For the first equality, we exploited that i is so far from the borders of I={0,…,ℓ−1} that there is no difference between P and P^. For the second equality we used that only the values of q∗ on {i−d,…,i+d} play a role and q∗ is a (shifted) step function on that domain. By mirroring, the same argument can be made to get PRq∗(i)≥q∗(i) for 2d≤i<ℓ−d as well and thus the point-wise inequality PRq∗≥q∗. Since q(0)≥q∗ we get
[TABLE]
Since q∗ exceeds ξ1 on {d,…,ℓ−d−1}, this implies the claim.
While the first item of Lemma 4.1 is sufficient to show that all but a δ-fraction of the vertices are peeled whp if c<erk, we still need the following combinatorial argument that shows that whp no non-empty core is contained within the remaining vertices. Arguments such as these are standard; many similar ones can be found for instance in [19, 20, 24, 28, 30, 36, 33].
Lemma 4.3.
For any k≥3, ℓ∈N and c∈(0,1) there exists δ=δ(k,ℓ)>0 such that the following holds whp. For any non-empty set V′⊆V of at most δ∣V∣ vertices of F=(V,E), there exists v∈V′ of degree at most 1 in the sub-hypergraph of F induced by V′.
Proof 4.4.
In the course of the proof we will implicitly encounter positive upper bounds on δ in terms of k and ℓ. Any δ>0 small enough to respect these bounds is suitable. We consider the events $(W_{s,t})$ for $k \le s \le \delta|V|$ and $\frac{2s}{k} \le t \le |E|$ that some small set V′ of size s induces t edges. If none of the events occur, then all such V′ induce fewer than 2∣V′∣/k edges and therefore induce hypergraphs with average degree less than 2, so a vertex of degree at most 1 exists in each of them.
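The counting step behind this reduction can be spelled out as follows. Here s=∣V′∣, t is the number of edges induced by V′, and $\deg_{V'}$ denotes degrees in the induced sub-hypergraph; this notation is introduced only for the illustration.

```latex
\sum_{v \in V'} \deg_{V'}(v) \;=\; k \cdot t \;<\; k \cdot \frac{2s}{k} \;=\; 2s
\quad\Longrightarrow\quad
\frac{1}{s} \sum_{v \in V'} \deg_{V'}(v) \;<\; 2
\quad\Longrightarrow\quad
\exists\, v \in V' :\ \deg_{V'}(v) \le 1 .
```

The first equality holds because every induced edge contributes exactly k to the degree sum, and the last implication holds because a minimum cannot exceed the average.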
It is thus sufficient to show that Pr[⋃s⋃tWs,t]=O(1/n). We shall use a first moment argument.
First note that F has duplicate edges with probability at most $\binom{cn\ell}{2}(\ell n^k)^{-1} = O(n^{-1})$, so we restrict our attention to F without duplicate edges.
Given s and t there are $\binom{(\ell+k-1)n}{s}$ ways to choose V′ and at most $\binom{s^k}{t}$ ways to choose which k-tuples of vertices in V′ induce an edge. The probability that any given k-tuple actually does induce an edge is either zero, if the k vertices do not lie in k consecutive segments, or $1-\bigl(1-(\ell n^k)^{-1}\bigr)^{cn\ell} \le \frac{cn}{n^k} \le \frac{1}{n^{k-1}}$. Thus, using constants C,C′,C′′,C′′′∈R+ (that may depend on k and ℓ) where precise values do not matter, we get
[TABLE]
To get rid of the summation over t, we assumed $(s/n)^{k-1} \le \delta^{k-1} \le \frac{1}{2C'}$. Elementary arguments show that in the resulting bound, the contribution of the summands with s∈{k,…,2k} is of order $O(\frac{1}{n})$, the contribution of the summands with s∈{2k+1,…,O(log n)} is of order $O(\frac{\log n}{n^2})$ (using $\frac{s}{n} \le \frac{\log n}{n}$) and the contribution of the remaining terms with $s \ge 3\log_2 n$ is of order $O(2^{-\log_2 n}) = O(\frac{1}{n})$ (using $C'''\frac{s}{n} \le C'''\delta(\ell+2) \le \frac{1}{2}$).
This gives $\Pr[\bigcup_{s,t} W_{s,t}] = O(n^{-1})$, proving the claim.
We are ready to prove the “erk≤fk” half of Theorem 1.1, stated here as a theorem of its own.
Theorem 4.5.
For all k≥3 we have erk≤fk.
Proof 4.6.
We need to prove that for any c<erk and any ℓ∈N the fuse graph F=F(n,k,c,ℓ) is peelable whp.
First, let δ=δ(k,ℓ) be the constant from Lemma 4.3 and R=R(δ/2,ℓ) as well as N=N(δ/2,ℓ) the corresponding constants from the first item of Lemma 4.1.
Assuming n≥N we have q(R)(i)≤δ/2 for all i∈I, meaning any vertex v from F is not deleted within R rounds of peelv(F) with probability at most δ/2. Since peel(F) deletes at least the vertices that any peelv(F) for v∈V deletes, the expected number of vertices not deleted by peel(F) within R rounds is at most δ∣V∣/2.
Now standard arguments using Azuma’s inequality (see e.g. [34, Theorem 13.7]) suffice to conclude that whp at most δ∣V∣ vertices are not deleted by peel(F) within R rounds.
By Lemma 4.3 whp neither the set of the at most δ∣V∣ remaining vertices nor any of its non-empty subsets induces a hypergraph of minimum degree 2. Therefore the core of F is empty.
A natural follow-up question to Theorem 4.5 would be whether erk=fk, which would also imply fk≤cok. To establish this stronger claim, we would have to exclude the possibility that for certain densities c there is a function r(n)=ω(1) such that a constant fraction of vertices survive r(n) rounds but are nevertheless deleted eventually. It seems plausible that arguments similar to [36, Lemma 4] can be used, but since our main goal is reached we do not pursue this now.
5 Approximating the Erosion and Consolidation Thresholds
We now approximate the thresholds erk (and analogously cok) with numerical methods. Note that if c<erk (if c>cok), then this can be verified in a finite computation, because the correct value of R, together with a bound on the required precision of floating point operations (when rounding conservatively), constitutes a witness. Moreover, the function P^rstepξ1/21 can be represented by a finite number of reals, since it is constant on (−∞,−(k−1)r] and constant on [(k−1)r,∞).
To approximate erk (and cok) with high precision, more efficient approaches are required, however. We compute upper bounds on P^rstepξ1/21 by focusing on a finite domain [−D,D] for some D∈N and rounding conservatively outside of it. Concretely we define (ar:Z→[0,1])r∈N0 (dependent on k, c and D) with a0:=stepξ1/21 (analogously (br:Z→[0,1])r∈N0 with b0:=step0(ξ1+ξ2)/2). For r≥0 we let
[TABLE]
Due to the limited effective domain, each ar is given by 2D+2 values. It is easy to see that each ar is monotonic and fulfils ar+1≥P^ar, which implies P^rstepξ1/21≤ar. If we find ar(0)<ξ1/2, then by monotonicity we have ar≤≫stepξ1/21 and therefore:
[TABLE]
(Analogously if bR(−1)>(ξ1+ξ2)/2 then c>cok follows.)
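The iteration described above can be explored with a short Python sketch. Since the definition of P^ lies outside this section, the code assumes the Poisson-type form P^q(i) = 1 − exp(−c·Σ_j Π_{j′≠j} q(i−j+j′)) with j,j′∈{0,…,k−1}; this form should be checked against the definition in Section 3 before relying on it. The names `apply_operator` and `iterate_step`, the plain 0/1 step function, and the choice of an exact window instead of the conservative rounding on [−D,D] are likewise assumptions of the sketch, which is not the implementation used for the reported experiments.

```python
import numpy as np


def apply_operator(q, k, c):
    """Apply the assumed operator once to q, given on a finite window.

    Outside the window q is treated as constant (edge values are repeated),
    which is exact as long as the true function is constant out there.
    """
    pad = k - 1
    padded = np.pad(q, pad, mode="edge")
    n = len(q)
    lam = np.zeros(n)
    for j in range(k):
        prod = np.ones(n)
        for jp in range(k):
            if jp != j:
                off = jp - j  # neighbour index is i + jp - j
                prod *= padded[pad + off: pad + off + n]
        lam += prod
    return -np.expm1(-c * lam)  # 1 - exp(-c * lam), computed stably


def iterate_step(k, c, rounds):
    """Iterate the operator on the step function (0 left of 0, 1 from 0 on).

    The window is wide enough that the iterates are constant beyond it,
    so no rounding outside the window is needed for this many rounds.
    """
    half = (k - 1) * (rounds + 1)
    idx = np.arange(-half, half + 1)
    q = (idx >= 0).astype(float)
    for _ in range(rounds):
        q = apply_operator(q, k, c)
    return idx, q


if __name__ == "__main__":
    for c in (0.70, 0.95):
        idx, q = iterate_step(k=3, c=c, rounds=40)
        print(c, q[idx == 0][0], q[-1])
```

Certifying c<erk in the sense of the text additionally requires conservative rounding and, close to the threshold, far more rounds than this sketch uses.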
Experimental Results.
For D=50 and all k∈{3,…,7} we computed, using double-precision floating point values, a1,a2,… and b1,b2,… for various c. For each pair (k,c), we find that P^(k,c) is consolidating, that it is eroding, or that neither can be verified. The results suggest that erk<ck∗<cok where ck∗ is the orientability threshold for k-ary Erdős-Rényi hypergraphs.
Concretely, we considered for j=1,2,3,… the values $c_k^* - 2^{-j}$ and tried to verify that they are less than erk. The largest value for which we succeeded is reported as bk in Table 1.
The largest number of iterations required was $6\cdot 10^7$. For the first value that could not be shown to be less than erk, our approximations of the sequence (ai)i∈N became stationary with a[0]>ξ1/2, i.e. the double-precision floats did not change any more (the highest number of iterations to reach this point was $2\cdot 10^8$). It is possible that the value is still less than erk and our choice of D or the precision of our floats is simply insufficient. Further experiments with 128-bit floats and larger values of D suggest, however, that there is a tiny but real gap between erk, ck∗ and cok and that the natural conjecture of equality is misplaced.
In the same way we report the smallest value of the form $c_k^* + 2^{-j}$ for which we verified that it exceeds cok as Bk in Table 1.
6 Peeling Necessitates Orientability of Erdős-Rényi Hypergraphs
We now prove the “fk≤ck∗”-half of Theorem 1.1, stated as Theorem 6.2. Recall that an orientation of a hypergraph H=(V,E) is an injective map f:E→V with f(e)∈e for all e∈E and that ck∗ is the threshold for orientability of k-uniform Erdős-Rényi hypergraphs.
After classical (2-ary) cuckoo hashing was discovered [37] (relying on $c_2^* = \frac{1}{2}$), the thresholds for k>2 were determined independently by [11, 20, 21], with generalisations to other graphs and hypergraphs studied in [9, 18, 26, 27, 41].
Note that if H is peelable then it is also orientable: Just orient each edge e to a vertex v∈e such that v and e are deleted in the same round of peel(H).
Our proof of Theorem 6.2 relies strongly on a deep and remarkable theorem due to Lelarge [28]. To clarify its role in our proof, we restate it in weaker but sufficient form.
Let (Gn=(An,Bn,En))n∈N be a sequence of bipartite graphs with ∣En∣=O(∣An∣). Let further M(Gn) be the size of a maximum matching in Gn. If the random weak limit ρ of (Gn)n∈N is a bipartite unimodular Galton-Watson tree, then $\lim_{n\to\infty} \frac{M(G_n)}{|A_n|}$ exists almost surely and depends only on ρ.
To see the connection, note that an orientation of a hypergraph is a left-perfect matching in its (bipartite) incidence graph.
Theorem 6.2.
For all k≥3 we have fk≤ck∗.
Proof 6.3.
Let c=ck∗+ε. We need to show that there exists ℓ∈N such that the fuse graph F=F(n,k,c,ℓ) is not peelable whp.
Let $H = H^k_{n,cn}$ be the k-ary Erdős-Rényi random hypergraph with density c. By choice of c, H is not orientable whp. More strongly even, there exists δ=δ(ε)>0 such that the largest partial orientation, i.e. the largest subset of the edges that can be oriented, has size (1−δ)cn+o(n) whp, see for instance [28].
We set $\ell = \frac{k}{\delta c}$ and consider F as well as the hypergraph $\tilde{F}$ where the vertices i and i+nℓ for all i∈{1,…,(k−1)n} are merged. This “glues” the last k−1 segments of F on top of the first k−1 segments of F, making $\tilde{F}$ a “seamless” version of our construction. Crucially, the random weak limits of $\tilde{F}$ and H coincide, i.e. for any constant R∈N the distribution of the R-neighbourhood $N_{\tilde{F}}(v,R)$ of a random vertex v of $\tilde{F}$ has the same limit (as n→∞) as the distribution of the R-neighbourhood $N_H(v,R)$ of a random vertex v of H. (Footnote 5: The common limit of the incidence graphs of $\tilde{F}$ and H is the bipartite unimodular Galton-Watson tree described in [28, Section 4]. Standard arguments, e.g. from [25, 26], suffice to establish the identity.)
It now follows from [28, Theorem 4.1] that the size of the largest partial orientation of $\tilde{F}$ is essentially also a (1−δ)-fraction of the number of edges, namely (1−δ)cℓn+o(n) whp. Switching from $\tilde{F}$ back to F can increase the size of a largest partial orientation by at most (k−1)n to $\bigl(1-\delta+\frac{k-1}{c\ell}\bigr)c\ell n + o(n) = \bigl(1-\frac{\delta}{k}\bigr)c\ell n + o(n)$ whp. Thus F is not orientable whp and therefore not peelable whp.
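The final simplification can be checked directly. The computation below reads the choice of ℓ as ℓ=k/(δc) and ignores that ℓ has to be an integer; both are assumptions of this worked step.

```latex
\ell = \frac{k}{\delta c}
\quad\Longrightarrow\quad
\frac{k-1}{c\ell} = \frac{(k-1)\,\delta}{k}
\quad\Longrightarrow\quad
1 - \delta + \frac{k-1}{c\ell}
  = 1 - \delta + \delta - \frac{\delta}{k}
  = 1 - \frac{\delta}{k} \;<\; 1 .
```

Since the number of edges of F is cℓn, a largest partial orientation of size $(1-\frac{\delta}{k})c\ell n + o(n)$ leaves a linear number of edges unoriented.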
7 Experiments
We used our hypergraphs to implement retrieval data structures and compare them to existing implementations.
A 1-bit retrieval data structure for a universe U is a pair of algorithms construct and query, where the input of construct is a set S⊆U of size m=∣S∣ and a function f:S→{0,1}. If construct succeeds, then the output is a data structure Df such that query(Df,x)=f(x) for all x∈S. For y∈U∖S, query(Df,y) may return an arbitrary element of {0,1}. The interesting setting is when the data structure may only occupy O(m) bits. See [8, 7, 12, 22, 39].
One approach is to map each element x∈S to a set ex⊂[N] via a hash function, where N=m/c for some desired edge density c. One then seeks a solution z:[N]→{0,1} satisfying ⨁v∈ex z(v)=f(x) for all x∈S. The bit-vector z and the hash function then form Df. A query simply evaluates the left hand side of the equation for x to recover f(x). To compute z, we consider the hypergraph H=([N],{ex,x∈S}). A peelable vertex v∈[N] only contained in one edge ex corresponds to a variable z(v) only occurring in the equation associated with x. It is thus easy to see that if H is peelable, repeated elimination and back-substitution yields z in O(m) time.
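The peeling-based construction can be sketched in a few lines of Python. This is a minimal illustration and not the implementation measured in this section: SHA-256 from `hashlib` stands in for the hash function, the helpers `edge_for` and `build` and the `seed` parameter are hypothetical, and the edges are plain random k-sets rather than fuse graph edges.

```python
import hashlib


def edge_for(key, n_vertices, k, seed):
    """Map a key to k distinct vertices in range(n_vertices) (hypothetical scheme)."""
    vertices, counter = [], 0
    while len(vertices) < k:
        digest = hashlib.sha256(f"{seed}:{counter}:{key}".encode()).digest()
        v = int.from_bytes(digest[:8], "big") % n_vertices
        if v not in vertices:
            vertices.append(v)
        counter += 1
    return vertices


def construct(f, n_vertices, k=3, seed=0):
    """Try to compute z for the dict f (key -> bit); return None if peeling fails."""
    keys = list(f)
    edges = [edge_for(x, n_vertices, k, seed) for x in keys]
    incident = [set() for _ in range(n_vertices)]  # ids of remaining edges per vertex
    for e, vs in enumerate(edges):
        for v in vs:
            incident[v].add(e)

    order = []  # (vertex, edge) pairs in peeling order
    stack = [v for v in range(n_vertices) if len(incident[v]) == 1]
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue  # degree changed since v was pushed
        (e,) = incident[v]
        order.append((v, e))
        for u in edges[e]:
            incident[u].discard(e)
            if len(incident[u]) == 1:
                stack.append(u)
    if len(order) < len(edges):
        return None  # a non-empty core remains

    z = [0] * n_vertices
    for v, e in reversed(order):  # back-substitution
        acc = f[keys[e]]
        for u in edges[e]:
            if u != v:
                acc ^= z[u]
        z[v] = acc
    return z


def query(z, x, k=3, seed=0):
    """XOR the bits of z at the positions of the edge of x."""
    bit = 0
    for v in edge_for(x, len(z), k, seed):
        bit ^= z[v]
    return bit


def build(f, n_vertices, k=3, max_tries=100):
    """Retry with fresh seeds until peeling succeeds; return (z, seed) or None."""
    for seed in range(max_tries):
        z = construct(f, n_vertices, k, seed)
        if z is not None:
            return z, seed
    return None
```

A call such as `build({"a": 1, "b": 0}, n_vertices=8)` returns the bit vector together with the seed that has to be passed to `query`. The recorded pairs in `order` also form an orientation of the hypergraph, since every edge is assigned to the vertex through which it was peeled.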
We implemented the following peeling-based variations and report results in Table 2. By the overhead of an implementation we mean $\frac{N'}{m} - 1$ where N′≥N is the total number of bits used, including auxiliary data structures. (Footnote 6: Experiments were performed on a desktop computer with an Intel® Core i7-2600 Processor @ 3.40GHz. In all cases, the data set S contains the first $m = 10^7$ URLs from the eu-2015-host dataset gathered by [5] with ≈80 bytes per key, and f:U→{0,1} is taken to be the parity of the string length. As hash function we used MurmurHash3_x64_128 [2]. If more than 128 hash bits were needed, techniques resembling double-hashing were used to generate additional bits to avoid another execution of murmur. Reported query times are averages obtained by querying all elements of the data set once. They include the roughly 25 ns needed to evaluate murmur on average. The reported numbers are medians of 5 executions.)
•[Botelho et al. [8]] H is a 3-ary Erdős-Rényi hypergraph with an edge density below the peelability threshold c3≈0.818. Construction via peeling and queries are very fast, but the overhead of 23% is sizeable (i.e. Df occupies roughly 1.23m bits).
•[Fuse Graphs.] The edges are distributed such that H is a fuse graph. Recall that the edge density is $c\frac{\ell}{\ell+k-1}$. Note that we let ℓ grow with k to keep the density close to c. We still keep ℓ in a moderate range, as our construction relies on n≫ℓ.
•[Luby et al. [29]] The edges are distributed such that H is the peelable hypergraph from [29] already mentioned on page 1. To our knowledge these hypergraphs have not been considered in the context of retrieval. They seem to be particularly well suited to achieve very small overheads at the cost of larger construction and mean query times compared to our other approaches. Note that the largest edge size is D+4 and the worst-case query time is therefore much larger than the reported average query time.
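For the fuse graph variant, the edge distribution and the resulting density can be sketched as follows. The vertex numbering (segment s holds the vertices s·n,…,s·n+n−1), the function names and the use of Python's `random` module are assumptions made for illustration; the measured implementation derives the incidences from hash values instead.

```python
import random


def sample_fuse_graph(n, k, c, ell, rng):
    """Sample the edges of a fuse graph with ell + k - 1 segments of size n.

    Each of the round(c * n * ell) edges picks a random start segment in
    range(ell) and one random vertex in each of the k consecutive segments
    starting there.
    """
    edges = []
    for _ in range(round(c * n * ell)):
        start = rng.randrange(ell)
        edges.append(tuple((start + j) * n + rng.randrange(n) for j in range(k)))
    return edges


def edge_density(k, c, ell):
    """Edges over vertices: c * n * ell / ((ell + k - 1) * n)."""
    return c * ell / (ell + k - 1)


if __name__ == "__main__":
    edges = sample_fuse_graph(n=50, k=3, c=0.9, ell=20, rng=random.Random(1))
    print(len(edges), edge_density(k=3, c=0.9, ell=20))
```

The density helper makes the trade-off visible: for fixed c the density approaches c only as ℓ grows, which is why ℓ is increased with k.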
For reference, we also implemented two recent retrieval data structures that do not rely on peeling but solve linear systems [14, 22]. There, to counteract cubic solving time, the input is partitioned into chunks of size C. Especially [14] achieves much smaller overheads than what is feasible with peeling approaches, with the downside of being much slower and more complicated.
Overall, it seems using fuse graphs in retrieval data structures has a chance of outperforming existing approaches when moderate memory overheads of ≈5% are acceptable.
However, more research is required to explore the complex space of possible input sizes, configurations of the data structures and trade-offs between overhead and runtime. Our implementations are configured reasonably, but arbitrary in some aspects. A full discussion is beyond the scope of this paper.
8 Conclusion
We introduced for all k∈N a new family of k-uniform hypergraphs where the vertex set is partitioned into a large but constant number of segments. Each edge chooses a random range of k consecutive segments and one random incidence in each of them.
While we have no asymptotic results on the resulting peelability thresholds fk, at least for small k they are remarkably close to ck∗ with $0 \le c_k^* - f_k \le 10^{-5}$ for k∈{3,4,5,6,7}. In other words, fk almost coincides with the orientability threshold ck∗ of Erdős-Rényi hypergraphs and significantly exceeds their peelability threshold ck. Note that $c_k^* = 1-(1+o_k(1))e^{-k} \to 1$ as $k \to \infty$ (see [20, page 3]) while $c_k \to 0$ as $k \to \infty$ (see e.g. [36]). When plugging our hypergraphs into the retrieval framework by [8], we obtained corresponding improvements with respect to memory usage, with no discernible downsides.
Future Experiments.
While our experiments on retrieval data structures are promising, it is unclear how robustly the advantages translate to other practical settings where peelable hypergraphs are used, say when implementing Invertible Bloom Lookup Tables [23]. There are hidden disadvantages of our hypergraphs not considered in this paper – for instance the number of rounds needed to peel our hypergraphs is higher, possibly hurting parallel peeling algorithms – as well as hidden advantages – peeling in external memory, a setting considered in [3], is easy due to the locality of the edges.
A Theoretical Question.
Given our results, it is natural to suspect a fundamental connection between fk and ck∗. Quite possibly, the tiny gap that seems to remain between the values – clearly negligible from a practical perspective – is merely an artefact of the discreteness of segments in our construction.
This discreteness, while heavily used in our arguments, may in fact be dispensable. Indeed, we believe the key idea behind our hypergraphs is limited bandwidth, where a hypergraph on vertex set [n] has bandwidth at most d if each edge e satisfies $\max_{v\in e} v - \min_{v\in e} v < d$ (the incidence matrix can then be sorted to resemble a band matrix). Such a hypergraph can be generated by choosing for each edge a random range of d consecutive vertices and k incidences independently and uniformly at random from that range. In experiments with k=3 and d=εn, such hypergraphs performed similarly to the hypergraphs we analysed (with k=3 and ℓ≈1/ε). Note that there are no discrete segments in the modified construction. It would be nice to see whether in such a variation peelability and orientability are more elegantly and more intimately linked.
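The modified construction can be sketched as follows. The function name, the uniform choice of the range start among all positions where the range fits, and the fact that independently drawn incidences may coincide are assumptions of this sketch; the text does not fix these details.

```python
import random


def sample_band_hypergraph(n, m, k, d, rng):
    """Sample m edges on vertices 0, ..., n-1 with bandwidth at most d.

    Each edge picks a random range start, ..., start + d - 1 and then k
    incidences independently and uniformly at random from that range.
    """
    edges = []
    for _ in range(m):
        start = rng.randrange(n - d + 1)
        edges.append(tuple(start + rng.randrange(d) for _ in range(k)))
    return edges


if __name__ == "__main__":
    rng = random.Random(7)
    edges = sample_band_hypergraph(n=1000, m=900, k=3, d=50, rng=rng)
    print(max(max(e) - min(e) for e in edges))
```

By construction every edge satisfies the bandwidth condition, since all of its incidences lie in a window of d consecutive vertices.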
Bibliography
[1] David Aldous and J. Michael Steele. The Objective Method: Probabilistic Combinatorial Optimization and Local Weak Convergence, pages 1–72. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004. doi:10.1007/978-3-662-09444-0_1.
[3] Djamal Belazzougui, Paolo Boldi, Giuseppe Ottaviano, Rossano Venturini, and Sebastiano Vigna. Cache-oblivious peeling of random hypergraphs. In Data Compression Conference, pages 352–361, 2014. doi:10.1109/DCC.2014.48.
[4] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 1970. URL: http://doi.acm.org/10.1145/362686.362692.
[5] Paolo Boldi, Andrea Marino, Massimo Santini, and Sebastiano Vigna. BUbiNG: Massive crawling for the masses. In Proc. 23rd WWW’14, pages 227–228, 2014. doi:10.1145/2567948.2577304.
[6] Fabiano Cupertino Botelho. Near-Optimal Space Perfect Hashing Algorithms. PhD thesis, Federal University of Minas Gerais, 2008. URL: http://homepages.dcc.ufmg.br/~fbotelho/en/pub/thesis.pdf.
[7] Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Simple and space-efficient minimal perfect hash functions. In Proc. 10th WADS, pages 139–150, 2007. doi:10.1007/978-3-540-73951-7_13.
[8] Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. Practical perfect hashing in nearly optimal space. Inf. Syst., pages 108–131, 2013. doi:10.1016/j.is.2012.06.002.