Sparse Hypergraphs with Applications to Coding Theory
Chong Shangguan, Itzhak Tamo

TL;DR
This paper improves bounds on the maximum size of certain sparse hypergraphs with applications to coding theory, using novel combinatorial methods to handle complex divisibility cases.
Contribution
It introduces a new lower bound for hypergraph extremal functions, especially when divisibility conditions are not met, and constructs hypergraphs with multiple free properties for coding applications.
Findings
Established a logarithmic factor improvement in lower bounds.
Constructed hypergraphs with multiple free properties.
Applied hypergraph independence bounds to coding theory problems.
Part of this paper has been published in the 2019 IEEE International Symposium on Information Theory.
Chong Shangguan and Itzhak Tamo are with the Department of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 6997801, Israel. Emails: [email protected], [email protected].
Abstract
For fixed integers $r\ge 3$, $e\ge 3$, $v\ge r+1$, an $r$-uniform hypergraph is called $\mathscr{G}_r(v,e)$-free if the union of any $e$ distinct edges contains at least $v+1$ vertices. Brown, Erdős and Sós showed that the maximum number of edges of such a hypergraph on $n$ vertices, denoted as $f_r(n,v,e)$, satisfies
$$\Omega\big(n^{\frac{er-v}{e-1}}\big)=f_r(n,v,e)=O\big(n^{\lceil\frac{er-v}{e-1}\rceil}\big).$$
For sufficiently large $n$ and $e-1\mid er-v$, the lower bound matches the upper bound up to a constant factor, which depends only on $r,v,e$; whereas for $e-1\nmid er-v$, in general it is a notoriously hard problem to determine the correct exponent of $n$. Among other results, we improve the above lower bound by showing that
$$f_r(n,v,e)=\Omega\big(n^{\frac{er-v}{e-1}}(\log n)^{\frac{1}{e-1}}\big)$$
for any $r,e,v$ satisfying $\gcd(e-1,er-v)=1$. The hypergraph we constructed is in fact $\mathscr{G}_r\big(ir-\lceil\frac{(i-1)(er-v)}{e-1}\rceil,i\big)$-free for every $2\le i\le e$, and it has several interesting applications in coding theory. The proof of the new lower bound is based on a novel application of the lower bound on the hypergraph independence number due to Duke, Lefmann, and Rödl.
Keywords: sparse hypergraphs, hypergraph independence number, coding theory
Mathematics subject classifications: 05C35, 05C65, 05D40, 94B25, 68R05, 68R10
1 Introduction
Since the pioneering work of Turán [38], the study of Turán-type problems has been playing a central role in the field of extremal combinatorics. In this work, we present an improved probabilistic lower bound for a hypergraph Turán-type problem introduced by Brown, Erdős and Sós [10] in 1973. We also show that this new bound provides improved constructions for several seemingly unrelated problems in coding theory, including Parent-Identifying Set Systems, uniform Combinatorial Batch Codes and optimal Locally Recoverable Codes.
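Let us begin with some necessary notation. For an integer $r\ge 2$, an $r$-uniform hypergraph $\mathcal{H}$ (henceforth an $r$-graph) can be viewed as a pair $\mathcal{H}=(V(\mathcal{H}),E(\mathcal{H}))$ of vertices and edges, where the vertex set $V(\mathcal{H})$ is a finite set and the edge set $E(\mathcal{H})$ is a collection of $r$-subsets of $V(\mathcal{H})$. An $r$-graph is called $\mathcal{G}$-free if it contains no subhypergraph which forms a copy of $\mathcal{G}$. For a family $\mathscr{G}$ of $r$-graphs, the Turán number, $ex_r(n,\mathscr{G})$, is the maximum number of edges in an $r$-graph on $n$ vertices which is $\mathcal{G}$-free for every $\mathcal{G}\in\mathscr{G}$.
Throughout this paper, an $r$-graph $\mathcal{H}$ always stands for its edge set $E(\mathcal{H})$. The vertex set $V(\mathcal{H})$ is viewed as a subset of $[n]:=\{1,\ldots,n\}$. Given a finite set $X$, denote by $\binom{X}{r}$ the family of distinct $r$-subsets of $X$. Hence, $\mathcal{H}\subseteq\binom{[n]}{r}$. We will frequently use the standard Bachmann-Landau notations such as $\Omega(\cdot)$, $O(\cdot)$ and $o(\cdot)$, whenever the constants are not important.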
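For integers $r$, $e$ and $v$, let $\mathscr{G}_r(v,e)$ be the family of all $r$-graphs formed by $e$ edges and at most $v$ vertices; that is,
$$\mathscr{G}_r(v,e)=\Big\{\mathcal{H}\subseteq\binom{[n]}{r}:~|E(\mathcal{H})|=e,~|V(\mathcal{H})|\le v\Big\}.$$
An $r$-graph $\mathcal{H}$ is called $\mathscr{G}_r(v,e)$-free if it does not contain a copy of any member of $\mathscr{G}_r(v,e)$, namely, the union of any $e$ distinct edges of $\mathcal{H}$ contains at least $v+1$ vertices. In the literature, such $r$-graphs are also termed sparse [19]. As in the previous papers (see, e.g. [3]), we use the notation $f_r(n,v,e):=ex_r(n,\mathscr{G}_r(v,e))$.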
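The sparsity condition is easy to test by brute force on small examples. The following is a minimal sketch, assuming the reading of the definition given in the abstract (the union of any $e$ distinct edges must contain at least $v+1$ vertices); the function name `is_sparse_free` and the Fano plane test case are illustrative choices, not taken from the paper.

```python
from itertools import combinations

def is_sparse_free(edges, v, e):
    """Return True if the union of any e distinct edges has at least v + 1 vertices.

    Brute force over all e-subsets of edges, so only suitable for small inputs.
    """
    distinct_edges = list({frozenset(edge) for edge in edges})
    for group in combinations(distinct_edges, e):
        if len(frozenset().union(*group)) <= v:
            return False
    return True

# Illustrative input: the Fano plane, a 3-graph in which any two edges
# meet in exactly one vertex.
fano = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]

print(is_sparse_free(fano, 4, 2))  # any two edges span 5 vertices
print(is_sparse_free(fano, 6, 3))  # edges 123, 145, 246 span only 6 vertices
```

The second call fails because three pairwise intersecting lines without a common point form three edges on six vertices.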
Since the study of $f_r(n,v,e)$ for $r=2$ or $e=2$ has been quite extensive (see, e.g. [15, 16, 30]), we focus on the asymptotic behavior of $f_r(n,v,e)$ for fixed integers $r\ge 3$, $e\ge 3$, $v\ge r+1$ as $n\to\infty$. It was shown in [10] that in general
$$\Omega\big(n^{\frac{er-v}{e-1}}\big)=f_r(n,v,e)=O\big(n^{\lceil\frac{er-v}{e-1}\rceil}\big).\qquad(1)$$
The lower bound in (1) is obtained by a standard probabilistic method (now known as the alteration method, see, e.g. [4]), and the (naivest) upper bound follows from a double counting argument, which uses the simple fact that any set of $\lceil\frac{er-v}{e-1}\rceil$ vertices can be contained in at most $e-1$ distinct edges.
Observe that the exponent of $n$ in (1) is tight for $e-1\mid er-v$; however, for $e-1\nmid er-v$, in general it is a notoriously hard problem to determine the correct order of the exponent of $n$. In particular, for fixed $r>k\ge 2$, $e\ge 3$ and $v=e(r-k)+k+1$, the study of $f_r(n,e(r-k)+k+1,e)$ as $n\to\infty$ has attracted considerable attention since the work of [10, 9]. It is easy to check by (1) that
$$\Omega\big(n^{k-\frac{1}{e-1}}\big)=f_r(n,e(r-k)+k+1,e)=O(n^{k}).\qquad(2)$$
The following conjecture remains widely open.
Conjecture 1 (see [10, 3]).
For fixed integers $r>k\ge 2$, $e\ge 3$,
$$n^{k-o(1)}<f_r(n,e(r-k)+k+1,e)=o(n^{k})$$
as $n\to\infty$.
Conjecture 1 has been studied in depth for more than forty years. For example, the first case of the conjecture, i.e., when $r=e=3$ and $k=2$, was already highly nontrivial. It was not solved until Ruzsa and Szemerédi [31] proved the (6,3)-theorem
$$n^{2-o(1)}<f_3(n,6,3)=o(n^{2}),$$
where the upper bound follows from the celebrated Regularity Lemma [36], and the lower bound is based on Behrend's construction [6] on 3-term arithmetic progression free sets. The study of $f_3(n,6,3)$ indicates that the resolution of Conjecture 1 may rely heavily on the regularity lemmas (which include, for example, the graph regularity lemma and the hypergraph regularity lemma, see [13]) and Behrend-type constructions, which are among the most powerful tools in extremal combinatorics. Improvements of (1) on sporadic or less general parameters have been obtained in a line of other works [17, 3, 32, 33, 27, 20]. Currently, the upper bound part of Conjecture 1 is known to be true for all [20], and the lower bound part holds for [3] and [20].
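Despite the efforts of many researchers, the lower bound (2) implied by (1) remains the best possible for and . In the proposition below we slightly improve the lower bound of (2) by a $(\log n)^{\frac{1}{e-1}}$ factor.
Proposition 2.
For fixed integers $r>k\ge 2$, $e\ge 3$,
$$f_r(n,e(r-k)+k+1,e)=\Omega\big(n^{k-\frac{1}{e-1}}(\log n)^{\frac{1}{e-1}}\big)$$
as $n\to\infty$.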
Proposition 2 is in fact an easy consequence of the following more general result.
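Theorem 3.
For fixed integers $r,e,v$ satisfying $\gcd(e-1,er-v)=1$ and sufficiently large $n$, there exists an $r$-graph with
$$\Omega\big(n^{\frac{er-v}{e-1}}(\log n)^{\frac{1}{e-1}}\big)$$
edges, which is simultaneously $\mathscr{G}_r\big(ir-\lceil\frac{(i-1)(er-v)}{e-1}\rceil,i\big)$-free for every $2\le i\le e$. In particular, setting $i=e$ we have
$$f_r(n,v,e)=\Omega\big(n^{\frac{er-v}{e-1}}(\log n)^{\frac{1}{e-1}}\big).$$
The proof of this theorem is presented in Section 2. To see that Proposition 2 indeed follows from Theorem 3, it suffices to write $v=e(r-k)+k+q$ for some $1\le q\le e-2$ (we exclude $q=0$ since in that case the exponent of (1) is tight), then $\gcd(e-1,er-v)=\gcd(e-1,q)=1$ holds, for example, when $q=1$ or $e-1$ is a prime. Proposition 2 follows by setting $q=1$.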
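The arithmetic behind this reduction can be checked numerically. The sketch below assumes that the exponents in (1) are $\frac{er-v}{e-1}$ and its ceiling, and that the hypothesis of Theorem 3 is $\gcd(e-1,er-v)=1$; the helper names `bes_exponents` and `conjecture_parameters`, and the parameter `q`, are hypothetical labels introduced here for illustration.

```python
from fractions import Fraction
from math import ceil, gcd

def bes_exponents(r, v, e):
    """Exponents of n in the two-sided bound (1) and the coprimality test."""
    lower = Fraction(e * r - v, e - 1)    # exponent of the probabilistic lower bound
    upper = ceil(lower)                   # exponent of the counting upper bound
    coprime = gcd(e - 1, e * r - v) == 1  # assumed hypothesis of Theorem 3
    return lower, upper, coprime

def conjecture_parameters(r, k, e, q=1):
    """Value of v written as e(r - k) + k + q; q = 1 is the case of Conjecture 1."""
    return e * (r - k) + k + q

# The (6,3) case: r = e = 3 and k = 2 give v = 6 and exponents 3/2 and 2.
v = conjecture_parameters(r=3, k=2, e=3)
print(v, bes_exponents(3, v, 3))
```

With `q = 1` one gets $er-v=(e-1)k-1$, so the lower exponent is $k-\frac{1}{e-1}$ and the gcd is always 1, which is why Proposition 2 needs no extra divisibility assumption.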
The proof of Theorem 3 relies on a novel application of the lower bound on the hypergraph independence number due to Duke, Lefmann, and Rödl [14], as stated in Section 2. Since in our proof we cannot get rid of the coprime condition, it remains an interesting question to determine whether this constraint is necessary. Moreover, we have the following open problem.
Problem 4.
For which parameters satisfying and , does there exist a constant such that for sufficiently large ,
[TABLE]
It is noteworthy that sparse hypergraphs have found many applications in theoretical computer science and coding theory, some of which are listed as follows:
- $\mathscr{G}_3(6,3)$-free 3-graphs were used in PCP analysis and Linearity Testing [25], Communication Complexity [29], Monotonicity Testing [18] and Coded Caching Schemes [35];
- -free -graphs can be used to construct Perfect Hash Families [34] and Parent-Identifying Set Systems (IPPSs for short);
- -graphs which are simultaneously -free for each were used to construct uniform Combinatorial Batch Codes [5] (uniform CBCs for short); and in particular, for they were used in a bitprobe model with three probes [2];
- $r$-graphs which are simultaneously $\mathscr{G}_r(ir-i,i)$-free for each $2\le i\le k$ were used to construct optimal Locally Recoverable Codes [40] (optimal LRCs for short).
In Section 4 we will present the applications of Theorem 3 in the constructions of IPPSs, uniform CBCs and optimal LRCs.
The rest of this paper is organized as follows. In Section 2 we present the proof of our main result, namely Theorem 3. In Section 3 we discuss the applications of Theorem 3 to two problems in extremal combinatorics, and in Section 4 we present three applications of Theorem 3 to coding theory.
2 Proof of the main result
To prove Theorem 3 we will make use of the following lemma of Duke, Lefmann, and Rödl [14] (whose proof applied a result of [1]). Note that an independent set of an -graph is a subset of vertices such that no elements form an edge, and an -graph is said to be linear if any two distinct edges share at most one vertex.
Lemma 5 (see Theorem 2, [14]).
For all fixed there exists a constant depending only on such that every linear -graph on vertices with average degree at most has an independent set of size at least
(Footnote 5: The original theorem in [14] has the condition "with maximum degree at most ". However, since for any hypergraph with average degree at most , there exists a subhypergraph of it which has at least half of its vertices and maximum degree at most , it is not hard to observe that the assertion of the original theorem works also with the condition "with average degree at most ", at the expense of a worse constant .)
Recall that we view the parameters as constants, whereas tends to infinity. Since we are only interested in the asymptotic behavior we do not make an attempt to optimize any of the constants. The following two inequalities are well known (see, e.g. [4]).
Chernoff's inequality. Suppose are independent random variables taking values in . Let denote their sum and let denote the sum's expected value. Then for any , .
Markov's inequality. If is a nonnegative random variable and , then .
Below we present the proof of Theorem 3.
Proof of Theorem 3.
Set for some , which will be made explicit later. Generate an -graph by picking each member of independently with probability . Let denote the number of edges in . Clearly,
[TABLE]
For , let be the collection of all distinct edges of whose union contains at most vertices, where will be determined later. Let denote the size of . Then
[TABLE]
where the first equality follows from the fact that there are at most ways to choose edges whose union contains at most vertices.
We say that distinct edges of form a bad -system if their union contains at most vertices. Clearly, two distinct bad -systems can share at most edges. For each , let be the collection of the unordered pairs of bad -systems which share precisely edges, and the union of those common edges contains at least vertices. For , it is clear that and Let denote the size of . Then
[TABLE]
where the first equality follows from the fact that there are at most ways to choose edges whose union contains at most vertices. Lastly, let denote the number of bad -systems in . Then
[TABLE]
In order to apply Lemma 5, we will bound from above the number of pairs of bad -systems which share at least two edges, by picking and so that
[TABLE]
for each as . From (3), (4) and (5), it is easy to see that if and only if
[TABLE]
and if and only if
[TABLE]
Let
$$a=\min_{2\leq i\leq e-1}\left\{\frac{1}{i-1}\Big(f(i)-\frac{(i-1)(er-v)}{e-1}\Big),~\frac{1}{2e-i-1}\Big(\frac{(i-1)(er-v)}{e-1}+1-f(i)\Big)\right\}.$$
There is satisfying (7) and (8) if and only if for each , there exists an integer such that
[TABLE]
Since is an integer, (9) holds if and only if It is easy to verify that those indivisibility conditions hold simultaneously if and only if Under this condition, it suffices to pick for each ,
[TABLE]
and an arbitrary (note that by the choices of we have ).
Applying Chernoff's inequality for and Markov's inequality for and , it is easy to see that for each and sufficiently large ,
[TABLE]
Therefore, with positive probability, there exists an -graph such that for each ,
[TABLE]
Fix such an . We construct a subhypergraph of as follows. For every , remove from one edge from each member of , and one edge from for each pair . It is not hard to check that satisfies the following properties:
- (i) ;
- (ii) the number of bad -systems contained in is at most ;
- (iii) for each , the union of any distinct edges in contains more than vertices;
- (iv) any two bad -systems in can share at most one edge.
Indeed, (i) is an easy consequence of the following calculation:
[TABLE]
(ii) follows from (3), (6) and the observation that removing edges from does not increase the number of bad -systems; (iii) holds since according to our construction, does not contain any member of for any . It remains to verify (iv). Assume to the contrary that still contains two bad -systems that share edges for some . On one hand, if those edges are spanned by at least vertices, then the pair of such two bad -systems must belong to , which is a contradiction. On the other hand, if those edges are spanned by at most vertices, then they must form a member of , which is again a contradiction.
Next we construct an auxiliary -graph as follows:
- the vertex set of is formed by the edge set of ;
- vertices of form an edge if and only if the corresponding -edges in form a bad -system.
It is routine to check that the following hold:
- is linear (by (iv));
- has at least vertices (by (i)) and at most edges (by (ii));
- , the average degree of , is at most
Lemma 5 therefore applies and has an independent set of size at least
[TABLE]
Now the theorem follows from the following simple observation: every independent set corresponds to a -free subhypergraph with edges; moreover, by (iii) is also -free for each . ∎
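The last two steps of the proof (the auxiliary hypergraph on the edge set and the passage to an independent set) can be illustrated on a toy instance. The sketch below is a simplification under stated assumptions: it replaces the lemma of Duke, Lefmann, and Rödl by a plain greedy independent set, skips the intermediate deletions that make the auxiliary hypergraph linear, and therefore gives no guarantee on the number of surviving edges. All function names and the parameters `p` and `seed` are hypothetical.

```python
import random
from itertools import combinations

def bad_systems(edges, v, e):
    """All sets of e distinct edges whose union has at most v vertices."""
    return [frozenset(group) for group in combinations(edges, e)
            if len(frozenset().union(*group)) <= v]

def greedy_independent_set(vertices, hyperedges):
    """Greedily pick vertices so that no hyperedge is fully contained in the choice."""
    chosen = set()
    for x in vertices:
        candidate = chosen | {x}
        if not any(h <= candidate for h in hyperedges):
            chosen.add(x)
    return chosen

def sparse_subgraph(n, r, v, e, p, seed=0):
    """Random r-graph on n vertices, thinned to a subgraph with no bad e-system."""
    rng = random.Random(seed)
    edges = [frozenset(c) for c in combinations(range(n), r) if rng.random() < p]
    # Auxiliary e-uniform hypergraph: its vertices are the edges of the random
    # r-graph, and its hyperedges are the bad e-systems.
    auxiliary = bad_systems(edges, v, e)
    return greedy_independent_set(edges, auxiliary)

result = sparse_subgraph(n=9, r=3, v=6, e=3, p=0.5)
print(len(result), "edges remain, bad systems left:", len(bad_systems(list(result), 6, 3)))
```

An independent set of the auxiliary hypergraph contains no bad system by construction, which is exactly the observation used at the end of the proof.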
The proof of Theorem 3 leads to the following proposition.
Proposition 6.
Let and be fixed integers satisfying . Suppose further that and for . Then there exists an -graph with edges which is -free for each .
Sketch of the proof.
With the notation of the previous proof, we generate an -graph by picking each element of independently with probability for some small constant . For , let and be defined analogously to the proof of Theorem 3 but with respect to and . Hence, the expected number of edges in is . Moreover, for , the expected number of bad -systems contained in is .
By assumption, it is clear that for each . Let . Then for any and ,
[TABLE]
Choosing an arbitrary , it is easy to check that for any and ,
[TABLE]
Similar to the previous proof, by applying Chernoff's inequality for and Markov's inequality for and , one can show that with positive probability there exists an -graph such that
[TABLE]
The rest of the proof follows fairly straightforwardly from the argument of the previous proof, hence is omitted. ∎
3 Applications to two extremal problems
The probabilistic construction of Theorem 3 immediately implies new lower bounds for two hypergraph extremal problems, as stated below.
3.1 -free -graphs
Bujtás and Tuza [11] studied the following extremal problem which is related to the construction of uniform Combinatorial Batch Codes (see Subsection 4.2 below). An -graph is said to be -free if it is simultaneously -free for every . In [11] it was shown that for fixed integers ,
[TABLE]
The following proposition is a direct consequence of Theorem 3.
Proposition 7.
For fixed integers with ,
[TABLE]
as .
Proof.
Apply Theorem 3 with . Since for every ,
[TABLE]
so there exists an -graph with edges, which is -free for every , as needed. ∎
3.2 -graphs with no short Berge cycles
For integers $r$ and $k$, a Berge $k$-cycle in an $r$-graph is a set of $k$ distinct vertices $x_1,\ldots,x_k$ associated with $k$ distinct edges $A_1,\ldots,A_k$ such that $x_i,x_{i+1}\in A_i$ for $1\le i\le k-1$ and $x_k,x_1\in A_k$. (Footnote 6: In the literature, some authors (see, e.g. [26]) require the edges in a Berge cycle to be distinct, while others (see, e.g. [39]) do not. However, it is easy to show that if there are at least two distinct edges in the cycle, then a Berge cycle without distinctness contains a Berge cycle with distinctness. Since in this paper we only consider the length of a shortest Berge cycle, the definition with distinctness is more suitable for us.) An $r$-graph is said to be $\mathscr{B}_k$-free if it contains no Berge cycles of length at most $k$. For , the results of [31, 17] implied that for any , For , it was shown in [26] that for , , and in [37] that for any , Recently, Xing and Yuan [40] used $\mathscr{B}_k$-free $r$-graphs to construct optimal Locally Recoverable Codes (see Subsection 4.3 below) and they showed that (using the alteration method) for any and ,
[TABLE]
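It is not hard to verify (see, e.g. Theorem 5.1 in [40]) that an $r$-graph is $\mathscr{B}_k$-free if and only if it is simultaneously $\mathscr{G}_r(ir-i,i)$-free for every $2\le i\le k$. Thus applying Theorem 3 with $e=k$ and $v=kr-k$ (so that $er-v=k$ and $\gcd(k-1,k)=1$) leads to the following result.
Proposition 8.
For fixed integers $r$ and $k\ge 3$,
$$ex_r(n,\mathscr{B}_k)=\Omega\big(n^{\frac{k}{k-1}}(\log n)^{\frac{1}{k-1}}\big)$$
as $n\to\infty$; or equivalently, there exists an $r$-graph with such number of edges, which is simultaneously $\mathscr{G}_r(ir-i,i)$-free for every $2\le i\le k$.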
We remark that in [40] the authors stated that in a private communication, Jacques Verstraëte suggested that a lower bound on , which is exactly the same as that of Proposition 8, can also be proved by using the method of [8, 7] (which is rather involved). Nevertheless, since [40] stated this result (as well as Proposition 13 below) without a proof, we present it here as an easy consequence of Theorem 3.
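The Berge-cycle condition of this subsection can be tested on small hypergraphs through the bipartite incidence graph: a Berge cycle with $k$ distinct vertices and $k$ distinct edges corresponds to a cycle of length $2k$ in that graph. The sketch below computes the length of a shortest Berge cycle with a breadth-first search from every node; it is an illustration, not a construction from the paper, and the function name `shortest_berge_cycle` is hypothetical.

```python
from collections import deque

def shortest_berge_cycle(edges):
    """Length of a shortest Berge cycle (at least 2), or None if there is none."""
    # Bipartite incidence graph: node ('v', x) is adjacent to node ('e', j)
    # exactly when vertex x lies in the j-th edge.
    adjacency = {}
    for j, edge in enumerate(edges):
        for x in set(edge):
            adjacency.setdefault(('v', x), set()).add(('e', j))
            adjacency.setdefault(('e', j), set()).add(('v', x))
    best = None
    for source in adjacency:
        dist = {source: 0}
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    parent[w] = u
                    queue.append(w)
                elif parent[u] != w:
                    # A non-tree edge closes a cycle through the BFS tree.
                    length = dist[u] + dist[w] + 1
                    if best is None or length < best:
                        best = length
    # The incidence graph is bipartite, so its girth is even: 2k <-> Berge k-cycle.
    return None if best is None else best // 2

fano = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6)]
print(shortest_berge_cycle(fano))
print(shortest_berge_cycle([(1, 2, 3), (1, 2, 4)]))
```

Two edges sharing two vertices already form a Berge 2-cycle, so a hypergraph with no Berge cycle of length 2 is linear.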
4 Applications to coding theory
In this section we present three applications of Theorem 3 to coding theory.
4.1 Parent-Identifying Set Systems
An -graph is said to be a -Parent-Identifying Set System (-IPPS for short), denoted as -, if for any -subset which is contained in the union of at most edges of , it holds that
[TABLE]
where .
IPPSs were introduced by Collins [12] as a technique to trace traitors in a secret sharing scheme. Generally speaking, an -threshold secret sharing scheme has one message and keys such that any set of at least keys can be used to decrypt this message but no set of fewer than keys can. Let be a - whose vertices and edges are indexed by the keys and the users, respectively. Assume that a data supplier distributes the keys to the users such that for , the th user gets the keys which form the th edge of . Suppose a coalition of at most illegal users may collude by combining some of their keys to produce a new, unauthorized set of keys to decrypt this message. Then, by definition of a -IPPS, upon capturing an unauthorized set , the data supplier is able to identify at least one illegal user who contributed to .
For a - with given and , it was shown by Gu and Miao [23] that
[TABLE]
Recently, Gu, Cheng, Kabatiansky and Miao [22] showed that for fixed integers , there exists a - with
[TABLE]
which implies that for the upper bound in [23] is tight up to a constant factor. We slightly improve the lower bound of [22] for some pairs of .
Proposition 9.
For fixed integers satisfying , there exists a - with
[TABLE]
as .
Proposition 9 is proved by establishing a connection between IPPSs and sparse hypergraphs, as stated below. Note that a similar observation with different phrasing was obtained independently in [22].
Lemma 10.
Assume that is a -free -graph with . Then it is also a -.
Proof.
Assume towards contradiction that is not a -. Thus by definition there exists an -subset , which can be covered by at most edges of , such that . Let be the minimal positive integer such that there exist with . By the minimality of , it holds that for each ,
[TABLE]
Without loss of generality, assume . Clearly, and moreover, for , . Let , then
[TABLE]
where the second inequality follows since and for any .
Let . We claim that for each there exist at least two distinct sets that contain it. Assume the opposite, then there exist and , such that belongs solely to but to no other set in ; that is, and . This implies that for any , and , a contradiction.
Add to arbitrary edges of . It is clear now that contains exactly edges of and
[TABLE]
where the inequality follows since each element of appears in at least two edges of . This violates the -freeness of for , and the result follows. ∎
Proof of Proposition 9.
Apply Lemma 10 and Theorem 3 with and . ∎
4.2 Uniform Combinatorial Batch Codes
An -uniform CBC with parameters , denoted as -CBC, is an -uniform multihypergraph (i.e., hypergraphs allowing repeated edges) with vertices and edges, such that for every , the union of any distinct edges contains at least vertices. For integers , let denote the maximum such that an -CBC exists.
Uniform-CBCs can be applied to the following scenario in a distributed database system, as illustrated by Balachandran and Bhattacharya [5]. Assume that there are data items which are stored in servers and any data item is replicated across servers so that any of the data items can be retrieved by accessing servers and reading exactly one data item from each. Let be an -uniform multihypergraph whose vertices and edges are indexed by the servers and the data items, respectively. An -based replication system stores data items among servers as follows: for , the th data item is stored in the servers which form the th edge of . (Footnote 7: Since two distinct data items may be stored in the same set of servers, is allowed to have repeated edges.)
Given an -based replication system, the required retrieval condition on the servers and the data items can be expressed in a purely combinatorial way: every collection of at most distinct edges of has a system of distinct representatives (SDR for short) from the vertices, where for any edges , an SDR of is a set of distinct elements such that for each . Applying Hall's theorem [24] one can infer that this holds if and only if is -free for every .
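The equivalence between the retrieval condition and Hall's condition can be checked directly on small instances. The sketch below assumes the standard form of Hall's theorem (a family of sets has an SDR if and only if every subfamily of $i$ sets covers at least $i$ elements) and finds an SDR with a simple augmenting-path bipartite matching; the function names `satisfies_hall` and `find_sdr` and the parameter name `t` are hypothetical.

```python
from itertools import combinations

def satisfies_hall(edges, t):
    """True if every collection of i <= t edges covers at least i vertices.

    Edges are taken by position, so repeated edges count as distinct.
    """
    for i in range(1, t + 1):
        for group in combinations(edges, i):
            if len(set().union(*group)) < i:
                return False
    return True

def find_sdr(edges):
    """Return a list of distinct representatives, one per edge, or None."""
    owner = {}  # vertex -> index of the edge it currently represents

    def try_assign(j, seen):
        for x in edges[j]:
            if x in seen:
                continue
            seen.add(x)
            # Take x if it is free, or if its current edge can move elsewhere.
            if x not in owner or try_assign(owner[x], seen):
                owner[x] = j
                return True
        return False

    for j in range(len(edges)):
        if not try_assign(j, set()):
            return None
    representatives = [None] * len(edges)
    for x, j in owner.items():
        representatives[j] = x
    return representatives

# Three data items all stored on the same two servers: any two can be
# retrieved with one read per server, but not all three.
items = [(1, 2), (1, 2), (1, 2)]
print(satisfies_hall(items, 2), find_sdr(items[:2]))
print(satisfies_hall(items, 3), find_sdr(items))
```

The brute-force check is exponential in `t`, whereas the matching runs in polynomial time for a single collection; both are meant only for small illustrative inputs.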
Recall that an -graph is said to be -free if it is simultaneously -free for every . Clearly, an -CBC is equivalent to an -free -uniform multihypergraph with vertices and edges; consequently,
[TABLE]
For fixed integers , it was shown in [28] that
[TABLE]
An easy application of Proposition 7 suggests the following result.
Proposition 11.
For fixed integers satisfying ,
[TABLE]
as .
4.3 Optimal Locally Recoverable Codes
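A linear code $\mathcal{C}$ of length $n$ defined on the finite field $\mathbb{F}_q$ is a subspace of $\mathbb{F}_q^n$. The minimum distance of $\mathcal{C}$ is defined as $d=\min\{wt(x):x\in\mathcal{C}\setminus\{0\}\}$, where $wt(x)$ is the number of nonzero coordinates of $x$. A parity check matrix of $\mathcal{C}$ is an $(n-k)\times n$ matrix $H$, where $k$ is the dimension of $\mathcal{C}$, such that $x\in\mathcal{C}$ if and only if $Hx^{T}=0$.
A linear code $\mathcal{C}$ of dimension $k$ is called a Locally Recoverable Code (or LRC for short) with locality $r$ if for any $i\in[n]$ there exist $r$ other coordinates $i_1,\ldots,i_r$ such that for any codeword $x\in\mathcal{C}$, $x_i$ can be recovered from $x_{i_1},\ldots,x_{i_r}$. We denote such a code by $(n,k,r)$-LRC. In [21] it was shown that the minimum distance of an $(n,k,r)$-LRC satisfies $d\le n-k-\lceil k/r\rceil+2$, and the code is called optimal if the bound is achieved with equality.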
In order to reduce the complexity of the operations in the finite field, it is desirable to define the LRCs over small enough fields. In other words, given the size of the underlying field, our goal is to construct the longest possible optimal-LRC.
Assume that $(r+1)\mid n$. Set $\ell=\frac{n}{r+1}$ and let $I_\ell$ and $\mathbf{1}$ be the identity matrix of order $\ell$, and the all-one row vector of length $r+1$, respectively. It is not hard to verify that a linear code with parity check matrix of the form
$$H=\begin{pmatrix} I_\ell\otimes\mathbf{1}\\ A\end{pmatrix},\qquad(11)$$
where $\otimes$ is the Kronecker product and $A$ is an $(n-k-\ell)\times n$ matrix, has locality $r$. Indeed, any symbol of a codeword can be recovered by $r$ other symbols since it satisfies a linear equation which has exactly $r+1$ variables.
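The locality argument can be made concrete with a few lines of code. The sketch below builds only the top block of the parity check matrix (the Kronecker product of an identity matrix with an all-one row of length $r+1$) and recovers one symbol from the other $r$ symbols of its group. It assumes a prime field size `p`, so that arithmetic is plain arithmetic modulo `p`, and it ignores the lower block of the matrix; the function names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def local_parity_rows(num_groups, r):
    """Top block: identity of order num_groups, Kronecker all-one row of length r + 1."""
    identity = [[1 if i == j else 0 for j in range(num_groups)]
                for i in range(num_groups)]
    ones = [[1] * (r + 1)]
    return kron(identity, ones)

def recover_symbol(codeword, i, r, p):
    """Recover coordinate i from the other r coordinates of its group, modulo prime p."""
    start = (i // (r + 1)) * (r + 1)
    others = sum(codeword[j] for j in range(start, start + r + 1) if j != i)
    # The group satisfies x_start + ... + x_{start + r} = 0 (mod p).
    return (-others) % p

rows = local_parity_rows(num_groups=2, r=2)
word = [1, 2, 4, 3, 3, 1]  # each group of three sums to 0 modulo 7
print(rows)
print([recover_symbol(word, i, r=2, p=7) for i in range(len(word))])
```

Each row of the top block involves exactly $r+1$ coordinates, which is the linear equation with $r+1$ variables used in the text.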
Xing and Yuan [40] gave a construction of an optimal LRC for by carefully constructing the matrix in (11), as follows. For a subset , let be the Vandermonde matrix with as its -entry. The following result was proved in [40].
Lemma 12 (see Theorem 3.1, [40]).
Let and , and let be a linear code with parity check matrix
[TABLE]
then is an optimal -LRC with minimum distance if and only if the family is -free for each .
The following result (which is stated in [40] without a proof) follows by combining Lemma 12 and Proposition 8.
Proposition 13 (see also Theorem 1.1, [40]).
Suppose that , and , then there exists an optimal -LRC over with minimum distance and length $n=\Omega\big(q(q\log q)^{\frac{1}{\lfloor(d-3)/2\rfloor}}\big)$.
Acknowledgements
The research of Chong Shangguan and Itzhak Tamo was supported by ISF grant No. 1030/15 and NSF-BSF grant No. 2015814. The authors would like to thank Prof. Yiwei Zhang for valuable comments on the first version of this manuscript. They are also grateful to Yujie Gu for helpful discussions on Parent-Identifying Set Systems. Lastly, the authors want to express their gratitude to the two anonymous reviewers for their comments which are very helpful to the improvement of this paper.
References
- [1] M. Ajtai, J. Komlós, J. Pintz, J. Spencer, and E. Szemerédi. Extremal uncrowded hypergraphs. J. Combin. Theory Ser. A, 32(3):321–335, 1982.
- [2] N. Alon and U. Feige. On the power of two, three and four probes. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 346–354. SIAM, Philadelphia, PA, 2009.
- [3] N. Alon and A. Shapira. On an extremal hypergraph problem of Brown, Erdős and Sós. Combinatorica, 26(6):627–645, 2006.
- [4] N. Alon and J. H. Spencer. The probabilistic method. John Wiley & Sons, 2016.
- [5] N. Balachandran and S. Bhattacharya. On an extremal hypergraph problem related to combinatorial batch codes. Discrete Appl. Math., 162:373–380, 2014.
- [6] F. A. Behrend. On sets of integers which contain no three terms in arithmetical progression. Proc. Nat. Acad. Sci. U. S. A., 32:331–332, 1946.
- [7] P. Bennett and T. Bohman. A note on the random greedy independent set algorithm. Random Structures Algorithms, 49(3):479–502, 2016.
- [8] T. Bohman and P. Keevash. The early evolution of the $H$-free process. Invent. Math., 181(2):291–336, 2010.
