
TL;DR
This paper introduces a generalized twin problem in combinatorial structures, improving bounds on twin sizes in graphs, strings, and permutations, and disproving a previous conjecture.
Contribution
It proposes a new variant of the twin problem that unifies different cases and provides improved bounds and counterexamples to existing conjectures.
Findings
Improved bounds on twin sizes in graphs, strings, and permutations.
Disproved a conjecture by Dudek, Grytczuk, and Ruciński.
Unified framework for twin problems in ordered combinatorial objects.
A notion of twins
Zach Hunter
Abstract.
Given a combinatorial structure, a “twin” is a pair of disjoint substructures which are isomorphic (or look the same in some sense). In recent years, there have been many problems about finding large twins in various combinatorial structures. For example, given a graph , one can ask what is the largest such that there exist disjoint subsets on vertices, such that the induced subgraphs are isomorphic.
We are motivated by two different problems of finding twins in two kinds of ordered objects (strings and permutations). We introduce a new variant of “twin problem” which generalizes both of these. By considering this generalization, we are able to improve some bounds obtained by Dudek, Grytczuk, and Ruciński, and give a negative answer to a conjecture of theirs.
1. Introduction
For positive integer , we write . Given a set and positive integer , we write .
1.1. Recent work in ordered settings
We start by recalling two problems about finding “twins” in various ordered objects. A generalization of these shall be the focus of this note.
Firstly, given a binary string , a string-twin is a pair of disjoint indices , such that the subsequences and are equal (formally, that for each ). We define to be the maximum such that there exists a string-twin of with . Since are disjoint subsets of of equal length, it is obvious that always holds.
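The definition above can be checked by exhaustive search on short strings. The following is a minimal sketch, not part of the original text: the function name `longest_string_twin` is hypothetical, positions are 0-indexed for convenience, and the search is exponential in the length, so it is only meant for very small inputs.

```python
from itertools import product

def longest_string_twin(s):
    """Return (k, I, J): a longest pair of disjoint index tuples I, J
    (0-indexed, increasing) whose subsequences of s are equal."""
    n = len(s)
    best = (0, (), ())
    # Label each position: 0 = unused, 1 = belongs to I, 2 = belongs to J.
    for labels in product((0, 1, 2), repeat=n):
        I = tuple(p for p in range(n) if labels[p] == 1)
        J = tuple(p for p in range(n) if labels[p] == 2)
        if len(I) != len(J) or len(I) <= best[0]:
            continue
        # Compare the two subsequences letter by letter.
        if all(s[i] == s[j] for i, j in zip(I, J)):
            best = (len(I), I, J)
    return best
```

Since the two index sets are disjoint and of equal size, the returned length can never exceed half the length of the string, matching the trivial upper bound noted above.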
Axenovich, Person, and Puzynina showed that this upper bound is asymptotically tight (i.e., that every binary string has a twin of length ) [1]. This was achieved by establishing a celebrated “regularity lemma for strings”.
One can also ask analogous questions about the length of twins in -ary strings (here, we say are string-twins of if they are disjoint and ). Write to denote the minimum of over all . A trivial consequence of [1] tells us that (this is seen by passing to the subsequence induced on the two most popular letters of any -ary string, and then applying their bound for the binary case). This was later improved by Bukh and Zhou, who proved that [4].
Recently, Dudek, Grytczuk, and Ruciński introduced a similar problem involving permutations. Given a permutation , we define its sign sequence so that if and only if . We then define a weak-twin (of ) to be a disjoint pair of indices , such that (formally, that if and only if for ).
Let denote the largest such that there exists a weak-twin of with . And write to denote the minimum of over all .
Dudek et al. proved that every has a weak-twin of length [5] (i.e., that ). They furthermore conjectured that, like in the binary string case, every should have a weak-twin of length . We shall improve upon both their upper and lower bounds for .
1.2. A general twin problem
We write to denote the complete graph on vertex set (and here, the labels of vertices will be important).
Given positive integers , we write to denote the set of -edge-colorings of (i.e., the set of ). We shall also write to denote the set of all finite colorings.
Now given , a twin of size (with respect to ) is a pair of subsets , such that:
- are disjoint;
- for .
Given a colored ordered object , we define to be maximum where there exists a twin (wrt ) of size . For , we define to be the minimum of over all .
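For small instances, the twin condition can be tested directly. The sketch below is illustrative only and rests on one assumption about the stripped formula: that the second bullet compares the colors of *consecutive* pairs, i.e. the edge between the t-th and (t+1)-th smallest elements of each set (this reading is consistent with Lemma 2.1 and Lemma 3.1). The names `is_twin` and `largest_twin_size`, and the dictionary representation of a coloring, are hypothetical.

```python
from itertools import combinations

def is_twin(c, I, J):
    """c maps (i, j) with i < j to a color. Check disjointness, equal size,
    and that consecutive pairs of I and J receive the same colors."""
    I, J = sorted(I), sorted(J)
    if len(I) != len(J) or set(I) & set(J):
        return False
    return all(c[I[t], I[t + 1]] == c[J[t], J[t + 1]]
               for t in range(len(I) - 1))

def largest_twin_size(c, n):
    """Brute-force the largest k such that the coloring c of the complete
    graph on {1, ..., n} has a twin of size k (exponential time)."""
    vertices = range(1, n + 1)
    for k in range(n // 2, 0, -1):
        for I in combinations(vertices, k):
            rest = [v for v in vertices if v not in I]
            if any(is_twin(c, I, J) for J in combinations(rest, k)):
                return k
    return 0
```

Under this reading, any two distinct singletons form a twin, so the value is at least 1 whenever there are at least two vertices.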
While the notions of string-twins and weak-twins are somewhat similar in appearance, there doesn’t seem to be a direct way to encode instances from one setting into the other (e.g., given a binary string , we don’t know how to create some where the string-twins of and weak-twins of are at all related). However, both settings can be encoded by our notion of twins. Specifically, we will establish the two following reductions in Section 2.
Proposition 1.1.
We have that
[TABLE]
Proposition 1.2.
We have that
[TABLE]
For lower bounds, we prove the following in Section 3.
Theorem 1.
We have
[TABLE]
By Proposition 1.1, Theorem 1 tells us , improving Dudek et al.’s previous bound of . But with two colors we can do even better: in Section 4 we establish the following.
Theorem 2.
We have
[TABLE]
Consequently, .
We also establish some upper bounds.
Theorem 3.
We have
[TABLE]
As previously noted, Bukh and Zhou proved that which is greater than for large (assuming is big enough).
Furthermore, we can adapt our techniques to establish the following.
Theorem 4.
There exists some absolute constant , so that for all large we have
[TABLE]
This contradicts a conjecture of Dudek et al.
1.3. Organization
In Section 2, we briefly deduce Propositions 1.1 and 1.2, which demonstrate that our twin problem appropriately “encodes” the notions of string-twins and weak-twins. In Section 3, we prove our general lower bound for (Theorem 1). In Section 4, we get an improved lower bound for (Theorem 2). In Section 5 we get our two upper bounds (Theorem 3 and Theorem 4).
Acknowledgements.
We thank Matthew Kwan for introducing us to the weak-twin problem posed in [5]. We also thank Daniel Carter and Benny Sudakov for helpful feedback on the writing of this manuscript.
Part of this work was conducted while the author was staying at IST Austria; we are grateful for their hospitality.
2. Reductions
In this section, we quickly establish Propositions 1.1 and 1.2.
Proposition 1.1 is a corollary of our first lemma.
Lemma 2.1.
Given , and there exists so that is a twin of if and only if is a weak-twin of .
Proof.
We define , which we consider an element of (since nothing really changes upon relabelling our color palette). For , we set if and only if . For , we have that .
Thus, are twins (wrt ) if and only if are disjoint and (which is equivalent to being weak-twins (of )). ∎
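The reduction can be sanity-checked by brute force on short permutations. This is a sketch under stated assumptions rather than the author's construction verbatim: a permutation is given as a tuple of values at positions 1..n, the two colors are encoded as 1 (ascent) and 0 (descent), and the twin condition is taken to compare consecutive pairs only. All function names are hypothetical.

```python
from itertools import combinations, permutations

def coloring_from_permutation(pi):
    """Color the pair i < j by 1 if pi(i) < pi(j), and by 0 otherwise."""
    n = len(pi)
    return {(i, j): int(pi[i - 1] < pi[j - 1])
            for i, j in combinations(range(1, n + 1), 2)}

def is_twin(c, I, J):
    I, J = sorted(I), sorted(J)
    if len(I) != len(J) or set(I) & set(J):
        return False
    return all(c[I[t], I[t + 1]] == c[J[t], J[t + 1]]
               for t in range(len(I) - 1))

def is_weak_twin(pi, I, J):
    """Disjoint index sets whose restrictions have equal sign sequences."""
    I, J = sorted(I), sorted(J)
    if len(I) != len(J) or set(I) & set(J):
        return False
    return all((pi[I[t] - 1] < pi[I[t + 1] - 1])
               == (pi[J[t] - 1] < pi[J[t + 1] - 1])
               for t in range(len(I) - 1))

def twins_agree(pi):
    """True if the twins of the coloring are exactly the weak-twins of pi."""
    n = len(pi)
    c = coloring_from_permutation(pi)
    for k in range(1, n // 2 + 1):
        for I in combinations(range(1, n + 1), k):
            rest = [v for v in range(1, n + 1) if v not in I]
            for J in combinations(rest, k):
                if is_twin(c, I, J) != is_weak_twin(pi, I, J):
                    return False
    return True

def check_all(n):
    """Run the comparison over every permutation of length n."""
    return all(twins_agree(p) for p in permutations(range(1, n + 1)))
```

Calling `check_all(5)` compares the two notions over all 120 permutations of length 5.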
Proposition 1.2 is a corollary of our second lemma.
Lemma 2.2.
Given , there exists so that is a twin of if and only if is a string-twin of (and ).
Proof.
We define as follows. For , we take .
By construction, it is rather clear that our claim about the twins of holds.∎
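Because the defining formula is not legible here, the following sketch fixes one concrete encoding purely as an assumption: the pair i < j is colored by the letter at the *smaller* position i. Under that hypothetical choice, a pair of disjoint equal-size sets is a twin of the coloring exactly when removing the largest element from each set leaves a string-twin; the symmetric choice (coloring by the letter at j) would instead remove the smallest elements. All names are illustrative.

```python
from itertools import combinations

def coloring_from_string(s):
    """Hypothetical encoding: the pair i < j gets the letter at position i."""
    n = len(s)
    return {(i, j): s[i - 1] for i, j in combinations(range(1, n + 1), 2)}

def is_twin(c, I, J):
    I, J = sorted(I), sorted(J)
    if len(I) != len(J) or set(I) & set(J):
        return False
    return all(c[I[t], I[t + 1]] == c[J[t], J[t + 1]]
               for t in range(len(I) - 1))

def is_string_twin(s, I, J):
    I, J = sorted(I), sorted(J)
    if len(I) != len(J) or set(I) & set(J):
        return False
    return all(s[i - 1] == s[j - 1] for i, j in zip(I, J))

def encoding_agrees(s):
    """Check: (I, J) is a twin of the coloring iff dropping max(I) and
    max(J) yields a string-twin of s."""
    n = len(s)
    c = coloring_from_string(s)
    for k in range(1, n // 2 + 1):
        for I in combinations(range(1, n + 1), k):
            rest = [v for v in range(1, n + 1) if v not in I]
            for J in combinations(rest, k):
                # combinations yields sorted tuples, so [:-1] drops the maximum
                if is_twin(c, I, J) != is_string_twin(s, I[:-1], J[:-1]):
                    return False
    return True
```

In this encoding the longest twin of the coloring and the longest string-twin of the string differ in length by at most one.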
3. Lower bound
3.1. Brief outline
We are loosely motivated by the following simple proof that . Fix some binary string , and set .
For , we have the discrete interval . By pigeonhole, we can find distinct such that .
Then, we obtain a string-twin of length by taking and . Indeed, it is clear that are disjoint, as are always distinct and the intervals are disjoint. Meanwhile, since , it is clear that and for (meaning the order of the indices is correct). Finally, we have for all , whence as desired.
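The pigeonhole argument just described is effectively a one-pass algorithm. Here is a minimal sketch, assuming the intervals are blocks of three consecutive positions (the smallest block size for which two of the letters in a binary block must coincide); the function name `block_twin` is hypothetical and positions are 0-indexed.

```python
def block_twin(s):
    """For a binary string s, return index lists (I, J) forming a
    string-twin of length len(s) // 3, one pair of indices per block."""
    k = len(s) // 3
    I, J = [], []
    for t in range(k):
        block = range(3 * t, 3 * t + 3)
        # Among three binary letters two must agree (pigeonhole).
        # For a non-binary string this search may fail with StopIteration.
        i, j = next((a, b) for a in block for b in block
                    if a < b and s[a] == s[b])
        I.append(i)
        J.append(j)
    return I, J
```

Each block contributes one index to each list, so both lists are increasing and disjoint by construction.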
Unfortunately, in our more generalized setting, such a strategy cannot work. Indeed, if we pick some to be the first indices of and , then it might be impossible to find such that (e.g., the coloring could make always red and always blue). So, instead of building one twin iteratively, we shall build multiple twins.
3.2. Proof of Theorem 1
Given , we say a pair of tuples form a -matching if .
We say a pair of -sets are -matchable if we can order these -sets to get a pair of tuples which form a -matching.
The relevance of these definitions is the following.
Lemma 3.1.
Consider some coloring . Let be a twin (wrt ) of length .
Set . If there exists where and are -matchable, then there is a twin of length with .
Proof.
By definition of -matchability, we may write so that .
We simply take . It is routine to check that the desired properties are satisfied. For completeness, this is done below.
Since , we have that is disjoint from , whence . Furthermore, since is a -set, , so it follows that are disjoint (since by definition are disjoint).
Then, write . By assumption, we have that
[TABLE]
Meanwhile, for , the fact that is a twin guarantees that as desired.
Hence, are twins. Also, as , it is clear that implying the last property.∎
We now establish a matchability result, which will allow us to carry out a modified version of the argument sketched in Subsection 3.1.
Lemma 3.2.
Let . Take to be a copy of , with . Consider any -coloring .
Then there exists with such that for , there exists such that are -matchable.
Remark 3.3.
This is sharp in two aspects. First, if , then after fixing some injection , taking will lack any which are matchable.
Secondly, if , then we can partition into parts each of size at most . Here if we take for all , we see that if is matchable, then for some .
Proof.
For and , we say is -popular if there are distinct with . By pigeonhole, we may define a map such that is -popular for each (this is because ).
Applying pigeonhole again, there must exist some such that has (implying as the cardinality must be an integer).
Finally, we note that each is -matchable. Indeed, since they are both -popular, there exist and distinct such that
[TABLE]
As are distinct, we may WLOG assume , whence is a -matching. ∎
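The double pigeonhole in this proof can be illustrated computationally. The sketch below makes assumptions that are not legible in the text above: with r colors, the side whose vertices are tested for popularity is taken to have r² + 1 vertices and the other side r + 1 vertices, which is the smallest choice for which both pigeonhole steps go through. The names `small`, `large`, `largest_popular_class`, `matchable`, and `demo` are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations, permutations

def largest_popular_class(c, small, large):
    """Group the vertices of `large` by a color they see at least twice
    among their edges to `small`; return the largest group."""
    classes = defaultdict(list)
    for b in large:
        seen = set()
        for a in small:
            if c[a, b] in seen:       # second edge of this color at b
                classes[c[a, b]].append(b)
                break
            seen.add(c[a, b])
    return max(classes.values(), key=len)

def matchable(c, small, b1, b2):
    """True if some distinct a1, a2 in `small` have c(a1, b1) == c(a2, b2)."""
    return any(c[a1, b1] == c[a2, b2] for a1, a2 in permutations(small, 2))

def demo(r, seed=0):
    """Random r-coloring of the complete bipartite graph with sides of
    size r + 1 and r*r + 1; report the class size and pairwise matchability."""
    rng = random.Random(seed)
    small = [("a", i) for i in range(r + 1)]
    large = [("b", j) for j in range(r * r + 1)]
    c = {(a, b): rng.randrange(r) for a in small for b in large}
    cls = largest_popular_class(c, small, large)
    ok = all(matchable(c, small, b1, b2) for b1, b2 in combinations(cls, 2))
    return len(cls), ok
```

With these sizes every vertex of `large` is popular for some color, so the largest class has at least r + 1 vertices, and any two vertices popular for the same color are matchable because the second one has two candidate partners in `small`.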
Proof of Theorem 1.
Fix any , and set . For , we have the discrete interval .
We will now proceed by induction to find an -set such that for , there exists a twin of length with .
For , we may take , as any pair of distinct singletons is a twin.
Then for , we can invoke Lemma 3.2 with to find satisfying the conditions of the lemma. We shall simply take . To check satisfies our inductive assumptions, it suffices to consider and confirm there is some twin of length with .
By definition of , there exists such that are -matchable. By our assumptions on , there must exist a twin of length with . Since are -matchable and , we can apply Lemma 3.1 to deduce that there is in fact a twin of length with .
So, we see the induction goes through for all . Consequently, we see that as desired.∎
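The induction is constructive, and can be sketched as a procedure that keeps, for every pair of current endpoints, a twin ending at that pair. This is an illustrative reading rather than the author's exact bookkeeping: it assumes blocks of r² + 1 consecutive vertices and endpoint sets of size r + 1, uses colors 0..r-1, and all names (`build_twin`, `random_coloring`, `ends`) are hypothetical.

```python
import random
from itertools import combinations

def is_twin(c, I, J):
    I, J = sorted(I), sorted(J)
    if len(I) != len(J) or set(I) & set(J):
        return False
    return all(c[I[t], I[t + 1]] == c[J[t], J[t + 1]]
               for t in range(len(I) - 1))

def random_coloring(n, r, seed=0):
    rng = random.Random(seed)
    return {e: rng.randrange(r) for e in combinations(range(1, n + 1), 2)}

def build_twin(c, n, r):
    """Return a twin (I, J) of size n // (r*r + 1) for an r-coloring c
    of the pairs of {1, ..., n}."""
    w = r * r + 1
    k = n // w
    if k == 0:
        return [], []
    T = list(range(1, r + 2))          # r + 1 vertices of the first block
    # ends[{x, y}] = a twin whose largest elements are x and y
    ends = {frozenset(p): ([p[0]], [p[1]]) for p in combinations(T, 2)}
    for t in range(1, k):
        by_color = {}
        for b in range(t * w + 1, (t + 1) * w + 1):
            first = {}
            for a in T:                # find a color b sees twice in T
                col = c[a, b]
                if col in first:
                    by_color.setdefault(col, []).append((b, first[col], a))
                    break
                first[col] = a
        popular = max(by_color.values(), key=len)[: r + 1]
        new_ends = {}
        for (b1, x, _), (b2, y1, y2) in combinations(popular, 2):
            y = y1 if y1 != x else y2  # distinct old endpoints, same color
            I, J = ends[frozenset((x, y))]
            if I[-1] != x:
                I, J = J, I
            new_ends[frozenset((b1, b2))] = (I + [b1], J + [b2])
        T = [b for b, _, _ in popular]
        ends = new_ends
    return next(iter(ends.values()))
```

For example, `build_twin(random_coloring(20, 2), 20, 2)` returns two disjoint increasing lists of length 4 whose consecutive pairs receive matching colors.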
4. Doing better with two colors
Here we provide a slightly ad hoc argument that improves our bound for . The idea is that the conclusion of Lemma 3.2 (with ) should still hold when we delete an edge from . Meanwhile in the proof of Theorem 1, we don’t need to take the sets to be completely disjoint. So, by being a bit more careful, we can take to intersect one index in and still have things work, and now at each step we only expose 4 rather than 5 new indices.
Proof of Theorem 2.
Fix .
For ease of notation, let be the graph on vertex set , with being an edge if there exists a twin of length such that . We shall use the following corollary of Lemma 3.1.
Proposition 4.1.
Let be such that:
- ;
- ;
- are -matchable.
Then .
For , we shall find a triple such that for each (i.e., induces a triangle in ).
For , we can simply take , as is a clique.
Now suppose that we have some which induces a triangle in . We will find with , such that induces a triangle in . By induction, it shall follow that , as desired.
Let . Consider the discrete interval . We partition into three sets,
[TABLE]
[TABLE]
[TABLE]
By Proposition 4.1, it is clear that are each cliques.
We will now finish by considering a few cases.
Case 1 (): Here , thus we get by pigeonhole. WLOG, assume . Since is a clique, taking any will induce a triangle.
Case 2 (): Here , thus by pigeonhole. WLOG, assume , and fix distinct . Also, fix any (which is possible as ). We shall take .
Indeed, as is a clique. Meanwhile, since , there exists such that . Thus taking , we have that and are -matchings, implying by Proposition 4.1 as desired. Thus, we see induces a triangle.
Case 3 (): It follows that we can pick distinct . Next, pick any . We take .
We shall prove that for any distinct with , that . This will clearly imply that induces a triangle.
Now, since , there exists such that . Whence, is a -matching. Furthermore, since , we have that , allowing us to invoke Proposition 4.1 and deduce that as desired.∎
5. Some upper bounds
In this section, we improve the upper bounds for for various . Our proofs are reminiscent of the methods in a paper of Bukh and Guruswami [2] (see also their improved result with Håstad [3]), where they construct large sets of strings no two of which have a “long common subsequence”. Essentially, the idea will be to take a random string , where is some large number of colors so that we expect to only have very short twins. Then, we will encode as some edge-coloring , where we use fewer colors but still do not create particularly long twins.
We require the following bounds, which all follow from first moment considerations.
Proposition 5.1.
[4, Theorem 4] Let be a uniformly random -ary string of length . Asymptotically almost surely (as ), lacks any string-twins of length (for some absolute constant independent of ).
We note that Proposition 5.1 is also a corollary of [1, Theorem 3].
Definition.
Given permutations (which we treat as bijections from ), we define to be the largest such that there exists (not necessarily disjoint) with for .
Proposition 5.2.
Let be chosen uniformly at random. We have .
Proof.
We shall write .
For , let be the indicator function of the event that for each . We then let
[TABLE]
It is clear that .
Now for any choice of , we have
[TABLE]
(here is the probability , and is the probability that our event holds conditioned on this). Meanwhile, there are only choices of sets . Whence, we see that
[TABLE]
(applying Stirling’s approximation). By our choice of , the LHS is at most for large , meaning decays super-polynomially as desired. ∎
Now, we start with the easier of our two upper bounds, which improves things for sufficiently large .
Theorem 3.
We have that
[TABLE]
Proof.
Assume is sufficiently large. Let , . We shall consider .
Pick uniformly at random. Also, pick uniformly at random.
By Propositions 5.1 and 5.2, all of the following hold with positive probability (by a union bound):
- ;
- ;
- we have for each distinct .
We condition on such an outcome, and use this to construct our .
We define the maps so that and .
Consider . We take
[TABLE]
We call the first case of our definition the ‘global rule’, and our second case the ‘local rule’.
Obviously, always takes some value in , thus . We will prove that
[TABLE]
implying that by our assumptions on . Hence, we will be done after establishing Eq. 5.1.
Without further delay, we prove that is small. Consider any twins of . For , we define and . We similarly define by replacing ‘’ by ‘’.
As are twins, we must have that if and only if (otherwise one of will belong to and the other will belong to ).
It is clear that are non-decreasing, as they are obtained by applying the ceiling function to increasing sequences.
Next let . We define so that if and only if is the -th smallest element of . We record that these sets:
- partition (clear);
- are all intervals (as is non-decreasing);
- each have size at most (as is a -to-1 function).
Also, let .
Let . We will show that the indices contributed by each of the three parts is appropriately bounded.
Proposition 5.3.
We have and for .
Proof.
Since and we have .
It remains to show that implies that .
Supposing otherwise, we’d have for some and . Thus our local rule gives
[TABLE]
while
[TABLE]
Since is injective, we should have (as are twins), which implies that . But then are not disjoint (and hence not twins), contradiction.∎
Proposition 5.4.
We have and for .
Proof.
As previously noted, the bound holds for all , thus the second part is trivial.
Now, construct a graph on vertex set , with . Note that will not have any loops, since ensures that for .
Next, as are increasing sequences, we have that the connected components of are all paths. Thus there is some matching (a collection of disjoint edges) with .
Take , which has size . We finish by showing that is a string-twin of , which immediately rearranges to give the desired bound (here follows from the fact that ).
Since is a matching, we have that are disjoint. For , write (respectively ) for the -th smallest element of (respectively ). Since implies and , we have that and . Whence, we have that for (as implies that ).
So, we conclude that is a string-twin of , as desired. ∎
Proposition 5.5.
We have that and for .
Proof.
We first bound assuming . As , we have that . So the desired bound will follow from showing that always holds. Let . Also write .
By our local rule, we must have that for each . Whence, we see that (as and are both increasing).
It remains to bound . Due to our global rule, for , we must have that . Now we define . Clearly , as .
Finally, by repeating the argument from Proposition 5.4, we can find with so that is a string-twin of . Some minor rearranging gives that , completing the proof.∎
Finally, it is easy to see that Eq. 5.1 holds by combining Propositions 5.3, 5.4 and 5.5, along with the fact that . So we are done. ∎
We shall now prove the following upper bound, refuting a conjecture of Dudek, Grytczuk, and Ruciński.
Theorem 4.
There exists some absolute constant such that
[TABLE]
Remark 5.6.
We have made no effort to optimize the constant given by our argument.
For those curious, one may use and in the below arguments, which shows that is attainable. By using sharper estimates and being less sloppy, one could probably show . However, proving likely requires new ideas.
Proof.
We will fix some sufficiently large integer . For each letter , we assign the weight .
We shall consider a random -ary string of length . For , let . Also, set .
Now given , we form a permutation of length . Specifically, for , we take
[TABLE]
(in other words, is the “skew-sum” where denote the decreasing permutation of length ).
For , let be the smallest such that . Also, let .
Now, corresponds to the 2-coloring where for , if and only if (recall the construction given in Lemma 2.1).
Now, , thus we wish to prove that for some , that for all twins of , we have . Thus, given a pair of twins , let . We shall argue that a.a.s., .
Given a set , we define a map by setting for .
We make a few remarks. Obviously for any , we have that is increasing (i.e., for ). Also, if are twins, then we must have that if and only if (by definition of ).
Now given a twin , we define a graph , where we will allow loops but won’t care about the multiplicity of edges. Specifically, will have vertex set and edge set . We observe that the components of are all either singletons, loops, or paths with some vertex set with edge set . Additionally, we observe that actually only depends on .
Thus, given , we write to denote the unique graph where is a twin (of ) with (or the empty graph if no such exist). Furthermore, let count the number of connected components .
Given subsets let be the event that there exists a twin of such that where . We will prove two bounds on .
Proposition 5.7.
Consider any .
We have that
[TABLE]
Proposition 5.8.
Consider any .
Suppose for some , and that is even. Then we have that
[TABLE]
We shall first show how this implies our desired result. We intend to do a union bound. Let be the set of pairs of with . Note that if is a twin, then . So, it suffices to prove
[TABLE]
Pick . We write , where
[TABLE]
It is clear that if , that , whence we see for all .
Meanwhile, as , we can apply Proposition 5.7 to get
[TABLE]
assuming .
Finally, for , we claim that is large. Indeed, for any we have that
[TABLE]
(and equality holds on the RHS when lacks loops). Thus as , it follows that . Take .
Whence, we have that
[TABLE]
(since there are at most sets of size at most which can be the complement of a set having cardinality at least ). By Stirling’s approximation, we get that
[TABLE]
for large . Meanwhile, we have that implies that
[TABLE]
Thus, taking , we have that
[TABLE]
Thus
[TABLE]
When , we have that and . As this happens, we have and . Thus, for small we have that as desired.
It remains to establish our claims. Before doing so, we establish some notation.
Given , we write to be the minimum of over all twins with . Clearly, is the indicator function of the event that .
Next, given a graph , we write to denote the set of connected components of . For any component (which is either a path with edges, or a single looped vertex), we write for the minimum of over all with .
Inspecting our definitions, we get the following useful facts.
Proposition 5.9.
Consider .
We have that
[TABLE]
Furthermore, the set of random variables are independent over the randomness of .
Proof.
To see the first part, we note
[TABLE]
and for each such we have that
[TABLE]
We now establish the second part. For each , we observe that depends only on . Whence, independence immediately follows, as the components are all vertex-disjoint. ∎
Remark 5.10.
It is not too hard to see that in fact , since we can optimize the intersection of with each component without any issues. We omit the details since we do not need an upper bound for .
Without further ado, we prove our claims.
Proof of Proposition 5.7.
Let . We want to upper bound .
By Proposition 5.9, we have that
[TABLE]
where the LHS is the sum of independent random variables.
We will show that for any component , that
[TABLE]
Assuming this, a quick union bound gives our desired result. Indeed,
[TABLE]
(here the first inequality is because each takes non-negative integer values).
It remains to prove that Inequality 5.2 holds.
Fix any component . We consider two cases.
Case 1 (): Here, we show always holds, which is clearly more than sufficient. Let . Consider any twin with . We will argue that , which implies as claimed.
Let . Observe must hold (otherwise, would have an edge from to some , contradicting that is a component of ). So, it follows that .
Meanwhile, as are twins, they are disjoint, and so is even. Meanwhile, is odd, thus cannot hold, implying as desired.
Case 2 (): Let . As noted before, we have that . We will prove that if , then . Consequently, this means , as desired.
Consider any twin with . Also, suppose .
For , let . Also, WLOG assume . Since , we have that for . Furthermore, we must have that , so that (otherwise, or would have additional edges).
Now suppose for sake of contradiction that . Then, we must have that for all .
We shall see inductively that yet for . Consequently, since , the condition will imply , which can only happen if (as they are both powers of 3), contradiction.
It remains to establish the desired divisibility conditions, namely that and for .
Recalling that we’re assuming for all , and that , we must have that . Thus the divisibility condition is satisfied.
Now, suppose we have the divisibility condition for some . We shall deduce it also holds for .
By assumption, we have that . Also, since we have that . In particular, we have that (here the last inequality is a consequence of our divisibility conditions and the fact that ). But then, we have that (as is a power of 3). Thus, recalling , we see that , which tells us that and as desired.∎
Proof of Proposition 5.8.
Let . Suppose for some .
We will show that for any component , that stochastically dominates . (Here we write to denote the binomial distribution with trials, each with parameter .) Whence, it will follow by Proposition 5.9 that
[TABLE]
which stochastically dominates
[TABLE]
Noting
[TABLE]
we have that and thus . Thus by the stochastic domination above and a Chernoff bound, we see that
[TABLE]
(the last inequality follows from the fact that for implies that ).
It remains to prove that stochastically dominates .
If then there is nothing to prove, so we may suppose that is a path. Let .
For , let . Let . For , we define the interval . It is clear that for distinct that are disjoint.
For , let be the event that . It is not hard to see that the events are independent, as they are determined by disjoint sets of random variables that are all independent of one another. Furthermore, as we have assumed is even, it is clear that for all .
Consequently, is distributed identically to . So now it suffices to show
[TABLE]
The above is an immediate corollary of the following fact, completing our proof.
Claim 5.11.
Consider . Suppose that .
Then for any twin with , we have .
Proof.
Consider any such . For , set . WLOG, we may assume that .
Now assuming , implying
[TABLE]
Consequently, must be non-empty, implying as desired. ∎
∎
∎
6. Conclusion
There are a number of questions one can ask about this regime. Here are some problems we believe to be interesting.
Let
[TABLE]
(presumably, the limit is still well-defined if we replace ‘’ with ‘’, but we don’t think this matter is especially important). Theorems 1 and 3 proved that
[TABLE]
This naturally raises the question:
Question 1.
How does grow (as )?
We are inclined to believe that our upper bound is closer to the truth. Indeed, it already seems difficult to create a coloring without a twin where:
- for all ;
- for every interval of length , we have .
(Here the second condition implies that , while the first condition is just to restrict our attention to “local strategies” for finding twins.)
We remark that it is still open to determine the analogous limit for -ary strings (the best lower bound is that there exist string-twins of length ).
Here are some generalized problems we didn’t know how to answer.
We say that a twin is non-crossing if . Define to be the longest non-crossing twin in , and .
Question 2.
Is for some ?
We believe the answer is yes even for . Such a result is equivalent to proving that for every , that when is sufficiently large, there exists such that for , we have that for any with , that there is some where .
Finally, rather than edge-coloring , we could consider edge-colorings of (the complete -vertex hypergraph of uniformity ). Here, one can say a twin is a pair of disjoint subsets such that for all . Define in the natural way, and let .
Our methods can be extended to prove that for all . We only provide a sketch, as our bounds seem far from tight (roughly inverse tower-type). The idea is to generalize Lemma 3.2, by considering -edge-colorings of a complete -partite hypergraph with parts with and taking (here means is sufficiently large with respect to ). We’d say is -popular if there exist distinct such that . By pigeonhole, there exists some color such that at least of the -tuples are -popular. Then, using something like dependent random choice we can ensure that there exists so that with each being -popular. One should then be able to imitate the proof of Theorem 1 and show that is bounded away from zero.
Already for , it would be nice to understand how things work.
Question 3.
Do we have ?
Lastly, it could be nice to consider an intermediate problem. Given positive integers , let be the largest such that for every , there exists where for all . It is clear that , and it would be nice to explore whether we can obtain better bounds in this specialized setting.
Question 4.
Do we have ?
References
- [1] M. Axenovich, Y. Person, and S. Puzynina, A regularity lemma and twins in words, in Journal of Combinatorial Theory, Series A 120 (4) (2013), p. 733-743.
- [2] B. Bukh and V. Guruswami, An improved bound on the fraction of correctable deletions, in SODA (2016), p. 1893-1901.
- [3] B. Bukh, V. Guruswami, and J. Håstad, An improved bound on the fraction of correctable deletions, in IEEE Transactions on Information Theory 63 (2017), p. 93-103.
- [4] B. Bukh and L. Zhou, Twins in words and long common subsequences in permutations, in Israel Journal of Mathematics 213 (2016), p. 183-209.
- [5] A. Dudek, J. Grytczuk, and A. Ruciński, On weak twins and up-and-down subpermutations, in Integers 21A (2021), Ron Graham Memorial Volume, Paper No. A 10, 17 pp.
