Recognizing Union-Find trees built up using union-by-rank strategy is NP-complete
Kitti Gelle, Szabolcs Iván

TL;DR
This paper proves that recognizing whether a given tree with rank info is a Union-Find tree built with union-by-rank is NP-complete, providing a new structural characterization and extending previous results.
Contribution
It introduces a simple push operation for characterizing Union-Find trees with union-by-rank and proves the recognition problem is NP-complete.
Findings
Recognition of Union-Find trees with union-by-rank is NP-complete.
Provides a structural characterization using a push operation.
Extends previous NP-completeness results for union-by-size strategy.
Abstract
Disjoint-Set forests, consisting of Union-Find trees, are data structures having a widespread practical application due to their efficiency. Despite them being well-known, no exact structural characterization of these trees is known (such a characterization exists for Union trees which are constructed without using path compression) for the case assuming union-by-rank strategy for merging. In this paper we provide such a characterization by means of a simple push operation and show that the decision problem whether a given tree (along with the rank info of its nodes) is a Union-Find tree is NP-complete, complementing our earlier similar result for the union-by-size strategy.
Department of Computer Science, University of Szeged, Hungary
Email: {kgelle,szabivan}@inf.u-szeged.hu
Research was supported by the NKFI grant no. 108448.
1 Introduction
Disjoint-Set forests, introduced in [10], are fundamental data structures in many practical algorithms where one has to maintain a partition of some set, which supports three operations: creating a partition consisting of singletons, querying whether two given elements are in the same class of the partition (or equivalently: finding a representative of a class, given an element of it) and merging two classes. Practical examples include building a minimum-cost spanning tree of a weighted graph [4], unification algorithms [18], etc.
To support these operations, even a linked list representation suffices, but to achieve an almost-constant amortized time cost per operation, Disjoint-Set forests are used in practice. In this data structure, sets are represented as directed trees with the edges directed towards the root; the create operation creates $n$ trees having one node each (here $n$ stands for the number of the elements in the universe), the find operation takes a node and returns the root of the tree in which the node is present (thus the query whether $x$ and $y$ are in the same class is implemented as $\mathrm{find}(x) = \mathrm{find}(y)$), and the merge operation on $x$ and $y$ is implemented by merging the trees containing $x$ and $y$, i.e. making one of the root nodes a child of the other root node (if the two nodes are in different classes).
In order to achieve near-constant efficiency, one has to keep the (average) height of the trees small. There are two “orthogonal” methods to do that: first, during the merge operation it is advisable to attach the “smaller” tree below the “larger” one. If the “size” of a tree is the number of its nodes, we say the trees are built up according to the union-by-size strategy; if it is the depth of the tree, then we talk about the union-by-rank strategy. Second, during a find operation invoked on some node $x$ of a tree, one can apply the path compression method, which reattaches each ancestor of $x$ directly to the root of the tree in which they are present. If one applies both the path compression method and either one of the union-by-size or union-by-rank strategies, then any sequence of $m$ operations on a universe of $n$ elements has worst-case time cost $O(m\alpha(n))$, where $\alpha$ is the inverse of the extremely fast growing (not primitive recursive) Ackermann function, for which $\alpha(n)$ is bounded by a small constant for each practical value of $n$; hence each operation has an amortized almost-constant time cost [23]. Since it is proven [9] that any data structure maintaining a partition has worst-case time cost $\Omega(m\alpha(n))$, the Disjoint-Set forests equipped with a strategy and path compression offer a theoretically optimal data structure which performs exceptionally well also in practice. For more details see standard textbooks on data structures, e.g. [4].
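The data structure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the elements are the integers $0, \dots, n-1$; the class name `DisjointSet` and the method names `find`, `same_class` and `merge` are chosen here for illustration and are not taken from any particular library.

```python
class DisjointSet:
    """Disjoint-Set forest with union-by-rank and path compression (sketch)."""

    def __init__(self, n):
        self.parent = list(range(n))  # every element starts as its own root
        self.rank = [0] * n           # singleton trees have rank 0

    def find(self, x):
        # First pass: walk up to the root of the tree containing x.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): reattach x and each of its
        # ancestors directly to the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def same_class(self, x, y):
        return self.find(x) == self.find(y)

    def merge(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return                     # already in the same class
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx            # make rx the root of larger rank
        self.parent[ry] = rx           # attach the lower-rank root below rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1         # equal ranks: the new root's rank grows
```

Note that ranks are only ever changed in `merge`; path compression leaves them untouched, which is why the rank of a node may exceed its actual height.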
Due to these facts, it is certainly interesting both from the theoretical as well as the practical point of view to characterize those trees that can arise from a forest of singletons after a number of merge and find operations, which we call Union-Find trees in this paper. One could e.g. test Disjoint-Set implementations since if at any given point of execution a tree of a Disjoint-Set forest is not a valid Union-Find tree, then it is certain that there is a bug in the implementation of the data structure (though we note at this point that this data structure is sometimes regarded as one of the “primitive” data structures, in the sense that it is possible to implement a correct version of them that need not be certifying [21]). Nevertheless, only the characterization of Union trees is known up till now [2], i.e. of the trees which correspond to the case when one uses one of the two union-by strategies (by size or by rank) but not path compression. Since in that case the data structure offers only a theoretic bound of $\Theta(\log n)$ on the amortized time cost, in practice all implementations include path compression as well, so for a characterization to be really useful, it has to cover this case as well.
In this paper we show that the recognition problem of Union-Find trees is NP-complete when the union-by-rank strategy is used, complementing our earlier results [13] where we proved NP-completeness for the union-by-size strategy. The proof method applied here resembles the one used there, but the low-level details of the reduction differ greatly (here we use the Partition problem; there we used a more restricted version of it, which is a very canonical strongly NP-complete problem). This result also confirms the statement from [2] that the problem “seems to be much harder” than recognizing Union trees. As (to our knowledge) most of the actual software libraries implementing this data structure use the union-by-rank strategy (apart from the cases when one also has to query the size of the sets quickly), for software testing purposes the current result is more relevant than the one for the union-by-size strategy.
Related work. There is an increasing interest in determining the complexity of the recognition problem of various data structures. The problem was considered for suffix trees [17, 22], (parametrized) border arrays [15, 20, 8, 14, 16], suffix arrays [1, 7, 19], KMP tables [6, 12], prefix tables [3], cover arrays [5], and directed acyclic word- and subsequence graphs [1].
2 Notation
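A (ranked) tree is a tuple $t = (V_t, \mathrm{root}_t, \mathrm{rank}_t, \mathrm{parent}_t)$ with $V_t$ being the finite set of its nodes, $\mathrm{root}_t \in V_t$ its root, $\mathrm{rank}_t : V_t \to \mathbb{N}_0$ mapping a nonnegative integer to each node, and $\mathrm{parent}_t : (V_t - \{\mathrm{root}_t\}) \to V_t$ mapping each nonroot node to its parent (so that the graph of $\mathrm{parent}_t$ is a directed acyclic graph, with edges being directed towards the root). We require $\mathrm{rank}_t(x) < \mathrm{rank}_t(\mathrm{parent}_t(x))$ for each nonroot node $x$, i.e. the rank strictly decreases towards the leaves.
For a tree $t$ and a node $x \in V_t$, let $\mathrm{children}(t, x)$ stand for the set $\{y \in V_t : \mathrm{parent}_t(y) = x\}$ of its children and $\mathrm{children}(t)$ stand as a shorthand for $\mathrm{children}(t, \mathrm{root}_t)$, the set of depth-one nodes of $t$. Also, let $x \preceq_t y$ denote that $x$ is a (non-strict) ancestor of $y$ in $t$, i.e. $x = \mathrm{parent}_t^k(y)$ for some $k \geq 0$. For $x \in V_t$, let $t|_x$ stand for the subtree of $t$ rooted at $x$, consisting of the nodes $\{y \in V_t : x \preceq_t y\}$. As shorthand, let $\mathrm{rank}(t)$ stand for $\mathrm{rank}_t(\mathrm{root}_t)$, the rank of the root of $t$.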
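Two operations on trees are that of merging and collapsing. Given two trees $t = (V_t, \mathrm{root}_t, \mathrm{rank}_t, \mathrm{parent}_t)$ and $s = (V_s, \mathrm{root}_s, \mathrm{rank}_s, \mathrm{parent}_s)$ with $V_t$ and $V_s$ being disjoint and $\mathrm{rank}(t) \geq \mathrm{rank}(s)$, then their merge $\mathrm{merge}(t, s)$ (in this order) is the tree $(V_t \cup V_s, \mathrm{root}_t, \mathrm{rank}, \mathrm{parent})$ with $\mathrm{parent}(x) = \mathrm{parent}_t(x)$ for each nonroot node $x$ of $t$, $\mathrm{parent}(\mathrm{root}_s) = \mathrm{root}_t$ and $\mathrm{parent}(y) = \mathrm{parent}_s(y)$ for each nonroot node $y$ of $s$, and
$$\mathrm{rank}(\mathrm{root}_t) = \begin{cases} \mathrm{rank}(t) + 1 & \text{if } \mathrm{rank}(t) = \mathrm{rank}(s), \\ \mathrm{rank}(t) & \text{otherwise,} \end{cases}$$
and $\mathrm{rank}(x) = \mathrm{rank}_t(x)$, resp. $\mathrm{rank}(x) = \mathrm{rank}_s(x)$ for each $x \in V_t - \{\mathrm{root}_t\}$, resp. $x \in V_s$.
Given a tree $t$ and a node $x \in V_t$, then $\mathrm{collapse}(t, x)$ is the tree $(V_t, \mathrm{root}_t, \mathrm{rank}_t, \mathrm{parent})$ with $\mathrm{parent}(y) = \mathrm{root}_t$ if $y$ is a nonroot ancestor of $x$ in $t$ and $\mathrm{parent}(y) = \mathrm{parent}_t(y)$ otherwise. For examples, see Figure 1.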
Observe that both operations indeed construct a ranked tree (e.g. the rank remains strictly decreasing towards the leaves).
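The merge and collapse operations can be sketched in Python as follows. The representation is an assumption made for illustration only: a tree is a triple `(root, rank, parent)`, where `rank` maps every node to its rank and `parent` maps every nonroot node to its parent. The helper `singleton` is likewise hypothetical.

```python
def singleton(v):
    """A one-node tree of rank 0."""
    return v, {v: 0}, {}

def merge(t, s):
    """Attach the root of s below the root of t; requires rank(t) >= rank(s)."""
    (rt, rank_t, par_t), (rs, rank_s, par_s) = t, s
    assert rank_t[rt] >= rank_s[rs], "merge needs rank(t) >= rank(s)"
    rank = {**rank_t, **rank_s}
    parent = {**par_t, **par_s, rs: rt}
    if rank_s[rs] == rank_t[rt]:
        rank[rt] += 1              # equal ranks: the root's rank increases
    return rt, rank, parent

def collapse(t, x):
    """Reattach x and every nonroot ancestor of x directly to the root."""
    root, rank, parent = t
    new_parent = dict(parent)
    y = x
    while y != root:
        nxt = parent[y]            # remember the old parent before rewiring
        new_parent[y] = root
        y = nxt
    return root, rank, new_parent
```

Since "ancestor" is meant in the non-strict sense, `collapse` also reattaches `x` itself whenever `x` is not the root.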
We say that a tree is a singleton tree if it has exactly one node, and this node has rank $0$.
The class of Union trees is the least class of trees satisfying the following two conditions: every singleton tree is a Union tree, and if $t$ and $s$ are Union trees with $\mathrm{rank}(t) \geq \mathrm{rank}(s)$, then $\mathrm{merge}(t, s)$ is a Union tree as well.
Analogously, the class of Union-Find trees is the least class of trees satisfying the following three conditions: every singleton tree is a Union-Find tree, if $t$ and $s$ are Union-Find trees with $\mathrm{rank}(t) \geq \mathrm{rank}(s)$, then $\mathrm{merge}(t, s)$ is a Union-Find tree as well, and if $t$ is a Union-Find tree and $x \in V_t$ is a node of $t$, then $\mathrm{collapse}(t, x)$ is also a Union-Find tree.
We say that a node $x$ of a tree $t$ satisfies the Union condition if
$$\{\mathrm{rank}_t(y) : y \in \mathrm{children}(t, x)\} = \{0, 1, \dots, \mathrm{rank}_t(x) - 1\}.$$
Then, the characterization of Union trees from [2] can be formulated in our terms as follows:
Theorem 2.1
A tree $t$ is a Union tree if and only if each node of $t$ satisfies the Union condition.
Note that the rank of a Union tree always coincides with its height. (Also, any subtree of a Union tree is a Union tree.) In particular, the leaves are exactly the nodes of rank $0$.
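The characterization of Theorem 2.1 yields a direct linear-time test. The following Python sketch assumes, for illustration only, that a well-formed ranked tree is given as two dictionaries: `rank` (node to rank) and `parent` (nonroot node to parent); the function name `is_union_tree` is hypothetical.

```python
def is_union_tree(rank, parent):
    """Check the Union condition at every node (Theorem 2.1)."""
    children = {v: [] for v in rank}
    for v, p in parent.items():
        children[p].append(v)
    # A node of rank r must see exactly the ranks 0, 1, ..., r-1 among its children.
    return all(
        {rank[c] for c in children[v]} == set(range(rank[v]))
        for v in rank
    )
```

For instance, a leaf of rank $1$ violates the condition, since its (empty) set of child ranks differs from $\{0\}$.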
3 Structural characterization of Union-Find trees
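Suppose $t$ and $s$ are trees on the same set $V$ of nodes, with the same root $\mathrm{root}$ and the same rank function $\mathrm{rank}$. We write $t \preceq s$ if $x \preceq_t y$ implies $x \preceq_s y$ for each $x, y \in V$.
Clearly, $\preceq$ is a partial order on any set of trees (i.e. it is a reflexive, transitive and antisymmetric relation). It is also clear that $t \preceq s$ if and only if $\mathrm{parent}_t(x) \preceq_s x$ holds for each $x \in V - \{\mathrm{root}\}$, which is further equivalent to requiring $\mathrm{parent}_t(x) \preceq_s \mathrm{parent}_s(x)$ since $\mathrm{parent}_t(x)$ cannot be $x$.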
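The parent-based reformulation of the order gives a simple way to compare two trees on the same node set. The sketch below assumes each tree is given by a dictionary mapping every nonroot node to its parent; the names `is_ancestor` and `precedes` are illustrative only.

```python
def is_ancestor(parent, a, b):
    """True if a is a (non-strict) ancestor of b in the tree given by `parent`."""
    while True:
        if a == b:
            return True
        if b not in parent:        # reached the root without meeting a
            return False
        b = parent[b]

def precedes(parent_t, parent_s):
    """t precedes s iff parent_t(x) is an ancestor of x in s for every nonroot x."""
    return all(is_ancestor(parent_s, p, x) for x, p in parent_t.items())
```

Intuitively, the flatter tree is the smaller one: every ancestor relation of $t$ must survive in $s$.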
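Another notion we define is the (partial) operation push on trees as follows: when $t$ is a tree and $x \neq y \in V_t$ are siblings in $t$, i.e. have the same parent, and $\mathrm{rank}_t(x) < \mathrm{rank}_t(y)$, then $\mathrm{push}(t, x, y)$ is defined as the tree $(V_t, \mathrm{root}_t, \mathrm{rank}_t, \mathrm{parent}')$ with
$$\mathrm{parent}'(z) = \begin{cases} y & \text{if } z = x, \\ \mathrm{parent}_t(z) & \text{otherwise,} \end{cases}$$
that is, we “push” the node $x$ one level deeper in the tree just below its former sibling $y$. (See Figure 1.)
We write $t \vdash t'$ when $t' = \mathrm{push}(t, x, y)$ for some $x$ and $y$, and as usual, $\vdash^*$ denotes the reflexive-transitive closure of $\vdash$.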
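A minimal Python sketch of the push operation follows. As an assumption for illustration, the tree is given by a `rank` dictionary and a `parent` dictionary over the nonroot nodes; the function returns a new parent dictionary and raises `ValueError` when the operation is undefined.

```python
def push(rank, parent, x, y):
    """Return the parent map of push(t, x, y): x moves below its sibling y."""
    if x == y or x not in parent or parent.get(y) != parent[x]:
        raise ValueError("x and y must be distinct siblings")
    if rank[x] >= rank[y]:
        raise ValueError("rank(x) must be strictly smaller than rank(y)")
    new_parent = dict(parent)      # leave the input tree untouched
    new_parent[x] = y
    return new_parent
```

The rank requirement guarantees that the result is again a ranked tree, i.e. that ranks still strictly decrease towards the leaves.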
Proposition 1
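For any pair $t$ and $s$ of trees, the following conditions are equivalent:
- (i) $t \preceq s$,
- (ii) there exists a sequence $t = t_0 \vdash t_1 \vdash \dots \vdash t_n$ of trees such that for each $i = 0, \dots, n-1$ we have $t_{i+1} = \mathrm{push}(t_i, x, y)$ for some depth-one nodes $x \neq y \in \mathrm{children}(t_i)$, moreover, $\mathrm{children}(t_n) = \mathrm{children}(s)$ and $t_n|_x \preceq s|_x$ for each $x \in \mathrm{children}(s)$,
- (iii) $t \vdash^* s$.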
Proof
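(i) $\Rightarrow$ (ii). It is clear that $\preceq$ is equality on singleton trees, thus (i) implies (ii) for trees of rank $0$. Assume $t \preceq s$ for the trees $t$ and $s$ and let $X$ stand for the set $\mathrm{children}(t)$ of the depth-one nodes of $t$ and $Y$ stand for $\mathrm{children}(s)$. Clearly, $Y \subseteq X$ since by $t \preceq s$, any node $x$ of $t$ having depth at least two has to satisfy $\mathrm{parent}_t(x) \preceq_s \mathrm{parent}_s(x)$ and since $\mathrm{parent}_t(x) \neq \mathrm{root}$ for such nodes, $x$ has to have depth at least two in $s$ as well. Now there are two cases: either $\mathrm{parent}_s(x) = \mathrm{root}$ for each $x \in X$, or $\mathrm{parent}_s(x) \neq \mathrm{root}$ for some $x \in X$.
If $\mathrm{parent}_s(x) = \mathrm{root}$ for each $x \in X$, then $X = Y$ and we only have to show that $t|_x \preceq s|_x$ for each $x \in X$. For this, let $u, v$ be nodes of $t|_x$ with $u \preceq_{t|_x} v$. Since $t|_x$ is a subtree of $t$, this holds if and only if $u \preceq_t v$. From $t \preceq s$ this implies $u \preceq_s v$, that is, $u \preceq_{s|_x} v$, hence $t|_x \preceq s|_x$.
Now assume $\mathrm{parent}_s(x) \neq \mathrm{root}$ for some $x \in X$. Then $x \notin Y$, thus there exists some $y \in Y$ with $y \preceq_s x$. By $Y \subseteq X$, this $y$ is a member of $X$ as well, and $\mathrm{rank}(x) < \mathrm{rank}(y)$, thus $t' = \mathrm{push}(t, x, y)$ is well-defined. Moreover, $t' \preceq s$ since $\mathrm{parent}_{t'}(z) \preceq_s z$ for each nonroot node $z$: either $z = x$, in which case $\mathrm{parent}_{t'}(z) = y \preceq_s x$ by the choice of $y$, or $z \neq x$ and then $\mathrm{parent}_{t'}(z) = \mathrm{parent}_t(z) \preceq_s z$ also holds by $t \preceq s$. Thus, there exists a tree $t' = \mathrm{push}(t, x, y)$ for some $x, y \in \mathrm{children}(t)$ with $t' \preceq s$; since $|\mathrm{children}(t')| < |\mathrm{children}(t)|$, by repeating this construction we eventually arrive at a tree $t_n$ with $\mathrm{children}(t_n) = \mathrm{children}(s)$, implying $t_n|_x \preceq s|_x$ for each $x \in \mathrm{children}(s)$ by the previous case.
(ii) $\Rightarrow$ (iii). We apply induction on $\mathrm{rank}(t) = \mathrm{rank}(s)$. When $\mathrm{rank}(t) = 0$, then $t$ is a singleton tree and the condition in (ii) ensures that $s$ is a singleton tree as well. Thus, $t = s$ and clearly $t \vdash^* s$.
Now assume the claim holds for each pair of trees of rank less than $\mathrm{rank}(t)$ and let $t = t_0 \vdash t_1 \vdash \dots \vdash t_n$ be trees satisfying the condition. Then, by construction, $t \vdash^* t_n$. Since $\mathrm{rank}(t_n|_x) < \mathrm{rank}(t)$ for each node $x \in \mathrm{children}(t_n)$, by $t_n|_x \preceq s|_x$ we get, applying the induction hypothesis, that $t_n|_x \vdash^* s|_x$ for each depth-one node $x$ of $t_n$, thus $t_n \vdash^* s$, hence $t \vdash^* s$ as well.
(iii) $\Rightarrow$ (i). For $\vdash^*$ implying $\preceq$ it suffices to show that $\vdash$ implies $\preceq$ since the latter is reflexive and transitive. So let $x$ and $y$ be siblings in $t$ with the common parent $z$ and $\mathrm{rank}(x) < \mathrm{rank}(y)$, and let $s = \mathrm{push}(t, x, y)$. Then, since $\mathrm{parent}_t(x) = z = \mathrm{parent}_s(\mathrm{parent}_s(x))$, we get $\mathrm{parent}_t(x) \preceq_s x$, and by $\mathrm{parent}_t(w) = \mathrm{parent}_s(w)$ for each node $w \neq x$, we have $t \preceq s$.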
∎
The relations $\preceq$ and $\vdash^*$ are introduced due to their intimate relation to Union-Find and Union trees (similarly to the case of the union-by-size strategy [13], but there the push operation itself was slightly different):
Theorem 3.1
A tree $t$ is a Union-Find tree if and only if $t \vdash^* s$ for some Union tree $s$.
Proof
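Let $t$ be a Union-Find tree. We show the claim by structural induction. For singleton trees the claim holds since any singleton tree is a Union tree as well. Suppose $t = \mathrm{merge}(t_1, t_2)$. Then by the induction hypothesis, $t_1 \vdash^* s_1$ and $t_2 \vdash^* s_2$ for the Union trees $s_1$ and $s_2$. Then, for the tree $s = \mathrm{merge}(s_1, s_2)$ we get that $t \vdash^* s$. Finally, assume $t = \mathrm{collapse}(t', x)$ for some node $x$. Let $x = x_1, x_2, \dots, x_k = \mathrm{root}$ be the ancestral sequence of $x$ in $t'$, i.e. $x_{i+1} = \mathrm{parent}_{t'}(x_i)$. Then, defining $t_0 = t$ and $t_i = \mathrm{push}(t_{i-1}, x_i, x_{i+1})$ for $i = 1, \dots, k-2$, we get that $t \vdash^* t_{k-2} = t'$ and $t' \vdash^* s$ for some Union tree $s$ applying the induction hypothesis, thus $t \vdash^* s$ also holds.
Now assume $t \preceq s$ (equivalently, $t \vdash^* s$) for some Union tree $s$. We show the claim by induction on the height of $t$. For singleton trees the claim holds since any singleton tree is a Union-Find tree.
Now assume $t$ is a tree and $t \vdash^* s$ for some Union tree $s$. Then by Proposition 1, there is a set $X = \{x_1, \dots, x_k\} \subseteq \mathrm{children}(t)$ of depth-one nodes of $t$ and a function $f : X \to \mathrm{children}(t) - X$ with $\mathrm{rank}(x) < \mathrm{rank}(f(x))$ such that for the sequence $t_0 = t$, $t_i = \mathrm{push}(t_{i-1}, x_i, f(x_i))$, we have that $t_k|_y \preceq s|_y$ for each $y \in \mathrm{children}(s)$. As each $s|_y$ is a Union tree (since so is $s$), we have by the induction hypothesis that each $t_k|_y$ is a Union-Find tree. Now let $\mathrm{children}(s) = \{y_1, \dots, y_\ell\}$ be ordered nondecreasingly by rank; then, as $s$ is a Union tree and $\mathrm{children}(t_k) = \mathrm{children}(s)$, we get by Theorem 2.1 that the ranks of the $y_i$ cover $\{0, \dots, \mathrm{rank}(s) - 1\}$, so the rank of each $y_i$ never exceeds the rank of the tree obtained by merging in $y_1, \dots, y_{i-1}$. Hence for the sequence $t'_0, \dots, t'_\ell$ defined as $t'_0$ being a singleton tree with root $\mathrm{root}$ and $t'_i = \mathrm{merge}(t'_{i-1}, t_k|_{y_i})$ for each $i = 1, \dots, \ell$, we get that $t_k = t'_\ell$ is a Union-Find tree. Finally, we get $t$ from $t_k$ by applying successively one collapse operation on each node in $X$, thus $t$ is a Union-Find tree as well. ∎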
4 Complexity
In order to show NP-completeness of the recognition problem, we first make a useful observation.
Proposition 2
In any Union-Find tree there are at least as many rank-$0$ nodes as nodes of positive rank.
Proof
We apply induction on the structure of $t$. The claim holds for singleton trees (having one single node of rank $0$). Let $t = \mathrm{merge}(t_1, t_2)$ and suppose the claim holds for $t_1$ and $t_2$. There are two cases.
- Assume $\mathrm{rank}(t_1) = 0$. Then, since $\mathrm{rank}(t_1) \geq \mathrm{rank}(t_2)$, we have that $\mathrm{rank}(t_2)$ is $0$ as well, i.e. both $t_1$ and $t_2$ are singleton trees (of rank $0$). In this case $t$ has one node of rank $1$ and one node of rank $0$.
- If $\mathrm{rank}(t_1) > 0$, then (since $\mathrm{root}_{t_1}$ is the only node in $t$ whose rank can change at all, in which case it increases) neither the total number of rank-$0$ nodes nor the total number of nodes with positive rank changes, thus the claim holds.
Let $t = \mathrm{collapse}(t', x)$ and assume the claim holds for $t'$. Then, since the collapse operation does not change the rank of any of the nodes, the claim holds for $t$ as well. ∎
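Proposition 2 gives a cheap necessary (but not sufficient) test that only looks at the ranks. A minimal Python sketch, assuming the ranks are given as a dictionary from nodes to ranks; the function name is hypothetical.

```python
def passes_rank_zero_count(rank):
    """Necessary condition of Proposition 2: #rank-0 nodes >= #positive-rank nodes."""
    zeros = sum(1 for r in rank.values() if r == 0)
    positives = len(rank) - zeros
    return zeros >= positives
```

For example, a tree with a rank-$2$ root, one rank-$0$ child and two rank-$1$ children has one rank-$0$ node against three nodes of positive rank, so it cannot be a Union-Find tree regardless of its shape.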
In order to define a reduction from the strongly NP-complete problem Partition we introduce several notions on trees:
An apple of weight $w$ for an integer $w > 0$ is a tree consisting of a root node of rank $2$, a depth-one node of rank $0$ and $w$ depth-one nodes of rank $1$.
A basket of size $H$ for an integer $H > 0$ is a tree consisting of $H + 4$ nodes: the root node having rank $3$, $H + 1$ depth-one children of rank $0$ and one depth-one child of rank $1$, which in turn has a child of rank $0$.
A flat tree is a tree $t$ of the following form: the root of $t$ has rank $4$. The immediate subtrees of $t$ are:
- a node of rank $0$, having no children;
- a node of rank $1$, having a single child of rank $0$;
- a node of rank $2$, having two children: a single node of rank $0$ and a node of rank $1$, having a single child of rank $0$;
- an arbitrary number of apples;
- and an arbitrary number of baskets for some fixed size $H$.
(See Figure 2.)
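The flat tree belonging to a Partition instance can be built mechanically. The following Python sketch is an illustration under stated assumptions: nodes are consecutive integers, the tree is returned as `(root, rank, parent)` with `parent` defined on nonroot nodes, a basket of size $H$ is taken to have $H + 1$ rank-$0$ depth-one children, and the basket size is set to the target sum. The function name `flat_tree` is hypothetical.

```python
from itertools import count

def flat_tree(weights, k):
    """Flat tree with one apple per weight and k baskets of size sum(weights) / k."""
    total = sum(weights)
    assert total % k == 0, "the target sum must be an integer"
    size = total // k
    fresh = count()                      # generator of fresh node identifiers
    rank, parent = {}, {}

    def node(r, p=None):
        v = next(fresh)
        rank[v] = r
        if p is not None:
            parent[v] = p
        return v

    root = node(4)
    node(0, root)                        # rank-0 leaf
    node(0, node(1, root))               # rank-1 node with a rank-0 child
    two = node(2, root)                  # rank-2 node with two children
    node(0, two)
    node(0, node(1, two))
    for w in weights:                    # one apple of weight w each
        apple = node(2, root)
        node(0, apple)
        for _ in range(w):
            node(1, apple)
    for _ in range(k):                   # k baskets of the same size
        basket = node(3, root)
        for _ in range(size + 1):
            node(0, basket)
        node(0, node(1, basket))
    return root, rank, parent
```

The number of nodes is linear in the sum of the weights, which is polynomial in the input size only because the numbers may be assumed to be encoded in unary.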
At this point we recall that the following problem Partition is NP-complete in the strong sense [11]: given a list $a_1, \dots, a_m$ of positive integers and a value $k > 0$ such that the value $B = \frac{1}{k}\sum_{i=1}^{m} a_i$ is an integer, does there exist a partition $\{B_1, \dots, B_k\}$ of the set $\{1, \dots, m\}$ satisfying $\sum_{i \in B_j} a_i = B$ for each $1 \leq j \leq k$?
(Here “in the strong sense” means that the problem remains NP-complete even if the numbers are encoded in unary.)
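For small instances the Partition problem can be decided by plain backtracking. The sketch below is an exponential-time illustration, not an efficient algorithm; the function name `can_partition` is hypothetical, and instances whose sum is not divisible by $k$ are simply rejected.

```python
def can_partition(weights, k):
    """Decide whether weights can be split into k classes of equal sum."""
    total = sum(weights)
    if total % k != 0:
        return False
    target = total // k
    loads = [0] * k
    items = sorted(weights, reverse=True)   # placing large items first prunes early

    def place(i):
        if i == len(items):
            return True                      # all loads are <= target and sum to k * target
        tried = set()
        for j in range(k):
            # Skip classes whose current load was already tried for this item.
            if loads[j] + items[i] <= target and loads[j] not in tried:
                tried.add(loads[j])
                loads[j] += items[i]
                if place(i + 1):
                    return True
                loads[j] -= items[i]
        return False

    return place(0)
```

For example, the weights $1, 2, 3$ with $k = 2$ form a positive instance (classes $\{3\}$ and $\{1, 2\}$), while $2, 2, 2$ with $k = 2$ is a negative one.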
Proposition 3
Assume $T$ is a flat tree having $k$ basket children, each having the size $H$, and $m$ apple children of weights $a_1, \dots, a_m$ respectively, satisfying $H = \frac{1}{k}\sum_{i=1}^{m} a_i$.
Then $T$ is a Union-Find tree if and only if the instance $(a_1, \dots, a_m, k)$ is a positive instance of the Partition problem.
Proof
(For an example, the reader is referred to Figure 3.)
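Suppose $I = (a_1, \dots, a_m, k)$ is a positive instance of the Partition problem. Let $B$ stand for the target sum $\frac{1}{k}\sum_{i=1}^{m} a_i$, which equals the basket size. Let $\{B_1, \dots, B_k\}$ be a solution of $I$, i.e., $\sum_{i \in B_j} a_i = B$ for each $1 \leq j \leq k$. Let $x_1, \dots, x_k \in \mathrm{children}(T)$ be the nodes corresponding to the baskets of $T$ and let $y_1, \dots, y_m \in \mathrm{children}(T)$ be the nodes corresponding to the apples of $T$.
We define the following sequence of trees: $T_0 = T$ and for each $1 \leq i \leq m$, let $T_i = \mathrm{push}(T_{i-1}, y_i, x_j)$ with $j$ being the unique index with $i \in B_j$. Then, $\mathrm{children}(T_m)$ consists of $x_1, \dots, x_k$ and the three additional nodes having rank $0$, $1$ and $2$. Note that the subtrees rooted at the latter three nodes are Union trees. Thus, if each of the trees $T_m|_{x_j}$ is a Union-Find tree, then so is $T$.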
Consider a subtree $t = T_m|_{x_j}$. By construction, $t$ is a tree whose root has rank $3$ and has
- $B + 1$ children of rank $0$,
- a single child of rank $1$, having a child of rank $0$,
- and several (say, $\ell$) apple children with total weight $B$.
We give a method to transform $t$ into a Union tree. First, we push $w$ rank-$0$ nodes to each apple child of weight $w$. After this stage the root of $t$ has one child of rank $0$, one child of rank $1$ and $\ell$ “filled” apple children, each having a root of rank $2$, thus the root of the transformed $t$ satisfies the Union condition. We only have to show that each of these “filled” apples is a Union-Find tree.
Such a subtree has a root node of rank $2$, $w$ depth-one nodes of rank $1$ and $w + 1$ depth-one nodes of rank $0$. Then, one can push into each node of rank $1$ a node of rank $0$ and arrive at a tree with one depth-one node of rank $0$, and $w$ depth-one nodes of rank $1$, each having a single child of rank $0$, which is indeed a Union tree, showing the claim by Theorem 3.1.
For an illustration of the construction the reader is referred to Figure 4.
For the other direction, suppose $T$ is a Union-Find tree. By Theorem 3.1 and Proposition 1, there is a subset $X = \{z_1, \dots, z_p\} \subseteq \mathrm{children}(T)$ and a mapping $f : X \to \mathrm{children}(T) - X$ with $\mathrm{rank}(z) < \mathrm{rank}(f(z))$ such that for the sequence $T_0 = T$, $T_i = \mathrm{push}(T_{i-1}, z_i, f(z_i))$, we have that each immediate subtree of $T_p$ is a Union-Find tree and moreover, the root of $T_p$ satisfies the Union condition.
The root of $T_p$ has rank $4$, thus it has to have at least one child having rank $0$, $1$, $2$ and $3$ respectively. Since $T$ has exactly one child of rank $0$ and exactly one child of rank $1$, these nodes have to be in $\mathrm{children}(T) - X$. This implies that no node gets pushed into the apples at this stage (because the apples have rank $2$). Thus, since the apples are not Union-Find trees (as they have strictly fewer rank-$0$ nodes than positive-rank nodes, cf. Proposition 2), all the apples have to be in $X$. Apart from the apples, $T$ has exactly one depth-one node of rank $2$ (which happens to be the root of a Union tree), thus this node has to stay in $\mathrm{children}(T) - X$ as well. Moreover, the baskets cannot be pushed, as they have the maximal rank $3$ among the depth-one nodes.
Thus, we have to push all the apples, and we can push apples only into baskets (as exactly the baskets have rank greater than $2$). Let $b$ be a basket node, let $t$ stand for $T_p|_b$ and let $A$ be the set of those apples that get pushed into $b$ during the operation. Then, the total number of nodes having rank $0$ in $t$ is $H + 2 + |A|$ ($|A|$ of them coming from the apples and the other $H + 2$ coming from the basket) while the total number of nodes having a positive rank is $2 + |A| + W$, where $W$ is the total weight of the apples in $A$. Applying Proposition 2 we get that $W \leq H$ for each basket. Since the total weight of all apples is $kH$ and each apple gets pushed into exactly one basket, we get that actually $W = H$ holds for each basket. Thus, $(a_1, \dots, a_m, k)$ is a positive instance of the Partition problem. ∎
Theorem 4.1
The recognition problem of Union-Find trees is NP-complete.
Proof
By Proposition 3 we get NP-hardness. For membership in NP, we make use of the characterization given in Theorem 3.1 and the fact that the possible number of pushes is bounded above by $n^2$, where $n$ is the number of nodes: upon pushing $x$ below $y$, the depth of $x$ and its descendants increases, while the depth of the other nodes remains the same. Since the depth of any node is at most $n$, the sum of the depths of all the nodes is at most $n^2$ in any tree. Hence, it suffices to guess nondeterministically a sequence $t = t_0 \vdash t_1 \vdash \dots \vdash t_k$ for some $k \leq n^2$ with $t_k$ being a Union tree (which also can be checked in polynomial time). ∎
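The membership argument suggests a straightforward (exponential-time) recognizer: explore all push sequences and test each reached tree against the Union condition. The Python sketch below is only a deterministic brute-force illustration of Theorem 3.1 for small trees, not an efficient algorithm; the input format (`rank` dictionary, `parent` dictionary over nonroot nodes) and both function names are assumptions.

```python
def is_union_tree(rank, parent):
    """Union condition at every node (Theorem 2.1)."""
    children = {v: [] for v in rank}
    for v, p in parent.items():
        children[p].append(v)
    return all({rank[c] for c in children[v]} == set(range(rank[v])) for v in rank)

def is_union_find_tree(rank, parent):
    """Search for a push sequence leading to a Union tree (Theorem 3.1)."""
    seen = set()

    def search(par):
        key = frozenset(par.items())
        if key in seen:
            return False               # this tree was already explored
        seen.add(key)
        if is_union_tree(rank, par):
            return True
        nodes = list(par)              # only nonroot nodes can be pushed
        for x in nodes:
            for y in nodes:
                if x != y and par[x] == par[y] and rank[x] < rank[y]:
                    nxt = dict(par)
                    nxt[x] = y         # push x below its sibling y
                    if search(nxt):
                        return True
        return False

    return search(dict(parent))
```

The search terminates because every push strictly increases the sum of the node depths, so no tree can be reached twice along a single path, and the set `seen` avoids exploring a tree again from a different path.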
5 Conclusion, future directions
We have shown that unless P = NP, there is no efficient algorithm to check whether a given tree is a valid Union-Find tree, assuming the union-by-rank strategy, since the problem is NP-complete, complementing our earlier results for the union-by-size strategy. A very natural question is the following: does there exist a merging strategy under which the time complexity remains amortized almost-constant, and which at the same time allows an efficient recognition algorithm? Although this data structure is called “primitive” in the sense that it does not really need an automatic run-time certifying system, we find the question interesting from the mathematical point of view as well. It would also be interesting to know whether the recognition problem of Union-Find trees built up according to the union-by-rank strategy is still NP-complete if the nodes of the tree are not tagged with the rank, that is, given a tree without rank info, does there exist a Union-Find tree with the same underlying tree?
References
- [1] Hideo Bannai, Shunsuke Inenaga, Ayumi Shinohara, and Masayuki Takeda. Inferring strings from graphs and arrays. In Branislav Rovan and Peter Vojtáš, editors, Mathematical Foundations of Computer Science 2003: 28th International Symposium, MFCS 2003, pages 208–217. Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
- [2] Leizhen Cai. The recognition of Union trees. Inf. Process. Lett., 45(6):279–283, 1993.
- [3] Julien Clément, Maxime Crochemore, and Giuseppina Rindone. Reverse engineering prefix tables. In Susanne Albers and Jean-Yves Marion, editors, 26th International Symposium on Theoretical Aspects of Computer Science, STACS 2009, volume 3 of LIPIcs, pages 289–300. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany, 2009.
- [4] Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001.
- [5] Maxime Crochemore, Costas S. Iliopoulos, Solon P. Pissis, and German Tischler. Cover array string reconstruction. In Amihood Amir and Laxmi Parida, editors, Combinatorial Pattern Matching: 21st Annual Symposium, CPM 2010, pages 251–259, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
- [6] J.-P. Duval, T. Lecroq, and A. Lefebvre. Efficient validation and construction of border arrays and validation of string matching automata. RAIRO - Theoretical Informatics and Applications, 43(2):281–297, 2009.
- [7] J.-P. Duval and A. Lefebvre. Words over an ordered alphabet and suffix permutations. Theoretical Informatics and Applications, 36(3):249–259, 2002.
- [8] Jean-Pierre Duval, Thierry Lecroq, and Arnaud Lefebvre. Border array on bounded alphabet. J. Autom. Lang. Comb., 10(1):51–60, 2005.
