Recognizing Union-Find trees built up using union-by-rank strategy is NP-complete

Kitti Gelle; Szabolcs Ivan

arXiv:1704.07254·cs.DS·April 25, 2017


TL;DR

This paper proves that recognizing whether a given tree with rank info is a Union-Find tree built with union-by-rank is NP-complete, providing a new structural characterization and extending previous results.

Contribution

It introduces a simple push operation for characterizing Union-Find trees with union-by-rank and proves the recognition problem is NP-complete.

Findings

01 Recognition of Union-Find trees with union-by-rank is NP-complete.

02 Provides a structural characterization using a push operation.

03 Complements the earlier NP-completeness result for the union-by-size strategy.

Abstract

Disjoint-Set forests, consisting of Union-Find trees, are data structures having a widespread practical application due to their efficiency. Despite them being well-known, no exact structural characterization of these trees is known (such a characterization exists for Union trees which are constructed without using path compression) for the case assuming union-by-rank strategy for merging. In this paper we provide such a characterization by means of a simple push operation and show that the decision problem whether a given tree (along with the rank info of its nodes) is a Union-Find tree is NP-complete, complementing our earlier similar result for the union-by-size strategy.

Equations

$${\textsc{rank}}({\textsc{root}}_{t})=\begin{cases}{\textsc{rank}}(t)&\text{if }{\textsc{rank}}(s)<{\textsc{rank}}(t),\\{\textsc{rank}}(t)+1&\text{otherwise,}\end{cases}$$

$$\{{\textsc{rank}}_{t}(y):y\in{\textsc{children}}(t,x)\}=\{0,1,\ldots,{\textsc{rank}}_{t}(x)-1\}.$$

$${\textsc{parent}}^{\prime}(z)=\begin{cases}y&\text{if }z=x,\\{\textsc{parent}}_{t}(z)&\text{otherwise,}\end{cases}$$


Full text

Kitti Gelle, Szabolcs Iván

Department of Computer Science, University of Szeged, Hungary

{kgelle,szabivan}@inf.u-szeged.hu

Research was supported by the NKFI grant no. 108448.


1 Introduction

Disjoint-Set forests, introduced in [10], are fundamental data structures in many practical algorithms where one has to maintain a partition of some set. The structure supports three operations: creating a partition consisting of singletons, querying whether two given elements are in the same class of the partition (or equivalently: finding a representative of a class, given an element of it) and merging two classes. Practical examples include building a minimum-cost spanning tree of a weighted graph [4] and unification algorithms [18].

To support these operations, even a linked list representation suffices, but to achieve an almost-constant amortized time cost per operation, Disjoint-Set forests are used in practice. In this data structure, sets are represented as directed trees with the edges directed towards the root; the create operation creates $n$ trees having one node each (here $n$ stands for the number of the elements in the universe), the find operation takes a node and returns the root of the tree in which the node is present (thus the $\textsc{same-class}(x,y)$ operation is implemented as $\textsc{find}(x)==\textsc{find}(y)$ ), and the $\textsc{merge}(x,y)$ operation is implemented by merging the trees containing $x$ and $y$ , i.e. making one of the root nodes a child of the other root node (if the two nodes are in different classes).

In order to achieve near-constant efficiency, one has to keep the (average) height of the trees small. There are two “orthogonal” methods to do that. First, during the merge operation it is advisable to attach the “smaller” tree below the “larger” one. If the “size” of a tree is the number of its nodes, we say the trees are built up according to the union-by-size strategy; if it is the depth of a tree, we talk about the union-by-rank strategy. Second, during a find operation invoked on some node $x$ of a tree, one can apply the path compression method, which reattaches each ancestor of $x$ directly to the root of the tree in which they are present. If one applies both the path compression method and either one of the union-by-size or union-by-rank strategies, then any sequence of $m$ operations on a universe of $n$ elements has worst-case time cost $O(m\alpha(n))$ where $\alpha$ is the inverse of the extremely fast growing (not primitive recursive) Ackermann function for which $\alpha(n)\leq 5$ for each practical value of $n$ (say, below $2^{65535}$ ), hence it has an amortized almost-constant time cost [23]. Since it has been proven [9] that any data structure maintaining a partition has worst-case time cost $\Omega(m\alpha(n))$ , the Disjoint-Set forests equipped with a strategy and path compression offer a theoretically optimal data structure which also performs exceptionally well in practice. For more details see standard textbooks on data structures, e.g. [4].
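The two methods described above can be combined in a few lines. The following is a minimal Python sketch, not taken from the paper: the class name `DisjointSet` and its array-based layout are assumptions made for illustration, and the rank update mirrors the rule that the rank of the surviving root grows only when the two merged roots have equal rank.

```python
class DisjointSet:
    """Illustrative Disjoint-Set forest: union-by-rank plus path compression."""

    def __init__(self, n):
        # create: n singleton trees; every node is its own root and has rank 0
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): reattach x and its ancestors to the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def merge(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx  # already in the same class
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx  # make rx the root of larger (or equal) rank
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1  # ranks were equal, so the new root's rank grows
        return rx

    def same_class(self, x, y):
        return self.find(x) == self.find(y)
```

Note that ranks are never decreased by `find`, so after path compression the rank of a root is only an upper bound on the height of its tree.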

Due to these facts, it is certainly interesting both from the theoretical and from the practical point of view to characterize those trees that can arise from a forest of singletons after a number of merge and find operations, which we call Union-Find trees in this paper. One could e.g. test Disjoint-Set implementations: if at any given point of execution a tree of a Disjoint-Set forest is not a valid Union-Find tree, then it is certain that there is a bug in the implementation of the data structure (though we note at this point that this data structure is sometimes regarded as one of the “primitive” data structures, in the sense that it is possible to implement a correct version of it that need not be certifying [21]). Nevertheless, only the characterization of Union trees is known so far [2], i.e. of the trees that correspond to the case when one uses one of the union-by-size or union-by-rank strategies but not path compression. Since in that case the data structure offers only a theoretical bound of $\Theta(\log n)$ on the amortized time cost, in practice all implementations employ path compression as well, so for a characterization to be really useful, it has to cover this case too.

In this paper we show that the recognition problem of Union-Find trees is $\mathbf{NP}$ -complete when the union-by-rank strategy is used, complementing our earlier results [13] where we proved $\mathbf{NP}$ -completeness for the union-by-size strategy. The proof method applied here resembles the one used there, but the low-level details of the reduction (here we use the Partition problem, there we used the more restricted version $3$-$\textsc{Partition}$ as this is a very canonical strongly $\mathbf{NP}$ -complete problem) differ greatly. This result also confirms the statement from [2] that the problem “seems to be much harder” than recognizing Union trees. As (to our knowledge) most actual software libraries implementing this data structure use the union-by-rank strategy (apart from the cases when one also has to query the size of the sets quickly), for software testing purposes the current result is more relevant than the one for the union-by-size strategy.

Related work. There is an increasing interest in determining the complexity of the recognition problem of various data structures. The problem was considered for suffix trees [17, 22], (parametrized) border arrays [15, 20, 8, 14, 16], suffix arrays [1, 7, 19], KMP tables [6, 12], prefix tables [3], cover arrays [5], and directed acyclic word- and subsequence graphs [1].

2 Notation

A (ranked) tree is a tuple $t=(V_{t},{\textsc{root}}_{t},{\textsc{rank}}_{t},{\textsc{parent}}_{t})$ with $V_{t}$ being the finite set of its nodes, ${\textsc{root}}_{t}\in V_{t}$ its root, ${\textsc{rank}}_{t}:V_{t}\to{\mathbb{N}_{0}}$ mapping a nonnegative integer to each node, and ${\textsc{parent}}_{t}:(V_{t}-\{{\textsc{root}}_{t}\})\to V_{t}$ mapping each nonroot node to its parent (so that the graph of ${\textsc{parent}}_{t}$ is a directed acyclic graph, with edges being directed towards the root). We require ${\textsc{rank}}_{t}(x)<{\textsc{rank}}_{t}({\textsc{parent}}_{t}(x))$ for each nonroot node $x$ , i.e. the rank strictly decreases towards the leaves.

For a tree $t$ and a node $x\in V_{t}$ , let ${\textsc{children}}(t,x)$ stand for the set $\{y\in V_{t}:{\textsc{parent}}_{t}(y)=x\}$ of its children and ${\textsc{children}}(t)$ stand as a shorthand for ${\textsc{children}}(t,{\textsc{root}}_{t})$ , the set of depth-one nodes of $t$ . Also, let $x\preceq_{t}y$ denote that $x$ is a (non-strict) ancestor of $y$ in $t$ , i.e. $x={\textsc{parent}}_{t}^{k}(y)$ for some $k\geq 0$ . For $x\in V_{t}$ , let $t|_{x}$ stand for the subtree $(V_{x}=\{y\in V:x\preceq_{t}y\},x,{\textsc{rank}}_{t}|_{V_{x}},{\textsc{parent}}_{t}|_{V_{x}})$ of $t$ rooted at $x$ . As shorthand, let ${\textsc{rank}}(t)$ stand for ${\textsc{rank}}_{t}({\textsc{root}}_{t})$ , the rank of the root of $t$ .

Two operations on trees are that of merging and collapsing. Given two trees $t=(V_{t},{\textsc{root}}_{t},{\textsc{rank}}_{t},{\textsc{parent}}_{t})$ and $s=(V_{s},{\textsc{root}}_{s},{\textsc{rank}}_{s},{\textsc{parent}}_{s})$ with $V_{t}$ and $V_{s}$ being disjoint and ${\textsc{rank}}(t)\geq{\textsc{rank}}(s)$ , then their merge $\textsc{merge}(t,s)$ (in this order) is the tree $(V_{t}\cup V_{s},{\textsc{root}}_{t},{\textsc{rank}},{\textsc{parent}})$ with ${\textsc{parent}}(x)={\textsc{parent}}_{t}(x)$ for $x\in V_{t}$ , ${\textsc{parent}}({\textsc{root}}_{s})={\textsc{root}}_{t}$ and ${\textsc{parent}}(y)={\textsc{parent}}_{s}(y)$ for each nonroot node $y\in V_{s}$ of $s$ , and

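$${\textsc{rank}}({\textsc{root}}_{t})=\begin{cases}{\textsc{rank}}(t)&\text{if }{\textsc{rank}}(s)<{\textsc{rank}}(t),\\{\textsc{rank}}(t)+1&\text{otherwise,}\end{cases}$$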

and ${\textsc{rank}}(x)={\textsc{rank}}_{t}(x)$ for each $x\in V_{t}-\{{\textsc{root}}_{t}\}$ and ${\textsc{rank}}(x)={\textsc{rank}}_{s}(x)$ for each $x\in V_{s}$ .

Given a tree $t=(V,{\textsc{root}},{\textsc{rank}},{\textsc{parent}})$ and a node $x\in V$ , then $\textsc{collapse}(t,x)$ is the tree $(V,{\textsc{root}},{\textsc{rank}},{\textsc{parent}}^{\prime})$ with ${\textsc{parent}}^{\prime}(y)={\textsc{root}}$ if $y$ is a nonroot ancestor of $x$ in $t$ and ${\textsc{parent}}^{\prime}(y)={\textsc{parent}}(y)$ otherwise. For examples, see Figure 1.

Observe that both operations indeed construct a ranked tree (e.g. the rank remains strictly decreasing towards the leaves).
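The definitions of merge and collapse can be transcribed almost literally. The sketch below is illustrative only: the `Tree` dataclass and the dictionary encoding of rank and parent are assumptions of this example, not notation from the paper.

```python
from dataclasses import dataclass


@dataclass
class Tree:
    root: object
    rank: dict    # node -> nonnegative integer
    parent: dict  # nonroot node -> its parent


def singleton(x):
    # A singleton tree: one node of rank 0.
    return Tree(x, {x: 0}, {})


def merge(t, s):
    """merge(t, s): attach the root of s below the root of t."""
    assert t.rank[t.root] >= s.rank[s.root], "requires rank(t) >= rank(s)"
    assert not set(t.rank) & set(s.rank), "node sets must be disjoint"
    rank = {**t.rank, **s.rank}
    if s.rank[s.root] == t.rank[t.root]:
        rank[t.root] += 1  # the 'otherwise' case of the rank rule
    parent = {**t.parent, **s.parent, s.root: t.root}
    return Tree(t.root, rank, parent)


def collapse(t, x):
    """collapse(t, x): reattach x and its nonroot ancestors to the root."""
    parent = dict(t.parent)
    y = x
    while y != t.root:
        next_up = t.parent[y]  # read the old parent before rewiring
        parent[y] = t.root
        y = next_up
    return Tree(t.root, dict(t.rank), parent)
```

For instance, merging two rank-1 trees on nodes `a, b` and `c, d` gives a tree of rank 2 in which `d` hangs below `c`; collapsing at `d` then makes `b`, `c` and `d` all children of the root `a` while leaving every rank unchanged.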

We say that a tree is a singleton tree if it has exactly one node, and this node has rank $0$ .

The class of Union trees is the least class of trees satisfying the following two conditions: every singleton tree is a Union tree, and if $t$ and $s$ are Union trees with ${\textsc{rank}}(t)\geq{\textsc{rank}}(s)$ , then $\textsc{merge}(t,s)$ is a Union tree as well.

Analogously, the class of Union-Find trees is the least class of trees satisfying the following three conditions: every singleton tree is a Union-Find tree, if $t$ and $s$ are Union-Find trees with ${\textsc{rank}}(t)\geq{\textsc{rank}}(s)$ , then $\textsc{merge}(t,s)$ is a Union-Find tree as well, and if $t$ is a Union-Find tree and $x\in V_{t}$ is a node of $t$ , then $\textsc{collapse}(t,x)$ is also a Union-Find tree.

We say that a node $x$ of a tree $t$ satisfies the Union condition if

$$\{{\textsc{rank}}_{t}(y):y\in{\textsc{children}}(t,x)\}=\{0,1,\ldots,{\textsc{rank}}_{t}(x)-1\}.$$

Then, the characterization of Union trees from [2] can be formulated in our terms as follows:

Theorem 2.1

A tree $t$ is a Union tree if and only if each node of $t$ satisfies the Union condition.

Note that the rank of a Union tree always coincides with its height. (Moreover, any subtree of a Union tree is also a Union tree.) In particular, the leaves are exactly those nodes of rank $0$ .
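Theorem 2.1 turns recognition of Union trees into a local check at every node. A minimal sketch of that check follows, assuming a tree is given as two dictionaries `rank` and `parent` (an encoding chosen here for illustration); the function names are hypothetical. Note that the Union condition is an equality of sets, so several children may share a rank.

```python
def satisfies_union_condition(rank, parent, x):
    # The set of ranks of the children of x must be exactly {0, ..., rank(x) - 1}.
    child_ranks = {rank[y] for y, p in parent.items() if p == x}
    return child_ranks == set(range(rank[x]))


def is_union_tree(rank, parent):
    # Theorem 2.1: a tree is a Union tree iff every node satisfies the condition.
    return all(satisfies_union_condition(rank, parent, x) for x in rank)
```

This naive version scans all edges for every node and thus takes quadratic time; grouping children by parent first would make it linear.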

3 Structural characterization of Union-Find trees

Suppose $s$ and $t$ are trees on the same set $V$ of nodes, with the same root root and the same rank function rank. We write $s\preceq t$ if $x\preceq_{s}y$ implies $x\preceq_{t}y$ for each $x,y\in V$ .

Clearly, $\preceq$ is a partial order on any set of trees (i.e. is a reflexive, transitive and antisymmetric relation). It is also clear that $s\preceq t$ if and only if ${\textsc{parent}}_{s}(x)\preceq_{t}x$ holds for each $x\in V-\{{\textsc{root}}\}$ which is further equivalent to requiring ${\textsc{parent}}_{s}(x)\preceq_{t}{\textsc{parent}}_{t}(x)$ since ${\textsc{parent}}_{s}(x)$ cannot be $x$ .
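The equivalent formulation above gives a direct test for $s\preceq t$ : it suffices to check, for every nonroot node, that its parent in $s$ is an ancestor of it in $t$ . A small sketch, assuming both trees share the node set, root and ranks and are given only by their parent dictionaries (the names `is_ancestor` and `precedes` are made up for this example):

```python
def is_ancestor(parent, x, y):
    """True if x is a (non-strict) ancestor of y in the tree given by parent."""
    while y != x:
        if y not in parent:  # reached the root without meeting x
            return False
        y = parent[y]
    return True


def precedes(parent_s, parent_t):
    """Check s <= t: parent_s(x) must be an ancestor of x in t for each nonroot x."""
    return all(is_ancestor(parent_t, p, x) for x, p in parent_s.items())
```

With a flat tree `s` (all nodes directly below the root) and a tree `t` in which one node has been moved one level deeper, `precedes(s, t)` holds while `precedes(t, s)` does not.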

Another notion we define is the (partial) operation push on trees as follows: when $t$ is a tree and $x\neq y\in V_{t}$ are siblings in $t$ , i.e. have the same parent, and ${\textsc{rank}}_{t}(x)<{\textsc{rank}}_{t}(y)$ , then ${\textsc{push}}(t,x,y)$ is defined as the tree $(V_{t},{\textsc{root}}_{t},{\textsc{rank}}_{t},{\textsc{parent}}^{\prime})$ with

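$${\textsc{parent}}^{\prime}(z)=\begin{cases}y&\text{if }z=x,\\{\textsc{parent}}_{t}(z)&\text{otherwise,}\end{cases}$$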

that is, we “push” the node $x$ one level deeper in the tree just below its former sibling $y$ . (See Figure 1.)

We write $t\vdash t^{\prime}$ when $t^{\prime}={\textsc{push}}(t,x,y)$ for some $x$ and $y$ , and as usual, $\vdash^{*}$ denotes the reflexive-transitive closure of $\vdash$ .
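A possible transcription of the push operation, under the same illustrative dictionary encoding (`rank` and `parent` dictionaries, with the root being the only node absent from `parent`); the function name and the choice of raising `ValueError` when the operation is undefined are assumptions of this sketch.

```python
def push(rank, parent, x, y):
    """Return the parent map of push(t, x, y); the input map is left untouched."""
    if x not in parent or y not in parent or x == y:
        raise ValueError("x and y must be distinct nonroot nodes")
    if parent[x] != parent[y]:
        raise ValueError("x and y must be siblings")
    if rank[x] >= rank[y]:
        raise ValueError("rank(x) must be strictly smaller than rank(y)")
    new_parent = dict(parent)
    new_parent[x] = y  # x moves one level deeper, below its former sibling y
    return new_parent
```

Since ranks are not modified and $x$ is only ever placed below a node of strictly larger rank, the result is again a ranked tree.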

Proposition 1

For any pair $s$ and $t$ of trees, the following conditions are equivalent:

(i) $s\preceq t$ ,

(ii) there exists a sequence $t_{0}=s,t_{1},t_{2},\ldots,t_{n}$ of trees such that for each $i=1,\ldots,n$ we have $t_{i}={\textsc{push}}(t_{i-1},x,y)$ for some depth-one node $x\in{\textsc{children}}(t_{i-1})$ , moreover, ${\textsc{children}}(t_{n})={\textsc{children}}(t)$ and $t_{n}|_{x}\preceq t|_{x}$ for each $x\in{\textsc{children}}(t)$ ,

(iii) $s\vdash^{*}t$ .

Proof

i) $\Rightarrow$ ii). It is clear that $\preceq$ is equality on singleton trees, thus $\preceq$ implies $\vdash^{*}$ for trees of rank $0$ . Assume $s\preceq t$ for the trees $s=(V,{\textsc{root}},{\textsc{rank}},{\textsc{parent}})$ and $t=(V,{\textsc{root}},{\textsc{rank}},{\textsc{parent}}^{\prime})$ and let $X$ stand for the set ${\textsc{children}}(s)$ of the depth-one nodes of $s$ and $Y$ stand for ${\textsc{children}}(t)$ . Clearly, $Y\subseteq X$ since by $s\preceq t$ , any node $x$ of $s$ having depth at least two has to satisfy ${\textsc{parent}}(x)\preceq_{t}{\textsc{parent}}^{\prime}(x)$ and since ${\textsc{parent}}(x)\neq{\textsc{root}}$ for such nodes, $x$ has to have depth at least two in $t$ as well. Now there are two cases: either ${\textsc{root}}={\textsc{parent}}(x)={\textsc{parent}}^{\prime}(x)$ for each $x\in X$ , or ${\textsc{parent}}(x)\prec_{t}{\textsc{parent}}^{\prime}(x)$ for some $x\in X$ .

If ${\textsc{parent}}^{\prime}(x)={\textsc{root}}$ for each $x\in X$ , then $X=Y$ and we only have to show that $s|_{x}\preceq t|_{x}$ for each $x\in X$ . For this, let $u,v\in V(s|_{x})$ with $u\preceq_{s|_{x}}v$ . Since $s|_{x}$ is a subtree of $s$ , this holds if and only if $x\preceq_{s}u\preceq_{s}v$ . From $s\preceq t$ this implies $x\preceq_{t}u\preceq_{t}v$ , that is, $u\preceq_{t|_{x}}v$ , hence $s|_{x}\preceq t|_{x}$ .

Now assume ${\textsc{parent}}(x)\prec_{t}{\textsc{parent}}^{\prime}(x)$ for some $x\in X$ . Then ${\textsc{parent}}^{\prime}(x)\neq{\textsc{root}}$ , thus there exists some $y\in Y$ with $y\preceq_{t}{\textsc{parent}}^{\prime}(x)$ . By $Y\subseteq X$ , this $y$ is a member of $X$ as well, and ${\textsc{rank}}_{s}(y)={\textsc{rank}}_{t}(y)>{\textsc{rank}}_{t}(x)={\textsc{rank}}_{s}(x)$ , thus $s^{\prime}={\textsc{push}}(s,x,y)$ is well-defined. Moreover, $s^{\prime}\preceq t$ since ${\textsc{parent}}_{s^{\prime}}(z)\preceq_{t}z$ for each $z\in V$ : either $z\neq x$ in which case ${\textsc{parent}}_{s^{\prime}}(z)={\textsc{parent}}(z)\preceq_{t}z$ by $s\preceq t$ , or $z=x$ and then ${\textsc{parent}}_{s^{\prime}}(z)=y\preceq_{t}{\textsc{parent}}^{\prime}(x)\preceq_{t}x=z$ also holds. Thus, there exists a tree $s^{\prime}={\textsc{push}}(s,x,y)$ for some $x\in{\textsc{children}}(s)$ with $s^{\prime}\preceq t$ ; since ${\textsc{children}}(s^{\prime})=X-\{x\}$ , by repeating this construction we eventually arrive at a tree $t_{n}$ with $|{\textsc{children}}(t_{n})|=|Y|$ , implying ${\textsc{children}}(t_{n})=Y$ by $Y\subseteq{\textsc{children}}(t_{n})$ .

ii) $\Rightarrow$ iii). We apply induction on ${\textsc{rank}}(s)={\textsc{rank}}(t)$ . When ${\textsc{rank}}(s)=0$ , then $s$ is a singleton tree and the condition in ii) ensures that $t$ is a singleton tree as well. Thus, $s=t$ and clearly $s\vdash^{*}t$ .

Now assume the claim holds for each pair of trees of rank less than ${\textsc{rank}}(s)$ and let $t_{0},\ldots,t_{n}$ be trees satisfying the condition. Then, by construction, $s\vdash^{*}t_{n}$ . Since ${\textsc{rank}}(t_{n}|_{x})<{\textsc{rank}}(t_{n})={\textsc{rank}}(s)$ for each node $x\in{\textsc{children}}(t_{n})$ , by $t_{n}|_{x}\preceq t|_{x}$ we get, applying the induction hypothesis, that $t_{n}|_{x}\vdash^{*}t|_{x}$ for each depth-one node $x$ of $t_{n}$ , thus $t_{n}\vdash^{*}t$ , hence $s\vdash^{*}t$ as well.

iii) $\Rightarrow$ i). For $\vdash^{*}$ implying $\preceq$ it suffices to show that $\vdash$ implies $\preceq$ since the latter is reflexive and transitive. So let $s=(V,r,{\textsc{rank}},{\textsc{parent}})$ and $x\neq y\in V$ be siblings in $s$ with the common parent $z$ , ${\textsc{rank}}(x)<{\textsc{rank}}(y)$ and let $t={\textsc{push}}(s,x,y)$ . Then, since ${\textsc{parent}}_{s}(x)=z={\textsc{parent}}_{t}(y)={\textsc{parent}}_{t}({\textsc{parent}}_{t}(x))$ , we get ${\textsc{parent}}_{s}(x)\preceq_{t}x$ , and by ${\textsc{parent}}_{s}(w)={\textsc{parent}}_{t}(w)$ for each node $w\neq x$ , we have $s\preceq t$ .

∎

The relations $\preceq$ and $\vdash^{*}$ are introduced due to their intimate relation to Union-Find and Union trees (similarly to the case of the union-by-size strategy [13], but there the push operation itself was slightly different):

Theorem 3.1

A tree $t$ is a Union-Find tree if and only if $t\vdash^{*}s$ for some Union tree $s$ .

Proof

Let $t$ be a Union-Find tree. We show the claim by structural induction. For singleton trees the claim holds since any singleton tree is a Union tree as well. Suppose $t={\textsc{merge}}(t_{1},t_{2})$ . Then by the induction hypothesis, $t_{1}\vdash^{*}s_{1}$ and $t_{2}\vdash^{*}s_{2}$ for the Union trees $s_{1}$ and $s_{2}$ . Then, for the tree $s={\textsc{merge}}(s_{1},s_{2})$ we get that $t\vdash^{*}s$ . Finally, assume $t=\textsc{collapse}(t^{\prime},x)$ for some node $x$ . Let $x=x_{1}\succ x_{2}\succ\ldots\succ x_{k}={\textsc{root}}_{t^{\prime}}$ be the ancestral sequence of $x$ in $t^{\prime}$ . Then, defining $t_{0}=t$ , $t_{i}={\textsc{push}}(t_{i-1},x_{i},x_{i+1})$ we get that $t\vdash^{*}t_{k-2}=t^{\prime}$ and $t^{\prime}\vdash^{*}s$ for some Union tree $s$ applying the induction hypothesis, thus $t\vdash^{*}s$ also holds.

Now assume $t\vdash^{*}s$ (equivalently, $t\preceq s$ ) for some Union tree $s$ . We show the claim by induction on the height of $t$ . For singleton trees the claim holds since any singleton tree is a Union-Find tree.

Now assume $t=(V,{\textsc{root}},{\textsc{rank}},{\textsc{parent}})$ is a tree and $t\vdash^{*}s$ for some Union tree $s$ . Then by Proposition 1, there is a set $X={\textsc{children}}(s)\subseteq{\textsc{children}}(t)$ of depth-one nodes of $t$ and a function $f:Y\to X$ with $Y=\{y_{1},\ldots,y_{\ell}\}={\textsc{children}}(t)-X$ such that for the sequence $t_{0}=t$ , $t_{i}={\textsc{push}}(t_{i-1},y_{i},f(y_{i}))$ we have that $t_{\ell}|_{x}\preceq s|_{x}$ for each $x\in X$ . As each $s|_{x}$ is a Union tree (since so is $s$ ), we have by the induction hypothesis that each $t_{\ell}|_{x}$ is a Union-Find tree. Now let $X=\{x_{1},\ldots,x_{k}\}$ be ordered nondecreasingly by rank; then, as $s$ is a Union tree and $X={\textsc{children}}(s)$ , we get that $\{{\textsc{rank}}(x_{i})\}=\{0,1,\ldots,{\textsc{rank}}({\textsc{root}})-1\}$ by Theorem 2.1. Hence for the sequence $t^{\prime}_{i}$ defined as $t^{\prime}_{0}$ being a singleton tree with root root and for each $i\in\{1,\ldots,k\}$ , $t^{\prime}_{i}={\textsc{merge}}(t^{\prime}_{i-1},t_{\ell}|_{x_{i}})$ , we get that $t_{\ell}=t^{\prime}_{k}$ is a Union-Find tree. Finally, we get $t$ from $t_{\ell}$ by successively applying one collapse operation on each node in $Y$ , thus $t$ is a Union-Find tree as well. ∎
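Theorem 3.1 suggests a straightforward, though exponential-time, recognition procedure: explore all trees reachable by pushes and look for a Union tree. The sketch below is only meant to make the characterization concrete on tiny inputs; it is not an algorithm from the paper, and the dictionary encoding and function names are assumptions of this example.

```python
def is_union_tree(rank, parent):
    # Union condition at every node (Theorem 2.1).
    child_ranks = {x: set() for x in rank}
    for y, p in parent.items():
        child_ranks[p].add(rank[y])
    return all(child_ranks[x] == set(range(rank[x])) for x in rank)


def is_union_find_tree(rank, parent):
    """Exhaustive search over push sequences; exponential in the worst case."""
    seen = set()
    stack = [dict(parent)]
    while stack:
        cur = stack.pop()
        key = frozenset(cur.items())
        if key in seen:
            continue
        seen.add(key)
        if is_union_tree(rank, cur):
            return True
        # Try every legal push(cur, x, y): x, y siblings with rank(x) < rank(y).
        for x, px in cur.items():
            for y, py in cur.items():
                if px == py and rank[x] < rank[y]:
                    nxt = dict(cur)
                    nxt[x] = y
                    stack.append(nxt)
    return False
```

The search terminates because every push strictly increases the total depth of the nodes, so no tree can be reached from itself by a nonempty push sequence.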

4 Complexity

In order to show $\mathbf{NP}$ -completeness of the recognition problem, we first make a useful observation.

Proposition 2

In any Union-Find tree $t$ there are at least as many rank-$0$ nodes as nodes of positive rank.

Proof

We apply induction on the structure of $t$ . The claim holds for singleton trees (having one single node of rank $0$ ). Let $t={\textsc{merge}}(t_{1},t_{2})$ and suppose the claim holds for $t_{1}$ and $t_{2}$ . There are two cases.

• Assume ${\textsc{rank}}(t_{1})=0$ . Then, since ${\textsc{rank}}(t_{1})\geq{\textsc{rank}}(t_{2})$ we have that ${\textsc{rank}}(t_{2})$ is $0$ as well, i.e. both $t_{1}$ and $t_{2}$ are singleton trees (of rank $0$ ). In this case $t$ has one node of rank $1$ and one node of rank $0$ .

• If ${\textsc{rank}}(t_{1})>0$ , then (since ${\textsc{root}}_{t_{1}}$ is the only node in $V_{t}=V_{t_{1}}\cup V_{t_{2}}$ whose rank can change at all, in which case it increases) neither the total number of rank-$0$ nodes nor the total number of nodes with positive rank changes, thus the claim holds.

Let $t=\textsc{collapse}(s,x)$ and assume the claim holds for $s$ . Then, since the collapse operation does not change the rank of any of the nodes, the claim holds for $t$ as well. ∎
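Proposition 2 yields a cheap necessary (but not sufficient) test that only looks at the multiset of ranks. A minimal sketch, assuming ranks are given as a dictionary from nodes to integers; the function name is hypothetical.

```python
def has_enough_rank_zero_nodes(rank):
    """Necessary condition from Proposition 2: #rank-0 nodes >= #positive-rank nodes."""
    zero = sum(1 for r in rank.values() if r == 0)
    positive = len(rank) - zero
    return zero >= positive
```

A tree with ranks 2, 0, 1, 1 (one rank-0 node against three nodes of positive rank) fails the test and hence cannot be a Union-Find tree, whatever its shape.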

In order to define a reduction from the strongly $\mathbf{NP}$ -complete problem Partition we introduce several notions on trees:

An apple of weight $a$ for an integer $a>0$ is a tree consisting of a root node of rank $2$ , a depth-one node of rank $0$ and $a$ depth-one nodes of rank $1$ .

A basket of size $H$ for an integer $H>0$ is a tree consisting of $H+4$ nodes: the root node having rank $3$ , $H+1$ depth-one children of rank $0$ and one depth-one child of rank $1$ , which in turn has a child of rank $0$ .

A flat tree is a tree $t$ of the following form: the root of $t$ has rank $4$ . The immediate subtrees of $t$ are:

• a node of rank $0$ , having no children;

• a node of rank $1$ , having a single child of rank $0$ ;

• a node of rank $2$ , having two children: a single node of rank $0$ and a node of rank $1$ , having a single child of rank $0$ ;

• an arbitrary number of apples,

• and an arbitrary number of baskets for some fixed size $H$ .

(See Figure 2.)
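To make the gadgets concrete, the following sketch builds a flat tree from a list of apple weights, a number of baskets and a basket size. Nodes are numbered consecutively, with the root being node 0; this numbering, the dictionary encoding and all function names are assumptions of the example.

```python
def new_node(rank, parent, r, p=None):
    x = len(rank)  # next unused integer id
    rank[x] = r
    if p is not None:
        parent[x] = p
    return x


def add_apple(rank, parent, p, a):
    # Root of rank 2, one rank-0 child and a children of rank 1 (a + 2 nodes).
    root = new_node(rank, parent, 2, p)
    new_node(rank, parent, 0, root)
    for _ in range(a):
        new_node(rank, parent, 1, root)


def add_basket(rank, parent, p, H):
    # Root of rank 3, H + 1 rank-0 children, one rank-1 child with a rank-0 child.
    root = new_node(rank, parent, 3, p)
    for _ in range(H + 1):
        new_node(rank, parent, 0, root)
    one = new_node(rank, parent, 1, root)
    new_node(rank, parent, 0, one)


def flat_tree(weights, k, H):
    rank, parent = {}, {}
    root = new_node(rank, parent, 4)
    new_node(rank, parent, 0, root)        # the rank-0 leaf
    one = new_node(rank, parent, 1, root)  # rank-1 node with a rank-0 child
    new_node(rank, parent, 0, one)
    two = new_node(rank, parent, 2, root)  # the rank-2 Union tree
    new_node(rank, parent, 0, two)
    inner = new_node(rank, parent, 1, two)
    new_node(rank, parent, 0, inner)
    for a in weights:
        add_apple(rank, parent, root, a)
    for _ in range(k):
        add_basket(rank, parent, root, H)
    return rank, parent
```

The construction uses one node per unit of weight, so it is polynomial in the size of the instance only when the numbers are written in unary.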

At this point we recall that the following problem Partition is $\mathbf{NP}$ -complete in the strong sense [11]: given a list $a_{1},\ldots,a_{m}$ of positive integers and a value $k>0$ such that the value $B=\frac{\sum_{i=1}^{m}a_{i}}{k}$ is an integer, does there exist a partition $\mathcal{B}=\{B_{1},\ldots,B_{k}\}$ of the set $\{1,\ldots,m\}$ satisfying $\sum_{i\in B_{j}}a_{i}=B$ for each $1\leq j\leq k$ ?

(Here “in the strong sense” means that the problem remains $\mathbf{NP}$ -complete even if the numbers are encoded in unary.)
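For small instances the problem as stated above can be decided by plain backtracking. The sketch below is a naive exponential-time solver, included only to pin down the problem statement; the function name `has_partition` is made up for this example.

```python
def has_partition(a, k):
    """Can the items a be split into k groups, each summing to sum(a) / k?"""
    total = sum(a)
    if k <= 0 or total % k != 0:
        return False
    target = total // k
    items = sorted(a, reverse=True)  # placing large items first prunes earlier
    bins = [0] * k

    def place(i):
        if i == len(items):
            # All items placed, no bin above target, sums add up to k * target,
            # hence every bin holds exactly target.
            return True
        tried = set()
        for j in range(k):
            load = bins[j]
            if load + items[i] <= target and load not in tried:
                tried.add(load)  # bins with equal load are interchangeable
                bins[j] += items[i]
                if place(i + 1):
                    return True
                bins[j] -= items[i]
        return False

    return place(0)
```

For example, the weights 1, 1, 2 can be split into two groups of sum 2, whereas 2, 2, 2 cannot be split into two groups of sum 3.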

Proposition 3

Assume $t$ is a flat tree having $k$ basket children, each having the size $H$ , and $m$ apple children of weights $a_{1},\ldots,a_{m}$ respectively, satisfying $H\cdot k=\sum_{1\leq i\leq m}a_{i}$ .

Then $t$ is a Union-Find tree if and only if the instance $(a_{1},\ldots,a_{m},k)$ is a positive instance of the Partition problem.

Proof

(For an example, the reader is referred to Figure 3.)

Suppose $\mathcal{I}=(a_{1},\ldots,a_{m},k)$ is a positive instance of the Partition problem. Let $H$ stand for the target sum $\frac{\sum a_{i}}{k}$ . Let $\mathcal{B}=\{\mathcal{B}_{1},\ldots,\mathcal{B}_{k}\}$ be a solution of $\mathcal{I}$ , i.e., $\sum_{i\in\mathcal{B}_{j}}a_{i}=H$ for each $j=1,\ldots,k$ . Let $x_{1},\ldots,x_{k}\in{\textsc{children}}(t)$ be the nodes corresponding to the baskets of $t$ and let $y_{1},\ldots,y_{m}\in{\textsc{children}}(t)$ be the nodes corresponding to the apples of $t$ .

We define the following sequence $t_{0},t_{1},\ldots,t_{m}$ of trees: $t_{0}=t$ and for each $i=1,\ldots,m$ , let $t_{i}={\textsc{push}}(t_{i-1},y_{i},x_{j})$ with $1\leq j\leq k$ being the unique index with $i\in\mathcal{B}_{j}$ . Then, ${\textsc{children}}(t_{m})$ consists of $x_{1},\ldots,x_{k}$ and the three additional nodes having rank $0$ , $1$ and $2$ . Note that the subtrees rooted at the latter three nodes are Union trees. Thus, if each of the trees $t_{m}|_{x_{j}}$ is a Union-Find tree, then so is $t$ .

Consider a subtree $t^{\prime}=t_{m}|_{x_{j}}$ . By construction, $t^{\prime}$ is a tree whose root has rank $3$ and has

• $H+1$ children of rank $0$ ,

• a single child of rank $1$ , having a child of rank $0$ ,

• and several (say, $\ell$ ) apple children with total weight $H$ .

We give a method to transform $t^{\prime}$ into a Union tree. First, we push $a_{i}$ rank-$0$ nodes into each apple child of weight $a_{i}$ . After this stage $t^{\prime}$ has one child of rank $0$ , one child of rank $1$ and $\ell$ “filled” apple children, having a root of rank $2$ , thus the root of the transformed $t^{\prime}$ satisfies the Union condition. We only have to show that each of these “filled” apples is a Union-Find tree.

Such a subtree has a root node of rank $2$ , $a_{i}$ depth-one nodes of rank $1$ and $a_{i}+1$ depth-one nodes of rank $0$ . Then, one can push into each node of rank $1$ a node of rank $0$ and arrive at a tree with one depth-one node of rank $0$ , and $a_{i}$ depth-one nodes of rank $1$ , each having a single child of rank $0$ , which is indeed a Union tree, showing the claim by Theorem 3.1.

For an illustration of the construction the reader is referred to Figure 4.

For the other direction, suppose $t$ is a Union-Find tree. By Theorem 3.1 and Proposition 1, there is a subset $X\subseteq{\textsc{children}}(t)$ and a mapping $f:Y\to X$ with $Y=\{y_{1},\ldots,y_{\ell}\}={\textsc{children}}(t)-X$ such that for the sequence $t_{0}=t$ , $t_{i}={\textsc{push}}(t_{i-1},y_{i},f(y_{i}))$ we have that each immediate subtree of $t_{\ell}$ is a Union-Find tree and moreover, the root of $t_{\ell}$ satisfies the Union condition.

Since the root of $t$ has rank $4$ , $t_{\ell}$ has to have at least one child of each of the ranks $0$ , $1$ , $2$ and $3$ . Since $t$ has exactly one child of rank $0$ and exactly one of rank $1$ , these nodes have to be in $X$ . This implies that no node gets pushed into the apples at this stage (because the apples have rank $2$ ). Thus, since the apples are not Union-Find trees (as they have strictly fewer rank-$0$ nodes than positive-rank nodes, cf. Proposition 2), all the apples have to be in $Y$ . Apart from the apples, $t$ has exactly one depth-one node of rank $2$ (which happens to be the root of a Union tree), thus this node has to stay in $X$ as well. Moreover, the baskets cannot be pushed, as they have the maximal rank $3$ among the depth-one nodes.

Thus, we have to push all the apples, and we can push apples only into baskets (as exactly the baskets have rank greater than $2$ ). Let $x\in X$ be a basket node, let $t^{\prime}$ stand for $t_{\ell}|_{x}$ and let $\{y_{1}^{\prime},\ldots,y_{j}^{\prime}\}\subseteq Y$ be the set of those apples that get pushed into $x$ during the operation. Then, the total number of nodes having rank $0$ in $t^{\prime}$ is $H+2+j$ ( $j$ of them coming from the apples and the other ones coming from the basket) while the total number of nodes having a positive rank is $2+j+A$ where $A$ is the total weight of the apples in $\{y_{1}^{\prime},\ldots,y_{j}^{\prime}\}$ . Applying Proposition 2 we get that $A\leq H$ for each basket. Since the total weight of all apples is $H\cdot k$ and each apple gets pushed into exactly one basket, we get that $A=H$ actually holds for each basket. Thus, $\mathcal{I}$ is a positive instance of the Partition problem. ∎
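The counting step at the end of the proof can be spelled out as follows. Here $A_{x}$ is a label introduced only for this illustration, denoting the total weight of the apples pushed into the basket $x$ ; the rank-$0$ nodes are the $H+1$ leaves of the basket, the leaf below its rank-$1$ child and one leaf per apple, and the positive-rank nodes are the basket root, its rank-$1$ child, the $j$ apple roots and the $A$ rank-$1$ nodes of the apples.

$$\underbrace{(H+1)+1+j}_{\text{rank-}0\text{ nodes of }t^{\prime}}\;\geq\;\underbrace{1+1+j+A}_{\text{positive-rank nodes of }t^{\prime}}\quad\Longrightarrow\quad A\leq H,$$

$$\sum_{x}A_{x}=H\cdot k\ \text{ and }\ A_{x}\leq H\ \text{ for each of the }k\text{ baskets }x\quad\Longrightarrow\quad A_{x}=H\ \text{ for each basket }x.$$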

Theorem 4.1

The recognition problem of Union-Find trees is $\mathbf{NP}$ -complete.

Proof

By Proposition 3 we get $\mathbf{NP}$ -hardness. For membership in $\mathbf{NP}$ , we make use of the characterization given in Theorem 3.1 and that the possible number of pushes is bounded above by $n^{2}$ : upon pushing $x$ below $y$ , the depth of $x$ and its descendants increases, while the depth of the other nodes remains the same. Since the depth of any node is at most $n$ , the sum of the depths of all the nodes is at most $n^{2}$ in any tree. Hence, it suffices to guess nondeterministically a sequence $t=t_{0}\vdash t_{1}\vdash\ldots\vdash t_{k}$ for some $k\leq n^{2}$ with $t_{k}$ being a Union tree (which also can be checked in polynomial time). ∎
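The membership argument amounts to a polynomial-time verifier for a guessed push sequence. A hedged sketch of such a verifier is given below; the certificate format (a list of `(x, y)` pairs), the dictionary encoding and the function name are assumptions of this example rather than details fixed by the paper.

```python
def verify_certificate(rank, parent, pushes):
    """Check that applying the pushes (x, y) in order turns the tree into a Union tree."""
    n = len(rank)
    if len(pushes) > n * n:  # bound on the number of pushes from the proof
        return False
    cur = dict(parent)
    for x, y in pushes:
        # push(t, x, y) is defined only for siblings with rank(x) < rank(y)
        if x not in cur or y not in cur or cur[x] != cur[y]:
            return False
        if rank[x] >= rank[y]:
            return False
        cur[x] = y
    # Final check: the Union condition at every node (Theorem 2.1).
    child_ranks = {v: set() for v in rank}
    for c, p in cur.items():
        child_ranks[p].add(rank[c])
    return all(child_ranks[v] == set(range(rank[v])) for v in rank)
```

Each push is validated in constant time and the final check is linear in the size of the tree, so the whole verification runs in time polynomial in $n$ .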

5 Conclusion, future directions

We have shown that unless $\mathbf{P}=\mathbf{NP}$ , there is no efficient algorithm to check whether a given tree is a valid Union-Find tree, assuming union-by-rank strategy, since the problem is $\mathbf{NP}$ -complete, complementing our earlier results assuming union-by-size strategy. A very natural question is the following: does there exist a merging strategy under which the time complexity remains amortized almost-constant, and which at the same time allows an efficient recognition algorithm? Although this data structure is called “primitive” in the sense that it does not really need an automatic run-time certifying system, we find the question interesting from the mathematical point of view as well. It would also be interesting to know whether the recognition problem of Union-Find trees built up according to the union-by-rank strategy is still $\mathbf{NP}$ -complete if the nodes of the tree are not tagged with the rank, that is, given a tree without rank info, does there exist a Union-Find tree with the same underlying tree?

Bibliography

[1] Hideo Bannai, Shunsuke Inenaga, Ayumi Shinohara, and Masayuki Takeda. Inferring strings from graphs and arrays. In Branislav Rovan and Peter Vojtáš, editors, Mathematical Foundations of Computer Science 2003: 28th International Symposium, MFCS 2003, pages 208–217. Springer Berlin Heidelberg, Berlin, Heidelberg, 2003.
[2] Leizhen Cai. The recognition of Union trees. Inf. Process. Lett., 45(6):279–283, 1993.
[3] Julien Clément, Maxime Crochemore, and Giuseppina Rindone. Reverse engineering prefix tables. In Susanne Albers and Jean-Yves Marion, editors, 26th International Symposium on Theoretical Aspects of Computer Science, STACS 2009, volume 3 of LIPIcs, pages 289–300. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany, 2009.
[4] Thomas H. Cormen, Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. McGraw-Hill Higher Education, 2nd edition, 2001.
[5] Maxime Crochemore, Costas S. Iliopoulos, Solon P. Pissis, and German Tischler. Cover array string reconstruction. In Amihood Amir and Laxmi Parida, editors, Combinatorial Pattern Matching: 21st Annual Symposium, CPM 2010, pages 251–259, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.
[6] J.-P. Duval, T. Lecroq, and A. Lefebvre. Efficient validation and construction of border arrays and validation of string matching automata. RAIRO - Theoretical Informatics and Applications, 43(2):281–297, 2009.
[7] J.-P. Duval and A. Lefebvre. Words over an ordered alphabet and suffix permutations. Theoretical Informatics and Applications, 36(3):249–259, 2002.
[8] Jean-Pierre Duval, Thierry Lecroq, and Arnaud Lefebvre. Border array on bounded alphabet. J. Autom. Lang. Comb., 10(1):51–60, 2005.
