Displaying trees across two phylogenetic networks

Janosch Döcker, Simone Linz, and Charles Semple

arXiv:1901.06612·math.CO·April 13, 2021·Theor. Comput. Sci.

TL;DR

This paper investigates the computational complexity of comparing the display sets of two phylogenetic networks, establishing NP-completeness and $\Pi_{2}^{P}$-completeness results for key problems in phylogenetics.

Contribution

It establishes the hardness of deciding whether two phylogenetic networks display a common tree and whether their display sets are equal, including a proof that display-set containment and display-set equality are $\Pi_{2}^{P}$-complete.

Findings

1. Deciding if two networks share a common displayed tree is NP-complete, even for two temporal normal networks.

2. Checking if two networks have identical display sets is $\Pi_{2}^{P}$-complete in general.

3. Display-set equality is polynomial-time solvable for a normal and a tree-child network, but the general problems are computationally hard.

Full text

Displaying trees across two phylogenetic networks

Janosch Döcker, Simone Linz, and Charles Semple

Department of Computer Science, University of Tübingen, Tübingen, Germany

School of Computer Science, University of Auckland, Auckland, New Zealand

School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand

Abstract.

Phylogenetic networks are a generalization of phylogenetic trees to leaf-labeled directed acyclic graphs that represent ancestral relationships between species whose past includes non-tree-like events such as hybridization and horizontal gene transfer. Indeed, each phylogenetic network embeds a collection of phylogenetic trees. Referring to the collection of trees that a given phylogenetic network ${\mathcal{N}}$ embeds as the display set of ${\mathcal{N}}$, several questions in the context of the display set of ${\mathcal{N}}$ have recently been analyzed. For example, the widely studied Tree-Containment problem asks if a given phylogenetic tree is contained in the display set of a given network. The focus of this paper is on two questions that naturally arise in comparing the display sets of two phylogenetic networks. First, we analyze the problem of deciding if the display sets of two phylogenetic networks have a tree in common. Surprisingly, this problem turns out to be NP-complete even for two temporal normal networks. Second, we investigate the question of whether or not the display sets of two phylogenetic networks are equal. While we recently showed that this problem is polynomial-time solvable for a normal and a tree-child network, it is computationally hard in the general case. In establishing hardness, we show that the problem is contained in the second level of the polynomial-time hierarchy. Specifically, it is $\Pi_{2}^{P}$-complete. Along the way, we show that two other problems are also $\Pi_{2}^{P}$-complete, one of which is a generalization of Tree-Containment.

Key words and phrases:

display set, normal networks, phylogenetic networks, polynomial-time hierarchy, temporal networks, tree containment

We thank Britta Dorn for insightful discussions. The second and third authors thank the New Zealand Marsden Fund for their financial support.

1. Introduction

In trying to disentangle the evolutionary history of species, phylogenetic networks, which are leaf-labeled directed acyclic graphs, are becoming increasingly important. From a biological as well as from a mathematical viewpoint, phylogenetic networks are often regarded as a tool to summarize a collection of conflicting phylogenetic trees. Due to processes such as hybridization and lateral gene transfer, the evolution at the species-level is not necessarily tree-like. Nevertheless, individual genes or parts thereof are usually assumed to evolve in a tree-like way. It is consequently of interest to construct phylogenetic networks that embed a collection of phylogenetic trees or, reversely, summarize the phylogenetic trees that are embedded in a given phylogenetic network. These and related types of problems have recently attracted considerable attention from the mathematical community as they lead to a number of challenging questions. One of the most studied questions in this context is called Tree-Containment. Given a phylogenetic network ${\mathcal{N}}$ and a phylogenetic tree ${\mathcal{T}}$ , this problem asks whether or not ${\mathcal{N}}$ embeds ${\mathcal{T}}$ . While Tree-Containment is NP-complete in general [7], it has been shown to be polynomial-time solvable for several popular classes of phylogenetic networks, e.g. so-called tree-child and reticulation-visible networks [1, 6, 15]. Currently, the fastest algorithm that solves Tree-Containment for these latter types of networks has a running time that is linear in the size of ${\mathcal{N}}$ and, hence, linear in the number of leaves of ${\mathcal{N}}$ [16].

Pushing Tree-Containment into a novel direction, Gunawan et al. [6] have recently posed the question of how one can check if two reticulation-visible networks embed the same set of phylogenetic trees. Since the number of trees that a phylogenetic network ${\mathcal{N}}$ embeds grows exponentially with the number $k$ of vertices in ${\mathcal{N}}$ whose in-degree is at least two, there is no immediate check that can be performed in polynomial time. In particular, the number of phylogenetic trees that ${\mathcal{N}}$ embeds is bounded above by $2^{k}$ , and it was shown independently in [15, Theorem 1] and [18, Corollary 3.4] that this upper bound is sharp for the class of normal networks.

Referring to the collection of phylogenetic trees that a given phylogenetic network embeds as its display set (formally defined in Section 2), we investigate two questions that naturally arise in comparing the display sets of two phylogenetic networks. The first question asks if the display sets of two phylogenetic networks have a common element. We call this problem Common-Tree-Containment and show in Section 3 that it is NP-complete even when the two input networks are both temporal and normal. Strikingly, the class of temporal and normal networks is a strict subclass of the class of tree-child and, hence, reticulation-visible networks for which Tree-Containment is polynomial-time solvable. The second problem, which we refer to as Display-Set-Equivalence, is the problem of Gunawan et al. [6] mentioned above that asks, without restricting to a particular class of phylogenetic networks, if the display sets of two networks are equal. While we recently showed that this problem admits a polynomial-time algorithm when the input consists of a normal and a tree-child network [3], we show in Section 4 that the problem is computationally hard for two arbitrary phylogenetic networks. Specifically, we show that Display-Set-Equivalence is $\Pi_{2}^{P}$-complete or, in other words, complete for the second level of the polynomial-time hierarchy [14]. Intuitively, this problem is therefore much harder to solve than any NP-complete or co-NP-complete problem. In establishing the result, we also show that deciding if the display set of one phylogenetic network is contained in the display set of another network is $\Pi_{2}^{P}$-complete.

The paper is organized as follows. The next section contains preliminaries that are used throughout the paper, formal statements of the decision problems that are mentioned in the previous paragraph, and some relevant details about the polynomial-time hierarchy. Section 3 establishes NP-completeness of Common-Tree-Containment and Section 4 establishes $\Pi_{2}^{P}$-completeness of Display-Set-Equivalence. Lastly, Section 5 contains some concluding remarks and highlights three corollaries that follow from the results in Section 3.

2. Preliminaries

This section provides notation and terminology that is used in the remaining sections. Throughout this paper, $X$ denotes a non-empty finite set. Let $G$ be a directed acyclic graph. For two distinct vertices $u$ and $v$ in $G$ , we say that $u$ is an ancestor of $v$ and $v$ is a descendant of $u$ , if there is a directed path from $u$ to $v$ in $G$ . If $(u,v)$ is an edge in $G$ , then $u$ is a parent of $v$ and $v$ is a child of $u$ . Moreover, a vertex of $G$ with in-degree one and out-degree zero is a leaf of $G$ .

Phylogenetic networks and trees. A rooted binary phylogenetic network ${\mathcal{N}}$ on $X$ is a (simple) rooted acyclic digraph that satisfies the following properties:

(i) the (unique) root has out-degree two,

(ii) the set $X$ is the set of vertices of out-degree zero, each of which has in-degree one, and

(iii) all other vertices have either in-degree one and out-degree two, or in-degree two and out-degree one.

The set $X$ is the leaf set of ${\mathcal{N}}$ . Furthermore, the vertices of in-degree one and out-degree two are tree vertices, while the vertices of in-degree two and out-degree one are reticulations. An edge directed into a reticulation is called a reticulation edge while each non-reticulation edge is called a tree edge.

Let ${\mathcal{N}}$ be a rooted binary phylogenetic network on $X$ . If ${\mathcal{N}}$ has no reticulations, then ${\mathcal{N}}$ is said to be a rooted binary phylogenetic $X$ -tree. To ease reading and since all phylogenetic networks considered in this paper are rooted and binary, we refer to a rooted binary phylogenetic network (resp. a rooted binary phylogenetic tree) simply as a phylogenetic network (resp. a phylogenetic tree).

Now let ${\mathcal{T}}$ be a phylogenetic $X$ -tree. If $Y=\{y_{1},y_{2},\ldots,y_{m}\}$ is a subset of $X$ , then ${\mathcal{T}}[-y_{1},y_{2},\ldots,y_{m}]$ and, equivalently, ${\mathcal{T}}|(X-Y)$ denote the phylogenetic tree with leaf set $X-Y$ that is obtained from the minimal rooted subtree of ${\mathcal{T}}$ that connects all leaves in $X-Y$ by suppressing all vertices of in-degree one and out-degree one.

Remark. Throughout the paper, we frequently detail constructions of phylogenetic networks. To this end, we sometimes need labels for internal vertices. Their only purpose is to make these vertices easy to refer to; they should not be regarded as genuine labels like those used for the leaves of a phylogenetic network.

Classes of phylogenetic networks. Let ${\mathcal{N}}$ be a phylogenetic network on $X$ with vertex set $V$ . An edge $e=(u,v)$ is a shortcut if there is a directed path from $u$ to $v$ whose set of edges does not contain $e$ . A vertex $v$ of ${\mathcal{N}}$ is called visible if there exists a leaf $\ell\in X$ such that each directed path from the root of ${\mathcal{N}}$ to $\ell$ passes through $v$ . Now ${\mathcal{N}}$ is reticulation-visible if each reticulation in ${\mathcal{N}}$ is visible, and ${\mathcal{N}}$ is tree-child if each non-leaf vertex in ${\mathcal{N}}$ has a child that is a leaf or a tree vertex. Lastly, ${\mathcal{N}}$ is normal if it is tree-child and does not contain any shortcuts. Clearly, by definition, each normal network is also tree-child. Furthermore, it follows from the next well-known equivalence result [2] that each tree-child network is also reticulation-visible.

Lemma 2.1.

Let ${\mathcal{N}}$ be a phylogenetic network. Then ${\mathcal{N}}$ is tree-child if and only if each vertex of ${\mathcal{N}}$ is visible.

Thus, the class of normal networks is a subclass of tree-child networks. Furthermore, if there exists a map $t:V\rightarrow{\mathbb{R}}^{+}$ that assigns a time stamp to each vertex of ${\mathcal{N}}$ and satisfies the following two properties:

(i) $t(u)=t(v)$ whenever $(u,v)$ is a reticulation edge and

(ii) $t(u)<t(v)$ whenever $(u,v)$ is a tree edge,

then we say that ${\mathcal{N}}$ is temporal, in which case we call $t$ a temporal labeling of ${\mathcal{N}}$ . Note that, although normal networks have no shortcuts, a normal network need not be temporal. Tree-child, normal, and temporal networks were first introduced by Cardona et al. [2], Willson [17], and Moret et al. [11], respectively.
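The class definitions above can be checked mechanically on small examples. The following is a minimal sketch, assuming a network is given as a list of `(parent, child)` pairs; the function names and the two toy networks are illustrative and not taken from the paper. Note that `is_temporal_labeling` only verifies a labeling that is supplied to it and does not search for one.

```python
from collections import defaultdict

def build(edges):
    """Index a network given as (parent, child) pairs."""
    children, parents = defaultdict(list), defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
    return children, parents

def is_tree_child(edges):
    """Every non-leaf vertex has a child that is not a reticulation."""
    children, parents = build(edges)
    internal = {u for u, _ in edges}
    return all(any(len(parents[c]) < 2 for c in children[v]) for v in internal)

def has_path_avoiding(children, src, dst, banned):
    """Depth-first search from src to dst that never uses the edge `banned`."""
    stack, seen = [src], {src}
    while stack:
        x = stack.pop()
        for y in children[x]:
            if (x, y) == banned or y in seen:
                continue
            if y == dst:
                return True
            seen.add(y)
            stack.append(y)
    return False

def shortcuts(edges):
    """Edges (u, v) for which another directed path from u to v exists."""
    children, _ = build(edges)
    return [(u, v) for u, v in edges if has_path_avoiding(children, u, v, (u, v))]

def is_normal(edges):
    return is_tree_child(edges) and not shortcuts(edges)

def is_temporal_labeling(edges, t):
    """Check properties (i) and (ii) for a given time stamp map t."""
    _, parents = build(edges)
    return all(t[u] == t[v] if len(parents[v]) == 2 else t[u] < t[v]
               for u, v in edges)

# Hypothetical toy network with one reticulation r (parents p and q).
normal_net = [("rho", "p"), ("rho", "q"), ("p", "a"), ("p", "r"),
              ("q", "r"), ("q", "c"), ("r", "b")]
# Hypothetical toy network in which (rho, r) is a shortcut.
shortcut_net = [("rho", "p"), ("rho", "r"), ("p", "a"), ("p", "r"), ("r", "b")]
t = {"rho": 1, "p": 2, "q": 2, "r": 2, "a": 3, "b": 3, "c": 3}
```

On these inputs, `normal_net` is normal and `t` is a temporal labeling of it, whereas `shortcut_net` is tree-child but not normal.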

Caterpillars. Let ${\mathcal{C}}$ be a phylogenetic tree with leaf set $\{\ell_{1},\ell_{2},\ldots,\ell_{n}\}$ . Furthermore, for each $i\in\{1,2,\ldots,n\}$ let $p_{i}$ denote the parent of $\ell_{i}$ . Then ${\mathcal{C}}$ is called a caterpillar if $n\geq 2$ and the elements in the leaf set of ${\mathcal{C}}$ can be ordered, say $\ell_{1},\ell_{2},\ldots,\ell_{n}$ , so that $p_{1}=p_{2}$ and, for all $i\in\{3,4,\ldots,n\}$ , we have $(p_{i},p_{i-1})$ as an edge in ${\mathcal{C}}$ . In this case, we denote ${\mathcal{C}}$ by $(\ell_{1},\ell_{2},\ldots,\ell_{n})$ . Additionally, we say that a phylogenetic $X$ -tree ${\mathcal{T}}$ contains a caterpillar ${\mathcal{C}}=(\ell_{1},\ell_{2},\ldots,\ell_{n})$ if ${\mathcal{T}}$ has a subtree that is a subdivision of ${\mathcal{C}}$ .

Displaying. Let ${\mathcal{N}}$ be a phylogenetic network on $X$ and let ${\mathcal{T}}$ be a phylogenetic $Y$ -tree such that $Y\subseteq X$ . Then ${\mathcal{N}}$ displays ${\mathcal{T}}$ if, up to suppressing vertices of in-degree one and out-degree one, ${\mathcal{T}}$ can be obtained from ${\mathcal{N}}$ by deleting edges and vertices, in which case, the edge set, denoted by $E_{\mathcal{T}}$ , of the resulting acyclic directed graph is called an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}$ . If ${\mathcal{N}}$ displays ${\mathcal{T}}$ , note that the root of an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}$ does not necessarily coincide with the root of ${\mathcal{N}}$ . In fact, throughout this paper, we impose that the root of an embedding has in-degree zero and out-degree two. Moreover, the display set of ${\mathcal{N}}$ , denoted by $T({\mathcal{N}})$ , consists of all phylogenetic $X$ -trees that are displayed by ${\mathcal{N}}$ . As mentioned in the introduction, the size of $T({\mathcal{N}})$ is bounded above by $2^{k}$ , where $k$ is the number of reticulations in ${\mathcal{N}}$ . To illustrate, Figure 1 shows a phylogenetic network ${\mathcal{N}}$ with $T({\mathcal{N}})=\{{\mathcal{T}}_{1},{\mathcal{T}}_{2},\ldots,{\mathcal{T}}_{5}\}$ , where the five trees in $T({\mathcal{N}})$ are shown on the right-hand side of the same figure. In this as well as in all other figures throughout the paper, edges are directed downwards.

Again, let ${\mathcal{N}}$ be a phylogenetic network on $X$ , and let $S$ be a subset of the edges of ${\mathcal{N}}$ . Then $S$ is a switching of ${\mathcal{N}}$ if, for each reticulation $v$ of ${\mathcal{N}}$ , $S$ contains precisely one of the two reticulation edges that are directed into $v$ . Now, let $S$ be a switching of ${\mathcal{N}}$ . If we delete each reticulation edge in ${\mathcal{N}}$ that is not in $S$ and, repeatedly, suppress each resulting vertex with in-degree one and out-degree one, delete each vertex with in-degree one and out-degree zero that is not in $X$ , and delete each vertex with in-degree zero and out-degree one, we obtain a phylogenetic $X$ -tree ${\mathcal{T}}$ , in which case, we say that $S$ yields ${\mathcal{T}}$ . Note that ${\mathcal{T}}$ is displayed by ${\mathcal{N}}$ . Conversely, observe that, if ${\mathcal{T}}$ is a phylogenetic $X$ -tree that is displayed by ${\mathcal{N}}$ , then there exists a switching of ${\mathcal{N}}$ that yields ${\mathcal{T}}$ . We summarize this in the following observation.

Observation 2.2.

A phylogenetic network ${\mathcal{N}}$ on $X$ displays a phylogenetic $X$ -tree ${\mathcal{T}}$ if and only if there exists a switching of ${\mathcal{N}}$ that yields ${\mathcal{T}}$ .
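Observation 2.2 suggests a direct, exponential-time way to compute a display set: enumerate all switchings and record the tree each one yields. The sketch below assumes a network given as `(parent, child)` pairs and represents each tree by a sorted Newick-like string so that equal trees compare equal; this encoding, the function names, and the toy network are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product

def canon(v, children, leaves):
    """Canonical string of the tree below v, or None if no leaf of X lies below v."""
    if v in leaves:
        return v
    subtrees = [s for s in (canon(c, children, leaves) for c in children[v])
                if s is not None]
    if not subtrees:
        return None            # dead end: deleted in the cleanup step
    if len(subtrees) == 1:
        return subtrees[0]     # in-degree one, out-degree one: suppressed
    return "(" + ",".join(sorted(subtrees)) + ")"

def display_set(edges, leaves):
    """All trees yielded by some switching (at most 2^k for k reticulations)."""
    parents = defaultdict(list)
    for u, v in edges:
        parents[v].append(u)
    reticulations = [v for v, ps in parents.items() if len(ps) == 2]
    root = next(u for u, _ in edges if u not in parents)
    trees = set()
    # A switching picks exactly one incoming edge for every reticulation.
    for choice in product(*(parents[r] for r in reticulations)):
        kept = dict(zip(reticulations, choice))
        children = defaultdict(list)
        for u, v in edges:
            if v not in kept or kept[v] == u:
                children[u].append(v)
        trees.add(canon(root, children, leaves))
    return trees

# Hypothetical toy network on X = {a, b, c} with a single reticulation r.
net = [("rho", "p"), ("rho", "q"), ("p", "a"), ("p", "r"),
       ("q", "r"), ("q", "c"), ("r", "b")]
trees = display_set(net, {"a", "b", "c"})
```

For this network the two switchings yield two distinct trees, one grouping `a` with `b` and the other grouping `b` with `c`, so the bound $2^{k}$ is attained with $k=1$.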

Problem statements. Tree-Containment is a well-known problem in the study of phylogenetic networks, and its computational complexity has been analyzed extensively for various network classes. In the language of this paper, it can be stated as follows.

Tree-Containment

Input. A phylogenetic $X$ -tree ${\mathcal{T}}$ and phylogenetic network ${\mathcal{N}}$ on $X$ .

Question. Is ${\mathcal{T}}\in T({\mathcal{N}})$ ?

While Tree-Containment is concerned with a single display set, it is natural to compare display sets across phylogenetic networks, e.g. in the context of comparing networks. To make a first step in this direction, this paper focuses on the following three decision problems that compare the display sets of two phylogenetic networks.

Common-Tree-Containment

Input. Two phylogenetic networks ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ on $X$ .

Question. Is $T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})\neq\emptyset$ ?

Display-Set-Containment

Input. Two phylogenetic networks ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ on $X$ .

Question. Is $T({\mathcal{N}})\subseteq T({\mathcal{N}}^{\prime})$ ?

Display-Set-Equivalence

Input. Two phylogenetic networks ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ on $X$ .

Question. Is $T({\mathcal{N}})=T({\mathcal{N}}^{\prime})$ ?
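Once two display sets are available explicitly, the four questions reduce to elementary set operations. The sketch below assumes each display set has already been computed and is stored as a Python set of canonical tree strings (a hypothetical encoding); since a display set can have exponentially many elements, this only restates the definitions and is not an efficient algorithm.

```python
def tree_containment(tree, display):
    """Is the tree an element of the display set?"""
    return tree in display

def common_tree_containment(display_a, display_b):
    """Do the two display sets intersect?"""
    return not display_a.isdisjoint(display_b)

def display_set_containment(display_a, display_b):
    """Is the first display set a subset of the second?"""
    return display_a <= display_b

def display_set_equivalence(display_a, display_b):
    """Are the two display sets equal?"""
    return display_a == display_b

# Hypothetical display sets on X = {a, b, c}.
t_n = {"((a,b),c)", "((b,c),a)"}
t_m = {"((a,b),c)"}
```

Here `t_m` is contained in `t_n`, so the two sets share a tree, but they are not equal.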

We note that Tree-Containment is a special case of both Display-Set-Containment and Common-Tree-Containment. Hence, NP-hardness of the two latter problems follows immediately for when ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ are two arbitrary phylogenetic networks. Nevertheless, as we will see in Sections 3 and 4, we pinpoint the complexity of Common-Tree-Containment and Display-Set-Containment exactly. In particular, we will show that (i) Common-Tree-Containment is NP-complete even for when ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ are both temporal and normal and (ii) Display-Set-Containment is complete for the second level of the polynomial-time hierarchy. This last result turns out to be a key ingredient in showing that Display-Set-Equivalence is also complete for the second level of the polynomial-time hierarchy.

The polynomial hierarchy. The polynomial-time hierarchy (or short, polynomial hierarchy) [5, 14] consists of a system of complexity classes that are defined recursively and generalize the classes P, NP, and co-NP. In particular, for any integer $k\geq 0$ , referred to as level, we have

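$$\Sigma_{0}^{P}=\Pi_{0}^{P}=\text{P},$$

$$\Sigma_{k+1}^{P}=\text{NP}^{\Sigma_{k}^{P}}\quad\text{and}\quad\Pi_{k+1}^{P}=\text{co-NP}^{\Sigma_{k}^{P}}.$$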

Level-0 of the hierarchy coincides with the class P (i.e. $\Sigma_{0}^{P}=\Pi_{0}^{P}$ ) while level-1 coincides with the class NP (i.e. $\Sigma_{1}^{P}$ ) and co-NP (i.e. $\Pi_{1}^{P}$ ), respectively. For all $k\geq 0$ , it is an open problem whether or not $\Sigma_{k}^{P}\neq\Sigma_{k+1}^{P}$ . Specifically, for $k=0$ , this is the fundamental P versus NP problem. If $\Sigma_{k}^{P}=\Sigma_{k+1}^{P}$ or $\Pi_{k}^{P}=\Pi_{k+1}^{P}$ for some $k\geq 0$ , then this would result in a collapse of the polynomial hierarchy to the $k$ -th level.

In Section 4, we show that Display-Set-Containment and Display-Set-Equivalence are both $\Pi_{2}^{P}$ -complete. Intuitively, problems that are complete for the second level of the polynomial hierarchy are more difficult than problems that are complete for the first level. Recall that a decision problem is in co-NP if a no-instance can be verified in polynomial time given an appropriate certificate. Now, similar to showing that a problem is co-NP-complete, a proof that establishes $\Pi_{2}^{P}$ -completeness consists of two steps: (i) show that a problem is in $\Pi_{2}^{P}$ , and (ii) establish a polynomial-time reduction from a problem that is known to be $\Pi_{2}^{P}$ -complete to the problem at hand. With regards to (i), a decision problem is in $\Pi_{2}^{P}$ if a no-instance can be verified in polynomial time when one is given an appropriate certificate and has access to an NP-oracle, that is, an oracle that can solve NP-complete problems in constant time.
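To make the second level concrete, the following worked unfolding uses the standard recursive definition $\Sigma_{k+1}^{P}=\text{NP}^{\Sigma_{k}^{P}}$ and $\Pi_{k+1}^{P}=\text{co-NP}^{\Sigma_{k}^{P}}$ with $\Sigma_{0}^{P}=\Pi_{0}^{P}=\text{P}$. The last display is a sketch of why Display-Set-Containment has the "for all, there exists" shape typical of $\Pi_{2}^{P}$; it follows from Observation 2.2, and the symbols $S$ and $S^{\prime}$ for switchings of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ are introduced here for illustration.

```latex
% Unfolding the recursion one level at a time.
\begin{align*}
\Sigma_{1}^{P} &= \mathrm{NP}^{\Sigma_{0}^{P}} = \mathrm{NP}^{\mathrm{P}} = \mathrm{NP}, &
\Pi_{1}^{P} &= \text{co-NP}^{\Sigma_{0}^{P}} = \text{co-NP},\\
\Sigma_{2}^{P} &= \mathrm{NP}^{\Sigma_{1}^{P}} = \mathrm{NP}^{\mathrm{NP}}, &
\Pi_{2}^{P} &= \text{co-NP}^{\Sigma_{1}^{P}} = \text{co-NP}^{\mathrm{NP}}.
\end{align*}
% Quantifier shape of Display-Set-Containment, using Observation 2.2:
% S ranges over switchings of N and S' over switchings of N'.
\[
T(\mathcal{N}) \subseteq T(\mathcal{N}')
\iff
\forall S \;\exists S' :\;
S \text{ and } S' \text{ yield the same phylogenetic } X\text{-tree}.
\]
```

Read this way, a certificate for a no-instance is a single switching $S$ of ${\mathcal{N}}$, and checking that the tree it yields is not displayed by ${\mathcal{N}}^{\prime}$ is one call to an oracle for Tree-Containment.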

3. Hardness of Common-Tree-Containment

As noted in the introduction, Tree-Containment is NP-complete in general, but polynomial-time solvable for several popular classes of phylogenetic networks such as tree-child and reticulation-visible networks. In this section, we show that no such dichotomy holds for Common-Tree-Containment. In particular, we will show that this problem is NP-complete even if the input consists of two temporal normal networks. To establish the result, we use a reduction from the classical computational problem 3-SAT.

3-SAT

Input. A set $V=\{v_{1},v_{2},\ldots,v_{n}\}$ of variables, and a set $\{C_{1},C_{2},\ldots,C_{m}\}$ of clauses such that each clause is a disjunction of exactly three literals and each literal is an element in $\{v_{i},\bar{v}_{i}:i\in\{1,2,\ldots,n\}\}$ .

Question. Does there exist a truth assignment for $V$ that satisfies each clause $C_{j}$ with $j\in\{1,2,\ldots,m\}$ ?
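For readers who prefer a concrete encoding, here is a minimal brute-force sketch of 3-SAT. It assumes the common convention that the literal $v_{i}$ is written as the integer `i` and $\bar{v}_{i}$ as `-i`; this encoding and the function names are illustrative and not part of the paper.

```python
from itertools import product

def satisfies(assignment, clauses):
    """assignment maps a variable index to a bool; each clause is a triple of signed ints."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def solve_3sat(n, clauses):
    """Try all 2^n truth assignments; return a satisfying one or None."""
    for values in product([False, True], repeat=n):
        assignment = dict(enumerate(values, start=1))
        if satisfies(assignment, clauses):
            return assignment
    return None

# Hypothetical instance with n = 3 variables and m = 3 clauses.
clauses = [(1, 2, 3), (-1, -2, 3), (-1, 2, -3)]
beta = solve_3sat(3, clauses)
```

The search is exponential in $n$, which is expected: the point of the reduction in this section is that Common-Tree-Containment is at least as hard as this problem.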

Let $I$ be an instance of 3-SAT, and let $C_{j}=(x_{j}^{1}\vee x_{j}^{2}\vee x_{j}^{3})$ be a clause of $I$ for $j\in\{1,2,\ldots,m\}$ . Then, for some indices $k$ , $k^{\prime}$ , and $k^{\prime\prime}$ in $\{1,2,\ldots,n\}$ , we have $x_{j}^{1}\in\{v_{k},\bar{v}_{k}\}$ , $x_{j}^{2}\in\{v_{k^{\prime}},\bar{v}_{k^{\prime}}\}$ , and $x_{j}^{3}\in\{v_{k^{\prime\prime}},\bar{v}_{k^{\prime\prime}}\}$ . Without loss of generality, we impose the following two restrictions on $I$ :

(R1) for each $v_{i}\in V$ with $i\in\{1,2,\ldots,n\}$, at most one element in $\{v_{i},\bar{v}_{i}\}$ is a literal of $C_{j}$ and

(R2) $k<k^{\prime}<k^{\prime\prime}$.
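The two restrictions are easy to enforce on a clause whose three literals involve distinct variables. The sketch below assumes literals are encoded as signed integers (`i` for $v_{i}$, `-i` for $\bar{v}_{i}$), which is a hypothetical encoding; it reorders a clause to satisfy (R2) and rejects clauses in which a variable occurs more than once, rather than transforming them.

```python
def normalize_clause(clause):
    """Return the clause with literals sorted by variable index, as in (R2)."""
    ordered = tuple(sorted(clause, key=abs))
    indices = {abs(lit) for lit in ordered}
    if len(indices) != 3:
        # Some variable occurs twice, so k < k' < k'' cannot hold.
        raise ValueError("clause must involve three distinct variables")
    return ordered
```

For example, `normalize_clause((3, -1, 2))` returns `(-1, 2, 3)`, so that $x_{j}^{1}$, $x_{j}^{2}$, and $x_{j}^{3}$ refer to variables of increasing index.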

Now, for each clause $C_{j}$ , we construct the two clause gadgets $G_{j}^{A}$ and $G_{j}^{B}$ that are shown in Figure 2. We next establish a simple lemma.

Lemma 3.1.

Let $G_{j}^{A}$ and $G_{j}^{B}$ be the two clause gadgets that are shown in Figure 2. Obtain two phylogenetic networks ${\mathcal{G}}_{j}^{A}$ and ${\mathcal{G}}_{j}^{B}$ from $G_{j}^{A}$ and $G_{j}^{B}$, respectively, by suppressing the three vertices $r_{j}^{1}$, $r_{j}^{2}$, and $r_{j}^{3}$ of in-degree one and out-degree one. Then $T({\mathcal{G}}_{j}^{A})\cap T({\mathcal{G}}_{j}^{B})=\emptyset$.

Proof.

To see that $T({\mathcal{G}}_{j}^{A})\cap T({\mathcal{G}}_{j}^{B})=\emptyset$ , observe that each tree in $T({\mathcal{G}}_{j}^{A})$ contains the caterpillar $(x_{j}^{2},x_{j}^{3},x_{j}^{1})$ , whereas each tree in $T({\mathcal{G}}_{j}^{B})$ contains the caterpillar $(x_{j}^{1},x_{j}^{3},x_{j}^{2})$ . ∎

Let $S=(s_{1},s_{2},\ldots,s_{n})$ be an arbitrary tuple, and let $r$ be an element that is not contained in $S$ . We write $(r)||S$ to denote the tuple $(r,s_{1},s_{2},\ldots,s_{n})$ obtained by concatenating $r$ and $S$ . With this definition in hand, we are now in a position to establish the main result of this section.

Theorem 3.2.

Common-Tree-Containment is NP-complete when the input consists of two temporal normal networks.

Proof.

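For normal networks, van Iersel et al. [15] showed that Tree-Containment can be solved in time polynomial in the size of the leaf set. Hence, given a phylogenetic $X$-tree as a certificate, we can check in polynomial time that it is displayed by both input networks, and it follows that Common-Tree-Containment is in NP for two normal networks.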

Let $I$ be an instance of 3-SAT with $n$ variables and $m$ clauses. Using the same notation as in the formal statement of 3-SAT, we construct two phylogenetic networks ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ on

[TABLE]

as follows. Let ${\mathcal{T}}$ be the phylogenetic tree obtained by creating a vertex $\rho$ , adding an edge that joins $\rho$ with the root of the caterpillar $(v_{1},v_{2},\ldots,v_{n})$ , and adding an edge that joins $\rho$ with the root of the caterpillar $(c_{1},c_{2},\ldots,c_{m})$ . Now, setting ${\mathcal{M}}={\mathcal{M}}^{\prime}={\mathcal{T}}$ , let ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ be the two phylogenetic networks obtained from ${\mathcal{M}}$ and ${\mathcal{M}}^{\prime}$ , respectively, by applying the following four-step process.

(1) For all $j\in\{1,2,\ldots,m\}$, replace $c_{j}$ with $G_{j}^{A}$ in ${\mathcal{M}}$ and replace $c_{j}$ with $G_{j}^{B}$ in ${\mathcal{M}}^{\prime}$.

(2) For all $i\in\{1,2,\ldots,n\}$, subdivide the edge directed into $v_{i}$ with a new vertex $d_{i}$ in ${\mathcal{M}}$ and ${\mathcal{M}}^{\prime}$.

(3) For each $j\in\{1,2,\ldots,m\}$ in increasing order, consider $C_{j}=(x_{j}^{1}\vee x_{j}^{2}\vee x_{j}^{3})$. Let $v_{k_{\ell}}$ be the unique element in $V$ such that $x_{j}^{\ell}\in\{v_{k_{\ell}},\bar{v}_{k_{\ell}}\}$ for each $\ell\in\{1,2,3\}$. If $x_{j}^{\ell}=v_{k_{\ell}}$, subdivide the edge directed into $v_{k_{\ell}}$ with a new vertex $u_{j}^{\ell}$ in ${\mathcal{M}}$ and subdivide the edge directed into $d_{k_{\ell}}$ with a new vertex $u_{j}^{\ell}$ in ${\mathcal{M}}^{\prime}$. Otherwise, subdivide the edge directed into $d_{k_{\ell}}$ with a new vertex $u_{j}^{\ell}$ in ${\mathcal{M}}$ and subdivide the edge directed into $v_{k_{\ell}}$ with a new vertex $u_{j}^{\ell}$ in ${\mathcal{M}}^{\prime}$. Add a new edge $(u_{j}^{\ell},r_{j}^{\ell})$ in ${\mathcal{M}}$ and ${\mathcal{M}}^{\prime}$.

(4) For each $i\in\{1,2,\ldots,n\}$, suppress the vertex $d_{i}$ of in-degree one and out-degree one in ${\mathcal{M}}$ and ${\mathcal{M}}^{\prime}$.

To illustrate, Figure 3 gives a high-level overview of the construction of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . Observe that, for each $j\in\{1,2,\ldots,m\}$ , the three vertices $r_{j}^{1}$ , $r_{j}^{2}$ , and $r_{j}^{3}$ in ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ are reticulations.

We next show that ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ are both temporal and normal.

3.2.1.

Both ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ are temporal and normal.

Proof.

We first show that ${\mathcal{N}}$ is temporal and normal. Let

$$V_{r}=\{r_{j}^{\ell}:j\in\{1,2,\ldots,m\}\text{ and }\ell\in\{1,2,3\}\}.$$

Furthermore, for each $i\in\{1,2,\ldots,n\}$ , let $V_{i}$ consist of all vertices that lie on the unique directed path from the root of ${\mathcal{N}}$ to $v_{i}$ , and let

$$V_{v}=\bigcup_{i=1}^{n}V_{i}.$$

We begin by assigning a positive real-valued labeling $t$ to each vertex in $V_{v}\cup V_{r}$ as follows. First, under $t$ , each vertex in $V_{v}$ is assigned a labeling such that the following two properties are satisfied.

(i) If $u,v\in V_{v}$ and $u$ is an ancestor of $v$, then $t(u)<t(v)$.

(ii) For all $i\in\{1,2,\ldots,n-1\}$, the temporal labeling of each vertex in $V_{i}$ that is not contained in $V_{i+1}$ is smaller than the minimum temporal labeling over all vertices that are contained in $V_{i+1}$ and not in $V_{i}$.

By construction of ${\mathcal{N}}$ , note that such a labeling always exists. Second, under $t$ , each vertex in $V_{r}$ is assigned the same labeling as its unique parent that is contained in $V_{v}$ . Because of restrictions (R1) and (R2) that we have imposed on $I$ and the way we have assigned temporal labelings to the vertices in $V_{v}$ , we have

$$t(r_{j}^{1})<t(r_{j}^{2})<t(r_{j}^{3})$$

for each $j\in\{1,2,\ldots,m\}$ . A routine check now shows that $t$ can be extended to a temporal labeling of ${\mathcal{N}}$ and, thus, ${\mathcal{N}}$ is temporal.

Now, since ${\mathcal{N}}$ is temporal, it follows that ${\mathcal{N}}$ has no shortcuts. Hence, to show that ${\mathcal{N}}$ is normal, it suffices to show that ${\mathcal{N}}$ is tree-child. It is straightforward to check that ${\mathcal{N}}$ has no edge $(u,v)$ such that $u$ and $v$ are both reticulations. Hence, each reticulation in ${\mathcal{N}}$ has a child that is a tree vertex or a leaf. Furthermore, by construction, each tree vertex of ${\mathcal{N}}$ that is a vertex of some $G_{j}^{A}$ with $j\in\{1,2,\ldots,m\}$ has a child that is a tree vertex or a leaf. Lastly, for each non-leaf vertex $v$ of ${\mathcal{N}}$ that is neither a reticulation nor a vertex of some $G_{j}^{A}$ , consider a directed path $P$ from $v$ to an element in $\{v_{1},v_{2},\ldots,v_{n},C_{1},C_{2},\ldots,C_{m}\}$ . By construction, $P$ exists. It is now easily seen that the second vertex of $P$ is a child of $v$ that is either a tree vertex or a leaf. This establishes that ${\mathcal{N}}$ is normal. An analogous argument that uses $G_{j}^{B}$ instead of $G_{j}^{A}$ can be used to show that ${\mathcal{N}}^{\prime}$ is temporal and normal, thereby completing the proof of (3.2.1). ∎

Since the number of vertices of a normal network is polynomial in the size of $X$ [10] and $|X|=10m+n$ , it follows that ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ can be constructed in time polynomial in the size of $X$ .

3.2.2.

The instance $I$ is a yes-instance if and only if $T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})\neq\emptyset$ .

Proof.

First, suppose that $I$ is a yes-instance. We construct a variable tree ${\mathcal{T}}_{v}$ and a clause tree ${\mathcal{T}}_{c}$ that, joined together, result in a phylogenetic $X$ -tree that is displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . Let $\beta:V\rightarrow\{F,T\}$ be a truth assignment that satisfies each clause, and let

[TABLE]

Furthermore, for each $i\in\{1,2,\ldots,n\}$ , let $Y_{i}$ (resp. $\bar{Y}_{i}$ ) be the tuple consisting of the elements in $Y$ that equal $v_{i}$ (resp. $\bar{v}_{i}$ ) such that, for any two elements $x_{j}^{\ell}$ and $x_{j^{\prime}}^{\ell^{\prime}}$ in $Y_{i}$ (resp. $\bar{Y}_{i}$ ), $x_{j}^{\ell}$ precedes $x_{j^{\prime}}^{\ell^{\prime}}$ precisely if $j>j^{\prime}$ . By construction, note that the two caterpillars $(v_{i})||Y_{i}$ and $(v_{i})||\bar{Y}_{i}$ are displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . Now, obtain ${\mathcal{T}}_{v}$ from the caterpillar $(v_{1},v_{2},\ldots,v_{n})$ by doing the following for each $i\in\{1,2,\ldots,n\}$ . If $\beta(v_{i})=T$ , replace $v_{i}$ with the caterpillar $(v_{i})||Y_{i}$ ; otherwise, replace $v_{i}$ with the caterpillar $(v_{i})||\bar{Y}_{i}$ . Again, by construction, it is easily checked that ${\mathcal{T}}_{v}$ is displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . We next construct ${\mathcal{T}}_{c}$ . Consider a clause $C_{j}=(x_{j}^{1}\vee x_{j}^{2}\vee x_{j}^{3})$ . For each $\ell\in\{1,2,3\}$ , set $z_{\ell}=T$ if $x_{j}^{\ell}$ is satisfied by $\beta$ and, otherwise, set $z_{\ell}=F$ . Depending on which elements in $\{z_{1},z_{2},z_{3}\}$ equal $F$ and $T$ , respectively, and noting that there exists some $\ell$ for which $z_{\ell}=T$ , we define the clause tree ${\mathcal{T}}_{j}^{z_{1}z_{2}z_{3}}$ relative to $C_{j}$ to be one of the seven trees that are listed in Figure 4. Intuitively, $x_{j}^{\ell}$ is a leaf in ${\mathcal{T}}_{j}^{z_{1}z_{2}z_{3}}$ precisely if $z_{\ell}=F$ . Now, obtain ${\mathcal{T}}_{c}$ from the caterpillar $(c_{1},c_{2},\ldots,c_{m})$ by replacing, for each $j\in\{1,2,\ldots,m\}$ , the leaf $c_{j}$ with the clause tree relative to $C_{j}$ . 
As ${\mathcal{T}}_{j}^{z_{1}z_{2}z_{3}}$ is displayed by the two phylogenetic networks obtained from $G_{j}^{A}$ and $G_{j}^{B}$ by suppressing the three vertices $r_{j}^{1}$ , $r_{j}^{2}$ , and $r_{j}^{3}$ of in-degree one and out-degree one, it follows that ${\mathcal{T}}_{j}^{z_{1}z_{2}z_{3}}$ is also displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . In turn, this implies that, by construction, ${\mathcal{T}}_{c}$ is displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . Lastly, we construct a phylogenetic tree ${\mathcal{T}}$ on $X$ by creating a vertex $\rho$ , adding a new edge that joins $\rho$ with the root of ${\mathcal{T}}_{v}$ , and a new edge that joins $\rho$ with the root of ${\mathcal{T}}_{c}$ . As ${\mathcal{T}}_{v}$ and ${\mathcal{T}}_{c}$ are displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ , it is easily checked that ${\mathcal{T}}$ is displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ , and so $T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})\neq\emptyset$ .

Second, suppose that $T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})\neq\emptyset$ . Let ${\mathcal{T}}$ be a phylogenetic $X$ -tree that is displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . Furthermore, let $j,j^{\prime}\in\{1,2,\ldots,m\}$ , and let $\ell,\ell^{\prime}\in\{1,2,3\}$ . For each reticulation $r_{j}^{\ell}$ in ${\mathcal{N}}$ (resp. ${\mathcal{N}}^{\prime}$ ), we say that ${\mathcal{T}}$ *picks $x_{j}^{\ell}$ from the clause side* of ${\mathcal{N}}$ (resp. ${\mathcal{N}}^{\prime}$ ) if ${\mathcal{T}}$ has a vertex whose set of descendants contains $x_{j}^{\ell}$ and $C_{j}$ but does not contain any element in $V$ ; otherwise, we say that ${\mathcal{T}}$ *picks $x_{j}^{\ell}$ from the variable side* of ${\mathcal{N}}$ (resp. ${\mathcal{N}}^{\prime}$ ). Intuitively, $x_{j}^{\ell}$ is picked from the clause side of ${\mathcal{N}}$ (resp. ${\mathcal{N}}^{\prime}$ ) precisely if the embedding of ${\mathcal{T}}$ in ${\mathcal{N}}$ (resp. ${\mathcal{N}}^{\prime}$ ) contains the reticulation edge directed into $r_{j}^{\ell}$ whose two end vertices are vertices of $G_{j}^{A}$ (resp. $G_{j}^{B}$ ). Note that, as ${\mathcal{T}}$ is displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ , we have that ${\mathcal{T}}$ picks $x_{j}^{\ell}$ from the variable side of ${\mathcal{N}}$ if and only if ${\mathcal{T}}$ picks $x_{j}^{\ell}$ from the variable side of ${\mathcal{N}}^{\prime}$ . We next make two observations:

(O1)

For each clause $C_{j}=(x_{j}^{1}\vee x_{j}^{2}\vee x_{j}^{3})$ , it follows from Lemma 3.1 that ${\mathcal{T}}$ picks at most two of $x_{j}^{1}$ , $x_{j}^{2}$ , and $x_{j}^{3}$ from the clause side of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ .

(O2)

It follows from Step (3) in the construction of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ , and the fact that ${\mathcal{T}}$ is displayed by ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ that, if ${\mathcal{T}}$ picks $x_{j}^{\ell}$ from the variable side of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ , and $x_{j}^{\ell}=v_{i}$ for some $i\in\{1,2,\ldots,n\}$ , then each $x_{j^{\prime}}^{\ell^{\prime}}$ with $x_{j^{\prime}}^{\ell^{\prime}}=\bar{v}_{i}$ is picked from the clause side of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . Similarly, if ${\mathcal{T}}$ picks $x_{j}^{\ell}$ from the variable side of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ , and $x_{j}^{\ell}=\bar{v}_{i}$ for some $i\in\{1,2,\ldots,n\}$ , then each $x_{j^{\prime}}^{\ell^{\prime}}$ with $x_{j^{\prime}}^{\ell^{\prime}}=v_{i}$ is picked from the clause side of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ .

Now, let $\beta$ be the truth assignment that is defined as follows. For each $i\in\{1,2,\ldots,n\}$ , we set $v_{i}=T$ if there exists an element $x_{j}^{\ell}$ with $x_{j}^{\ell}=v_{i}$ that is picked from the variable side of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . On the other hand, we set $v_{i}=F$ if either there exists an element $x_{j}^{\ell}$ with $x_{j}^{\ell}=\bar{v}_{i}$ that is picked from the variable side of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ or there is no $x_{j}^{\ell}$ with $x_{j}^{\ell}\in\{v_{i},\bar{v}_{i}\}$ that is picked from the variable side of ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ . Because of (O2), $\beta$ is well defined. Moreover, by (O1) it follows that $\beta$ satisfies at least one literal of each clause and, hence, $I$ is a yes-instance. ∎

This completes the proof of Theorem 3.2. ∎

The next corollary is an immediate consequence of Theorem 3.2.

Corollary 3.3.

Let ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ be two temporal normal networks on $X$ . It is co-NP-complete to decide if $T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})=\emptyset$ .

4. Hardness of Display-Set-Equivalence

In this section, we show that Display-Set-Equivalence is $\Pi_{2}^{P}$ -complete, that is, the problem is complete for the second level of the polynomial hierarchy. To establish this result, we use a chain of three polynomial-time reductions that are described in Subsections 4.1, 4.2, and 4.3. Before detailing the reductions, we introduce two more decision problems that play an important role in this section.

Recall the (ordinary) 3-SAT problem as introduced in Section 3. The input to an instance of 3-SAT consists of a boolean formula over a set of variables. Importantly, each variable is existentially quantified since we are asking whether or not there exists a truth assignment to each variable that satisfies each clause of the formula. In contrast, the following quantified version of 3-SAT has two different types of variables, that is, each variable is either existentially or universally quantified.

$\forall\exists$ 3-SAT

Input. A quantified boolean formula

$$\Psi=\forall v_{1}\cdots\forall v_{p}\,\exists v_{p+1}\cdots\exists v_{n}\bigwedge_{j=1}^{m}C_{j}$$

over a set of variables $V=\{v_{1},v_{2},\ldots,v_{n}\}$ such that each clause $C_{j}$ is a disjunction of exactly three literals and each literal is an element in $\{v_{i},\bar{v}_{i}:i\in\{1,2,\ldots,n\}\}$ .

Question. For each truth assignment $\beta^{\forall}:\{v_{1},v_{2},\ldots,v_{p}\}\rightarrow\{F,T\}$ , does there exist a truth assignment $\beta^{\exists}:\{v_{p+1},v_{p+2},\ldots,v_{n}\}\rightarrow\{F,T\}$ such that, collectively, $\beta^{\forall}$ and $\beta^{\exists}$ satisfy each clause in $\Psi$ ?

It was shown in [14] that $\forall\exists$ 3-SAT is $\Pi_{2}^{P}$ -complete. Let $I$ be an instance of $\forall\exists$ 3-SAT. Note that each clause of $I$ has at least one literal that is an element in $\{v_{i},\bar{v}_{i}:i\in\{p+1,p+2,\ldots,n\}\}$ since, otherwise, $I$ is a no-instance. Furthermore, if all variables are existentially quantified, then $I$ is an instance of the (ordinary) 3-SAT problem. Hence, we may assume throughout this section that $1\leq p<n$ .
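To make the quantifier alternation concrete, the following is a minimal brute-force sketch of the $\forall\exists$ 3-SAT question. It is illustrative only and runs in exponential time. The encoding is an assumption of this sketch and not taken from the paper: a literal is a nonzero integer, where `+i` stands for $v_{i}$ and `-i` for $\bar{v}_{i}$, and the names `satisfies` and `forall_exists_3sat` are hypothetical.

```python
from itertools import product


def satisfies(clauses, beta):
    """Return True if beta (dict: variable index -> bool) satisfies every clause.

    Assumed encoding: literal +i means v_i, literal -i means the negation of v_i.
    """
    return all(
        any(beta[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses
    )


def forall_exists_3sat(n, p, clauses):
    """Decide a forall-exists 3-SAT instance by exhaustive search.

    Variables v_1..v_p are universally quantified and v_{p+1}..v_n are
    existentially quantified.
    """
    universal = range(1, p + 1)
    existential = range(p + 1, n + 1)
    for u_vals in product([False, True], repeat=p):
        beta_forall = dict(zip(universal, u_vals))
        extendable = False
        for e_vals in product([False, True], repeat=n - p):
            beta = {**beta_forall, **dict(zip(existential, e_vals))}
            if satisfies(clauses, beta):
                extendable = True
                break
        if not extendable:
            # Some universal assignment has no satisfying extension.
            return False
    return True
```

For example, with $n=2$ and $p=1$, the clause list `[(1, 2, 2), (-1, -2, -2)]` is a yes-instance under this encoding, whereas `[(1, 2, 2), (1, -2, -2)]` is a no-instance because setting $v_{1}=F$ forces both $v_{2}$ and $\bar{v}_{2}$.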

We next formally state a quantified version of the well-known NP-complete decision problem Directed-Disjoint-Connecting-Paths [5, 12]. Let $G$ be a directed graph with vertex set $V$ , and let $\{(s_{1},t_{1}),(s_{2},t_{2}),\ldots,(s_{k},t_{k})\}$ be a collection of pairs of vertices in $V$ . In what follows, we write $\pi_{i}$ to denote a directed path in $G$ from $s_{i}$ to $t_{i}$ with $i\in\{1,2,\ldots,k\}$ .

$\forall\exists$ Directed-Disjoint-Connecting-Paths

Input. A directed graph $G$ and two collections

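$$P^{\forall}=\{(s_{1},t_{1}),(s_{2},t_{2}),\ldots,(s_{p},t_{p})\}\quad\text{and}\quad P^{\exists}=\{(s_{p+1},t_{p+1}),(s_{p+2},t_{p+2}),\ldots,(s_{k},t_{k})\}$$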

of pairs of vertices in $G$ such that $1\leq p<k$ and, for each $(s_{i},t_{i})\in P^{\forall}$ , there exists a directed path from $s_{i}$ to $t_{i}$ in $G$ .

Question. For each set $\Pi^{\forall}=\{\pi_{1},\pi_{2},\ldots,\pi_{p}\}$ of directed paths, does there exist a set $\Pi^{\forall}\cup\{\pi_{p+1},\pi_{p+2},\ldots,\pi_{k}\}$ of mutually vertex-disjoint directed paths in $G$ ?
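The question can be read operationally as a two-level search. The sketch below is a brute-force illustration under stated assumptions, not an algorithm from the paper: the graph is given as a hypothetical adjacency dictionary `adj`, the first `p` entries of `pairs` play the role of $P^{\forall}$, and a choice of universal paths that is not itself vertex disjoint is treated as having no valid extension.

```python
from itertools import product


def simple_paths(adj, s, t):
    """Yield every directed simple path from s to t as a tuple of vertices."""
    stack = [(s, (s,))]
    while stack:
        v, path = stack.pop()
        if v == t:
            yield path
            continue
        for w in adj.get(v, ()):
            if w not in path:
                stack.append((w, path + (w,)))


def vertex_disjoint(paths):
    """Return True if no vertex lies on two of the given paths."""
    seen = set()
    for path in paths:
        if seen & set(path):
            return False
        seen |= set(path)
    return True


def forall_exists_ddcp(adj, pairs, p):
    """Check: for every choice of paths for the first p pairs, some choice of
    paths for the remaining pairs makes all paths mutually vertex disjoint."""
    options = [list(simple_paths(adj, s, t)) for s, t in pairs]
    for universal in product(*options[:p]):
        if not any(
            vertex_disjoint(universal + existential)
            for existential in product(*options[p:])
        ):
            return False
    return True
```

The enumeration of all simple paths is exponential in general, which is consistent with the hardness results of this section; the sketch is only meant for very small graphs.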

4.1. $\forall\exists$ Directed-Disjoint-Connecting-Paths is $\Pi_{2}^{P}$ -complete

To show that $\forall\exists$ Directed-Disjoint-Connecting-Paths is complete for the second level of the polynomial hierarchy, we use a polynomial-time reduction from $\forall\exists$ 3-SAT. This reduction constructs a special instance of $\forall\exists$ Directed-Disjoint-Connecting-Paths for which the input graph is a particular type of phylogenetic network.

Let ${\mathcal{N}}$ be a phylogenetic network on $X$ , let $S=\{s_{1},s_{2},\ldots,s_{k}\}$ and $T=\{t_{1},t_{2},\ldots,t_{k}\}$ be two disjoint subsets of the vertices of ${\mathcal{N}}$ such that $T=X$ , and let $p\in\{1,2,\ldots,k\}$ . We call ${\mathcal{N}}$ a caterpillar-inducing network with respect to $S$ if the network obtained from ${\mathcal{N}}$ by deleting each vertex that lies on a directed path from a child of a vertex in $S$ to a leaf of ${\mathcal{N}}$ is a caterpillar up to deleting all leaf labels. Moreover, we say that ${\mathcal{N}}$ has the two-path property relative to $p$ if, for each $i\in\{1,2,\ldots,p\}$ , there are two directed paths, say $\pi_{i}$ and $\pi_{i}^{\prime}$ , from $s_{i}$ to $t_{i}$ such that the following three properties are satisfied:

(i)

$\pi_{i}$ and $\pi_{i}^{\prime}$ are the only directed paths from $s_{i}$ to $t_{i}$ in ${\mathcal{N}}$ ,

(ii)

$\pi_{i}$ and $\pi_{i}^{\prime}$ only have the three vertices $s_{i}$ , $t_{i}$ , and the (unique) parent of $t_{i}$ as well as the edge directed into $t_{i}$ in common, and

(iii)

no path in $\{\pi_{i},\pi_{i}^{\prime}:i\in\{1,2,\ldots,p\}\}$ intersects with any path in $\{\pi_{j},\pi_{j}^{\prime}:j\in\{1,2,\ldots,p\}-\{i\}\}$ .

Using the same notation as in the statement of $\forall\exists$ Directed-Disjoint-Connecting-Paths, we now introduce a similar problem whose input graph is a phylogenetic network.

$\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths

Input. A phylogenetic network ${\mathcal{N}}$ on $X$ , two sets $S=\{s_{1},s_{2},\ldots,s_{k}\}$ and $T=X=\{t_{1},t_{2},\ldots,t_{k}\}$ of vertices of ${\mathcal{N}}$ , and an integer $p$ with $1\leq p<k$ such that ${\mathcal{N}}$ is caterpillar-inducing with respect to $S$ and has the two-path property relative to $p$ . Furthermore, the two collections

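$$P^{\forall}=\{(s_{1},t_{1}),(s_{2},t_{2}),\ldots,(s_{p},t_{p})\}\quad\text{and}\quad P^{\exists}=\{(s_{p+1},t_{p+1}),(s_{p+2},t_{p+2}),\ldots,(s_{k},t_{k})\}$$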

of pairs of elements in $S$ and $T$ .

Question. For each set $\Pi^{\forall}=\{\pi_{1},\pi_{2},\ldots,\pi_{p}\}$ of directed paths, does there exist a set $\Pi^{\forall}\cup\{\pi_{p+1},\pi_{p+2},\ldots,\pi_{k}\}$ of mutually vertex-disjoint directed paths in ${\mathcal{N}}$ ?

The next theorem establishes the $\Pi_{2}^{P}$ -completeness of $\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths. The reduction that we use for the proof has a flavor that is similar to that in [8, page 86].

Theorem 4.1.

The decision problem $\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths is $\Pi_{2}^{P}$ -complete.

Proof.

We first show that $\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths is in $\Pi_{2}^{P}$ . Using the same notation as in the formal statement of this problem, guess a set $\Pi^{\forall}=\{\pi_{1},\pi_{2},\ldots,\pi_{p}\}$ of directed paths in ${\mathcal{N}}$ . Since ${\mathcal{N}}$ has the two-path property relative to $p$ , the paths in $\Pi^{\forall}$ are mutually vertex disjoint. Next obtain the directed graph $G$ from ${\mathcal{N}}$ by deleting all vertices that lie on a path in $\Pi^{\forall}$ . Lastly, use an NP-oracle for the unquantified version of Directed-Disjoint-Connecting-Paths to decide if there exists a set $\Pi^{\exists}=\{\pi_{p+1},\pi_{p+2},\ldots,\pi_{k}\}$ of mutually vertex-disjoint directed paths in $G$ . Since a given instance of $\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths is a no-instance precisely if there exists some set $\Pi^{\forall}$ for which no choice of $\Pi^{\exists}$ results in a set $\Pi^{\forall}\cup\Pi^{\exists}$ of mutually vertex-disjoint directed paths in ${\mathcal{N}}$ , it follows that this problem is in co-NP ${}^{\text{NP}}=\Pi_{2}^{P}$ .

We now establish a polynomial-time reduction from the quantified 3-SAT problem. Let $I$ be an instance of $\forall\exists$ 3-SAT with boolean formula

$$\Psi=\forall v_{1}\cdots\forall v_{p}\,\exists v_{p+1}\cdots\exists v_{n}\bigwedge_{j=1}^{m}C_{j}$$

over a set $V=\{v_{1},v_{2},\ldots,v_{n}\}$ of variables. Throughout the proof, we use $C_{j}=(x_{3j-2}\vee x_{3j-1}\vee x_{3j})$ to refer to the three literals in $C_{j}$ for each $j\in\{1,2,\ldots,m\}$ . Now, for each $i\in\{1,2,\ldots,n\}$ , let $\mathcal{J}_{i}^{+}$ be the set that consists of the indices of the literals that are equal to $v_{i}$ and, similarly, let $\mathcal{J}_{i}^{-}$ be the set that consists of the indices of the literals that are equal to $\bar{v}_{i}$ . Without loss of generality, we may assume that ${\mathcal{J}}_{i}^{+}\neq\emptyset$ or ${\mathcal{J}}_{i}^{-}\neq\emptyset$ since, otherwise, $v_{i}$ can be deleted from $V$ .

For each variable $v_{i}$ , we construct a variable gadget $G_{i}^{v}$ as follows:

(1)

Create three vertices $s_{i}^{v}$ , $t_{i}^{v}$ , and $y_{i}$ .

(2)

Create the (possibly empty) set of vertices $\bigcup_{l\in\mathcal{J}_{i}^{+}}\{p_{l}^{\text{in}},p_{l}^{\text{out}}\}$ and construct the directed path

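$$\pi_{i}^{+}=(s_{i}^{v},p_{l_{1}}^{\text{in}},p_{l_{1}}^{\text{out}},p_{l_{2}}^{\text{in}},p_{l_{2}}^{\text{out}},\ldots,p_{l_{q}}^{\text{in}},p_{l_{q}}^{\text{out}},y_{i},t_{i}^{v})$$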

with $\{l_{1},l_{2},\ldots,l_{q}\}=\mathcal{J}_{i}^{+}$ .

(3)

Create the (possibly empty) set of vertices $\bigcup_{k\in\mathcal{J}_{i}^{-}}\{n_{k}^{\text{in}},n_{k}^{\text{out}}\}$ and construct the directed path

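$$\pi_{i}^{-}=(s_{i}^{v},n_{k_{1}}^{\text{in}},n_{k_{1}}^{\text{out}},n_{k_{2}}^{\text{in}},n_{k_{2}}^{\text{out}},\ldots,n_{k_{r}}^{\text{in}},n_{k_{r}}^{\text{out}},y_{i},t_{i}^{v})$$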

with $\{k_{1},k_{2},\ldots,k_{r}\}=\mathcal{J}_{i}^{-}$ .

Note that, since we do not allow for parallel edges, the last edge $(y_{i},t_{i}^{v})$ of $\pi^{+}_{i}$ and $\pi^{-}_{i}$ only appears once in $G_{i}^{v}$ . Intuitively, the two paths $\pi^{+}_{i}$ and $\pi^{-}_{i}$ correspond to the two possible truth assignments for the variable $v_{i}$ . To illustrate, a generic variable gadget for $v_{i}$ is shown on the left-hand side of Figure 5. The additional edges in this figure that are directed into vertices of the variable gadget and directed out of vertices of this gadget will be defined as part of the clause gadget construction which we describe next.

For a clause $C_{j}=(x_{3j-2}\vee x_{3j-1}\vee x_{3j})$ , let $i_{j}$ , $i_{j}^{\prime}$ , and $i_{j}^{\prime\prime}$ be the elements in $\{1,2,\ldots,n\}$ such that $x_{3j-2}\in\{v_{i_{j}},\bar{v}_{i_{j}}\}$ , $x_{3j-1}\in\{v_{i_{j}^{\prime}},\bar{v}_{i_{j}^{\prime}}\}$ , and $x_{3j}\in\{v_{i_{j}^{\prime\prime}},\bar{v}_{i_{j}^{\prime\prime}}\}$ . Now, for each $j\in\{1,2,\ldots,m\}$ , add the following vertices and edges to the variable gadgets.

(1)

Create the vertices $\{s_{j}^{c},t_{j}^{c},u_{j},w_{j},w^{\prime}_{j}\}$ .

(2)

Add the edges in $\{(s_{j}^{c},u_{j})$ , $(w_{j},w^{\prime}_{j})$ , $(w^{\prime}_{j},t_{j}^{c})\}$ .

(3)

If $x_{3j-2}=v_{i_{j}}$ , add the edges $(u_{j},p_{3j-2}^{\text{in}})$ and $(p_{3j-2}^{\text{out}},w_{j})$ . Otherwise, add the edges $(u_{j},n_{3j-2}^{\text{in}})$ and $(n_{3j-2}^{\text{out}},w_{j})$ .

(4)

If $x_{3j-1}=v_{i_{j}^{\prime}}$ , add the edges $(u_{j},p_{3j-1}^{\text{in}})$ and $(p_{3j-1}^{\text{out}},w_{j})$ . Otherwise, add the edges $(u_{j},n_{3j-1}^{\text{in}})$ and $(n_{3j-1}^{\text{out}},w_{j})$ .

(5)

If $x_{3j}=v_{i_{j}^{\prime\prime}}$ , add the edges $(s_{j}^{c},p_{3j}^{\text{in}})$ and $(p_{3j}^{\text{out}},w^{\prime}_{j})$ . Otherwise, add the edges $(s_{j}^{c},n_{3j}^{\text{in}})$ and $(n_{3j}^{\text{out}},w^{\prime}_{j})$ .

In what follows, we refer to the edges and vertices that get added in the aforementioned 5-step construction relative to a given $C_{j}$ as the clause gadget for $C_{j}$ . For each clause $C_{j}=(x_{3j-2}\vee x_{3j-1}\vee x_{3j})$ , there are three directed paths from $s_{j}^{c}$ to $t_{j}^{c}$ each of which corresponds to one of the three literals in $C_{j}$ . For example, for the first literal $x_{3j-2}$ , there is a directed path from $s_{j}^{c}$ to $t_{j}^{c}$ that intersects with the edge $(p_{3j-2}^{\text{in}},p_{3j-2}^{\text{out}})$ on $\pi_{i_{j}}^{+}$ if $x_{3j-2}=v_{i_{j}}$ and that intersects with the edge $(n_{3j-2}^{\text{in}},n_{3j-2}^{\text{out}})$ on $\pi_{i_{j}}^{-}$ if $x_{3j-2}=\bar{v}_{i_{j}}$ . To illustrate, assume that $x_{3j-2}=v_{i_{j}}$ , $x_{3j-1}=\bar{v}_{i_{j}^{\prime}}$ , and $x_{3j}=v_{i_{j}^{\prime\prime}}$ . For this specific case, the clause gadget for $C_{j}$ is shown on the right-hand side of Figure 5.

Now, let $G$ be the directed graph that results from the construction of all variable and all clause gadgets. Observe that $G$ is acyclic. We next set up an instance $I^{\prime}$ of $\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths. Let ${\mathcal{T}}$ be the caterpillar $(\ell_{1}^{v},\ell_{2}^{v},\ldots,\ell_{n}^{v},\ell_{1}^{c},\ell_{2}^{c},\ldots,\ell_{m}^{c})$ . We obtain a directed acyclic graph ${\mathcal{N}}$ from ${\mathcal{T}}$ and $G$ by identifying $\ell_{i}^{v}$ with $s_{i}^{v}$ for each $i\in\{1,2,\ldots,n\}$ and identifying $\ell_{j}^{c}$ with $s_{j}^{c}$ for each $j\in\{1,2,\ldots,m\}$ . Clearly, ${\mathcal{N}}$ is connected and has no parallel edges. Moreover, except for the root, since each vertex of $G$ has in-degree one and out-degree two, in-degree two and out-degree one, or in-degree one and out-degree zero, it follows that ${\mathcal{N}}$ is a phylogenetic network on $T=\{t_{1}^{v},t_{2}^{v},\ldots,t_{n}^{v},t_{1}^{c},t_{2}^{c},\ldots,t_{m}^{c}\}$ . Let $S=\{s_{1}^{v},s_{2}^{v},\ldots,s_{n}^{v},s_{1}^{c},s_{2}^{c},\ldots,s_{m}^{c}\}$ . Since every vertex of $G$ that is not contained in $S$ lies on a directed path from a child of a vertex in $S$ to a leaf in ${\mathcal{N}}$ , it follows that ${\mathcal{N}}$ is caterpillar-inducing with respect to $S$ . Moreover, for each $i\in\{1,2,\ldots,n\}$ , there are exactly two directed paths from $s_{i}^{v}$ to $t_{i}^{v}$ in $G_{i}^{v}$ and, hence, in ${\mathcal{N}}$ that only intersect in the vertices $s_{i}^{v}$ , $t_{i}^{v}$ , and $y_{i}$ , and the edge $(y_{i},t_{i}^{v})$ . Recalling that $1\leq p<n$ , it follows from the construction that ${\mathcal{N}}$ has the two-path property relative to $p$ , and that both $P^{\forall}$ and $P^{\exists}$ are non-empty. We now set

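$$P^{\forall}=\{(s_{i}^{v},t_{i}^{v}):i\in\{1,2,\ldots,p\}\}\quad\text{and}\quad P^{\exists}=\{(s_{i}^{v},t_{i}^{v}):i\in\{p+1,p+2,\ldots,n\}\}\cup\{(s_{j}^{c},t_{j}^{c}):j\in\{1,2,\ldots,m\}\}$$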

This completes the description of $I^{\prime}$ .

Since the number of vertices of $G$ is $3n+11m$ , the number of vertices of ${\mathcal{T}}$ is $2(n+m)-1$ , and $G$ and ${\mathcal{T}}$ have $n+m$ vertices in common, it follows that ${\mathcal{N}}$ has size $O(n+m)$ and can be constructed in polynomial time.
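The vertex count can be checked mechanically on small formulas. The following sketch builds the edge set of $G$ from the variable and clause gadgets described above. Several details are assumptions of this sketch rather than statements from the paper: literals are encoded as signed integers (`+i` for $v_{i}$, `-i` for $\bar{v}_{i}$), vertices are named by hypothetical tuples such as `("p_in", 4)`, and the literal indices along each variable path are visited in increasing order.

```python
def build_gadget_graph(n, clauses):
    """Return the edge set of G for variables v_1..v_n and 3-literal clauses."""
    # Literal x_idx has index idx in 1..3m; clause j owns 3j-2, 3j-1, and 3j.
    literal = {
        3 * j + pos + 1: lit
        for j, clause in enumerate(clauses)
        for pos, lit in enumerate(clause)
    }
    edges = set()

    # Variable gadgets: one path through the positive occurrences of v_i and
    # one through the negative occurrences, both ending with (y_i, t_i^v).
    for i in range(1, n + 1):
        s, y, t = ("s_v", i), ("y", i), ("t_v", i)
        for kind, sign in (("p", 1), ("n", -1)):
            path = [s]
            for idx in sorted(k for k, lit in literal.items() if lit == sign * i):
                path += [(kind + "_in", idx), (kind + "_out", idx)]
            path += [y, t]
            edges.update(zip(path, path[1:]))

    # Clause gadgets: Steps (1)-(5) of the construction.
    for j in range(1, len(clauses) + 1):
        s, t, u, w, w2 = ("s_c", j), ("t_c", j), ("u", j), ("w", j), ("w'", j)
        edges |= {(s, u), (w, w2), (w2, t)}
        for idx, src, dst in ((3 * j - 2, u, w), (3 * j - 1, u, w), (3 * j, s, w2)):
            kind = "p" if literal[idx] > 0 else "n"
            edges |= {(src, (kind + "_in", idx)), ((kind + "_out", idx), dst)}
    return edges
```

Collecting the endpoints of the returned edges gives the vertex set of $G$, whose size can then be compared with $3n+11m$ for a given formula, provided every variable occurs in at least one clause.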

We complete the proof by establishing the following sublemma.

4.1.1.

The instance $I$ is a yes-instance if and only if the instance $I^{\prime}$ is a yes-instance.

Proof.

First, suppose that $I$ is a yes-instance. Let $\Pi^{\forall}=\{\pi_{1}^{v},\pi_{2}^{v},\ldots,\pi_{p}^{v}\}$ be a set of directed paths in ${\mathcal{N}}$ such that each $\pi_{i}^{v}$ begins at $s_{i}^{v}$ and ends at $t_{i}^{v}$ . As $p<n$ , we have $\pi_{i}^{v}\in\{\pi_{i}^{+},\pi_{i}^{-}\}$ . Moreover, since ${\mathcal{N}}$ has the two-path property relative to $p$ , the paths in $\Pi^{\forall}$ are mutually vertex disjoint in ${\mathcal{N}}$ . Now, let $\beta:V\rightarrow\{F,T\}$ be a truth assignment that satisfies each clause of $\Psi$ such that, if $\pi_{i}^{v}=\pi_{i}^{+}$ , then $v_{i}=F$ and, otherwise, $v_{i}=T$ for each $i\in\{1,2,\ldots,p\}$ . Since $I$ is a yes-instance, $\beta$ exists. We next construct a directed path for each pair of vertices in $P^{\exists}$ such that, collectively, these paths together with the elements in $\Pi^{\forall}$ form a solution to $I^{\prime}$ . For each $i\in\{p+1,p+2,\ldots,n\}$ , set $\pi_{i}^{v}=\pi_{i}^{+}$ if $v_{i}=F$ and set $\pi_{i}^{v}=\pi_{i}^{-}$ if $v_{i}=T$ . Furthermore, for each $j\in\{1,2,\ldots,m\}$ , let $x_{j^{\prime}}$ , with $j^{\prime}\in\{3j-2,3j-1,3j\}$ , be a literal in $C_{j}$ that is satisfied by $\beta$ , and let $i$ be the element in $\{1,2,\ldots,n\}$ such that $x_{j^{\prime}}\in\{v_{i},\bar{v}_{i}\}$ . By construction of the clause gadget, there is a directed path, say $\pi_{j}^{c}$ , from $s_{j}^{c}$ to $t_{j}^{c}$ in ${\mathcal{N}}$ such that one of the following properties applies.

(i)

If $x_{j^{\prime}}=v_{i}$ , then $\pi_{j}^{c}$ contains the edge $(p_{j^{\prime}}^{\text{in}},p_{j^{\prime}}^{\text{out}})$ .

(ii)

If $x_{j^{\prime}}=\bar{v}_{i}$ , then $\pi_{j}^{c}$ contains the edge $(n_{j^{\prime}}^{\text{in}},n_{j^{\prime}}^{\text{out}})$ .

In Case (i), as $v_{i}=T$ , we have $\pi_{i}^{v}=\pi_{i}^{-}$ , and it follows that $\pi_{j}^{c}$ does not intersect $\pi_{i}^{v}$ . Similarly, in Case (ii), as $v_{i}=F$ , we have $\pi_{i}^{v}=\pi_{i}^{+}$ , and it again follows that $\pi_{j}^{c}$ does not intersect $\pi_{i}^{v}$ . By construction of ${\mathcal{N}}$ , it is now straightforward to check that

$$\{\pi_{1}^{v},\pi_{2}^{v},\ldots,\pi_{n}^{v},\pi_{1}^{c},\pi_{2}^{c},\ldots,\pi_{m}^{c}\}$$

is a collection of mutually vertex-disjoint directed paths in ${\mathcal{N}}$ that connect each pair of vertices in $P^{\forall}\cup P^{\exists}$ . In particular, since the argument presented in this paragraph applies to all choices of directed paths in $\Pi^{\forall}$ , we conclude that $I^{\prime}$ is a yes-instance.

Second, suppose that $I^{\prime}$ is a yes-instance. Let $\beta^{\forall}:\{v_{1},v_{2},\ldots,v_{p}\}\rightarrow\{F,T\}$ be a truth assignment. Furthermore, let

$$\Pi=\{\pi_{1}^{v},\pi_{2}^{v},\ldots,\pi_{n}^{v},\pi_{1}^{c},\pi_{2}^{c},\ldots,\pi_{m}^{c}\}$$

be a collection of mutually vertex-disjoint directed paths in ${\mathcal{N}}$ such that $\pi_{i}^{v}=\pi_{i}^{-}$ if $v_{i}=T$ and $\pi_{i}^{v}=\pi_{i}^{+}$ if $v_{i}=F$ for each $i\in\{1,2,\ldots,p\}$ . Since $I^{\prime}$ is a yes-instance, $\Pi$ exists. Now, let $\beta:V\rightarrow\{F,T\}$ such that

(i)

for each $i\in\{1,2,\ldots,p\}$ , we have $\beta(v_{i})=\beta^{\forall}(v_{i})$ and,

(ii)

for each $i\in\{p+1,p+2,\ldots,n\}$ , we have $\beta(v_{i})=F$ if $\pi_{i}^{v}=\pi_{i}^{+}$ and, $\beta(v_{i})=T$ if $\pi_{i}^{v}=\pi_{i}^{-}$ .

We next show that $\beta$ satisfies each clause of $\Psi$ . Let $C_{j}=(x_{3j-2}\vee x_{3j-1}\vee x_{3j})$ be a clause of $\Psi$ with $j\in\{1,2,\ldots,m\}$ . Consider the directed path $\pi_{j}^{c}\in\Pi$ from $s_{j}^{c}$ to $t_{j}^{c}$ in ${\mathcal{N}}$ . Let $j^{\prime}$ be the unique element in $\{3j-2,3j-1,3j\}$ such that $\pi_{j}^{c}$ contains either the edge $(p_{j^{\prime}}^{\text{in}},p_{j^{\prime}}^{\text{out}})$ or the edge $(n_{j^{\prime}}^{\text{in}},n_{j^{\prime}}^{\text{out}})$ , and let $i$ be the element in $\{1,2,\ldots,n\}$ such that $x_{j^{\prime}}\in\{v_{i},\bar{v}_{i}\}$ . First, assume that $\pi_{j}^{c}$ contains $(p_{j^{\prime}}^{\text{in}},p_{j^{\prime}}^{\text{out}})$ . Then, as $x_{j^{\prime}}=v_{i}$ and the paths in $\Pi$ are mutually vertex disjoint in ${\mathcal{N}}$ , it follows that $\pi_{i}^{v}=\pi_{i}^{-}$ . Hence $\beta(v_{i})=T$ . Second, assume that $\pi_{j}^{c}$ contains $(n_{j^{\prime}}^{\text{in}},n_{j^{\prime}}^{\text{out}})$ . Then, as $x_{j^{\prime}}=\bar{v}_{i}$ and the paths in $\Pi$ are mutually vertex disjoint, it follows that $\pi_{i}^{v}=\pi_{i}^{+}$ . Hence $\beta(v_{i})=F$ . Under both assumptions, $\beta$ satisfies $C_{j}$ because $\beta(x_{j^{\prime}})=T$ . It now follows that $\beta$ satisfies $\Psi$ and, as the argument applies to all choices of truth assignments for the elements in $\{v_{1},v_{2},\ldots,v_{p}\}$ , we conclude that $I$ is a yes-instance. ∎

This completes the proof of Theorem 4.1. ∎

While the next corollary is not needed for the remainder of the paper, it may be of independent interest in the theoretical computer science community.

Corollary 4.2.

The decision problem $\forall\exists$ Directed-Disjoint-Connecting-Paths is $\Pi_{2}^{P}$ -complete.

Proof.

Since every instance of $\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths is also an instance of $\forall\exists$ Directed-Disjoint-Connecting-Paths, it follows from Theorem 4.1 that the latter problem is $\Pi_{2}^{P}$ -hard. To establish that $\forall\exists$ Directed-Disjoint-Connecting-Paths is in $\Pi_{2}^{P}$ , we use the same argument as in the first paragraph of the proof of Theorem 4.1 and, additionally, check in polynomial time if the paths in $\Pi^{\forall}$ are vertex disjoint. ∎

4.2. Display-Set-Containment is $\Pi_{2}^{P}$ -complete

In this section, we show that Display-Set-Containment is complete for the second level of the polynomial hierarchy. This problem is a generalization of the well-known NP-complete Tree-Containment problem [7].

Theorem 4.3.

Display-Set-Containment is $\Pi_{2}^{P}$ -complete.

Proof.

We first show that Display-Set-Containment is in $\Pi_{2}^{P}$ . Let ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ be two phylogenetic networks on $X$ . To decide if $T({\mathcal{N}})\subseteq T({\mathcal{N}}^{\prime})$ , guess a switching $S$ of ${\mathcal{N}}$ . Let ${\mathcal{T}}$ be the phylogenetic $X$ -tree that is yielded by $S$ . Then use an NP-oracle for Tree-Containment to decide if ${\mathcal{T}}$ is displayed by ${\mathcal{N}}^{\prime}$ . Since ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ form a no-instance precisely if there exists some switching for ${\mathcal{N}}$ that yields a phylogenetic tree that is not displayed by ${\mathcal{N}}^{\prime}$ , it follows that Display-Set-Containment is in co-NP ${}^{\text{NP}}=\Pi_{2}^{P}$ .
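For very small networks, the containment $T({\mathcal{N}})\subseteq T({\mathcal{N}}^{\prime})$ can be checked directly by enumerating all switchings, which replaces both the guess and the oracle call by exhaustive search. The sketch below is exponential in the number of reticulations and is meant only as an illustration. It assumes that a network is given as a list of directed edges, and it encodes each displayed tree by its set of clusters (the leaf sets below its vertices), using the standard fact that a rooted phylogenetic tree is determined by its clusters. The function names are hypothetical.

```python
from itertools import product


def display_set(edges, leaves):
    """Return the display set of a network as a set of cluster systems."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    root = next(u for u, _ in edges if u not in parents)
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]
    trees = set()
    # A switching keeps exactly one incoming edge for each reticulation.
    for choice in product(*(parents[r] for r in reticulations)):
        chosen = dict(zip(reticulations, choice))
        children = {}
        for u, v in edges:
            if chosen.get(v, u) == u:
                children.setdefault(u, []).append(v)
        clusters = set()

        def below(v):
            """Collect the nonempty leaf set below v in the switching."""
            if v in leaves:
                cluster = frozenset([v])
            else:
                cluster = frozenset().union(
                    *(below(c) for c in children.get(v, []))
                )
            if cluster:
                clusters.add(cluster)
            return cluster

        below(root)
        trees.add(frozenset(clusters))
    return trees


def display_set_contained(edges1, edges2, leaves):
    """Return True if every tree displayed by network 1 is displayed by network 2."""
    return display_set(edges1, leaves) <= display_set(edges2, leaves)
```

Because unlabeled dead ends contribute empty leaf sets and suppressed vertices repeat an existing cluster, the cluster system of a switching coincides with that of the tree it yields. Intersecting two display sets computed this way likewise decides whether two small networks share a common displayed tree.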

To complete the proof, we establish a reduction from $\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths. Using the same notation as in the formal statement of $\forall\exists$ Phylo-Directed-Disjoint-Connecting-Paths, let $I$ be the following instance of this problem. Let ${\mathcal{N}}$ be a phylogenetic network on $X$ , let $S=\{s_{1},s_{2},\ldots,s_{k}\}$ and $T=X=\{t_{1},t_{2},\ldots,t_{k}\}$ be two disjoint sets of vertices of ${\mathcal{N}}$ , and let $p$ be an integer with $1\leq p<k$ such that ${\mathcal{N}}$ is caterpillar-inducing with respect to $S$ and has the two-path property relative to $p$ . Furthermore, let

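$$P^{\forall}=\{(s_{1},t_{1}),(s_{2},t_{2}),\ldots,(s_{p},t_{p})\}\quad\text{and}\quad P^{\exists}=\{(s_{p+1},t_{p+1}),(s_{p+2},t_{p+2}),\ldots,(s_{k},t_{k})\}$$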

be two collections of pairs of elements in $S$ and $T$ . This completes the description of $I$ .

Now, let ${\mathcal{N}}_{1}$ be the phylogenetic network obtained from the caterpillar $(t_{0},s_{1},s_{2},\ldots,s_{p},t_{p+1},t_{p+2},\ldots,t_{k})$ by adding the following edges and vertices for each $i\in\{1,2,\ldots,p\}$ . Create three vertices $u_{i}^{1}$ , $u_{i}^{2}$ , and $u_{i}^{3}$ and add the set

[TABLE]

of edges. Observe that the leaf set of ${\mathcal{N}}_{1}$ is

[TABLE]

The construction of ${\mathcal{N}}_{1}$ is shown on the left-hand side of Figure 6. We complete the reduction to an instance of Display-Set-Containment by describing a second phylogenetic network ${\mathcal{N}}_{2}$ . For each $i\in\{1,2,\ldots,p\}$ , let $w_{i}^{\prime}$ and $w_{i}^{\prime\prime}$ be the two children of $s_{i}$ in ${\mathcal{N}}$ . As ${\mathcal{N}}$ has the two-path property relative to $p$ , recall that there are exactly two directed paths from $s_{i}$ to $t_{i}$ in ${\mathcal{N}}$ , and these two paths only have $s_{i}$ , $t_{i}$ , and the parent of $t_{i}$ in common. In the remainder of the proof, we denote the directed path from $s_{i}$ to $t_{i}$ that contains $w_{i}^{\prime}$ with $\pi_{i}^{\prime}$ and, similarly, we denote the directed path from $s_{i}$ to $t_{i}$ that contains $w_{i}^{\prime\prime}$ with $\pi_{i}^{\prime\prime}$ . Lastly, we denote the parent of $s_{1}$ with $p_{1}$ . Now, obtain ${\mathcal{N}}_{2}$ from ${\mathcal{N}}$ in the following way.

(i)

Subdivide the edge $(p_{1},s_{1})$ with a new vertex $u$ and add the edge $(u,t_{0})$ .

(ii)

For each $i\in\{1,2,\ldots,p\}$ , subdivide $(s_{i},w_{i}^{\prime})$ with a new vertex $v_{i}^{\prime}$ , subdivide $(s_{i},w_{i}^{\prime\prime})$ with a new vertex $v_{i}^{\prime\prime}$ , and add the two edges $(v_{i}^{\prime},t_{i}^{\prime})$ and $(v_{i}^{\prime\prime},t_{i}^{\prime\prime})$ .

Clearly, the leaf set of ${\mathcal{N}}_{2}$ is $X^{\prime}$ . To illustrate, ${\mathcal{N}}_{2}$ is shown on the right-hand side in Figure 6.

As the size of $X^{\prime}$ is polynomial in the size of $X$ , it follows that the size of ${\mathcal{N}}_{1}$ and ${\mathcal{N}}_{2}$ is polynomial in the size of ${\mathcal{N}}$ . Furthermore, the construction of ${\mathcal{N}}_{1}$ and ${\mathcal{N}}_{2}$ takes polynomial time.

4.3.1.

The instance $I$ is a yes-instance if and only if $T({\mathcal{N}}_{1})\subseteq T({\mathcal{N}}_{2})$ .

Proof.

First, suppose that $I$ is a yes-instance. Let ${\mathcal{T}}^{\prime}$ be a phylogenetic $X^{\prime}$ -tree that is displayed by ${\mathcal{N}}_{1}$ . For each $i\in\{1,2,\ldots,p\}$ , note that ${\mathcal{T}}^{\prime}$ contains one of the two caterpillars $(t_{i},t_{i}^{\prime},t_{i}^{\prime\prime})$ or $(t_{i},t_{i}^{\prime\prime},t_{i}^{\prime})$ . Let ${\mathcal{J}}^{\prime}$ be the set that consists of each element $i\in\{1,2,\ldots,p\}$ for which ${\mathcal{T}}^{\prime}$ contains $(t_{i},t_{i}^{\prime},t_{i}^{\prime\prime})$ and, similarly, let ${\mathcal{J}}^{\prime\prime}$ be the set that consists of each element $i\in\{1,2,\ldots,p\}$ for which ${\mathcal{T}}^{\prime}$ contains $(t_{i},t_{i}^{\prime\prime},t_{i}^{\prime})$ . Furthermore, let $\Pi^{\forall}=\{\pi_{1},\pi_{2},\ldots,\pi_{p}\}$ be the set of directed paths in ${\mathcal{N}}$ such that $\pi_{i}=\pi_{i}^{\prime}$ if $i\in{\mathcal{J}}^{\prime}$ and $\pi_{i}=\pi_{i}^{\prime\prime}$ if $i\in{\mathcal{J}}^{\prime\prime}$ . Since $I$ is a yes-instance, there exists a set $\Pi=\Pi^{\forall}\cup\{\pi_{p+1},\pi_{p+2},\ldots,\pi_{k}\}$ of mutually vertex-disjoint directed paths in ${\mathcal{N}}$ , where $\pi_{j}$ is a directed path from $s_{j}$ to $t_{j}$ for each $j\in\{p+1,p+2,\ldots,k\}$ . Moreover, as ${\mathcal{N}}$ is caterpillar-inducing with respect to $S$ , it is straightforward to check that there exists a phylogenetic $X$ -tree ${\mathcal{T}}$ such that the following three properties are satisfied:

(i) ${\mathcal{T}}$ is displayed by ${\mathcal{N}}$ ,

(ii) ${\mathcal{T}}={\mathcal{T}}^{\prime}|X$ , and

(iii) there exists an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}$ that contains all edges of paths in $\Pi$ .

Let $E_{\mathcal{T}}$ be an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}$ that satisfies (iii). By construction of ${\mathcal{N}}_{2}$ from ${\mathcal{N}}$ , there exists an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}_{2}$ whose set of edges is

[TABLE]

For each $i\in\{1,2,\ldots,p\}$ , let $E_{i}^{\prime}$ be the subset $\{(v_{i}^{\prime},t_{i}^{\prime}),(v_{i}^{\prime\prime},t_{i}^{\prime\prime}),(s_{i},v_{i}^{\prime\prime})\}$ of edges in ${\mathcal{N}}_{2}$ if $i\in{\mathcal{J}}^{\prime}$ , and the subset $\{(v_{i}^{\prime\prime},t_{i}^{\prime\prime}),(v_{i}^{\prime},t_{i}^{\prime}),(s_{i},v_{i}^{\prime})\}$ of edges in ${\mathcal{N}}_{2}$ if $i\in{\mathcal{J}}^{\prime\prime}$ . Since $E_{\mathcal{T}}^{\prime}$ is an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}_{2}$ , it now follows that

[TABLE]

is an embedding of ${\mathcal{T}}^{\prime}$ in ${\mathcal{N}}_{2}$ . Hence, $T({\mathcal{N}}_{1})\subseteq T({\mathcal{N}}_{2})$ .

Second, suppose that $I$ is a no-instance. Throughout this part of the proof, we use $\pi_{i}$ to denote a directed path from $s_{i}$ to $t_{i}$ in ${\mathcal{N}}$ for each $i\in\{1,2,\ldots,k\}$ . Then, as ${\mathcal{N}}$ has the two-path property relative to $p$ , there is a set $\Pi^{\forall}=\{\pi_{1},\pi_{2},\ldots,\pi_{p}\}$ of mutually vertex-disjoint directed paths in ${\mathcal{N}}$ for which every set $\Pi=\Pi^{\forall}\cup\{\pi_{p+1},\pi_{p+2},\ldots,\pi_{k}\}$ of directed paths in ${\mathcal{N}}$ contains two elements that are not vertex disjoint. For each $i\in\{1,2,\ldots,k\}$ , let $E_{i}$ be the set of edges of $\pi_{i}$ in ${\mathcal{N}}$ . Furthermore, for each $i\in\{1,2,\ldots,p\}$ , let $E_{i}^{\prime}$ be the subset

[TABLE]

of edges in ${\mathcal{N}}_{2}$ if $\pi_{i}=\pi_{i}^{\prime}$ , and the subset

[TABLE]

of edges in ${\mathcal{N}}_{2}$ if $\pi_{i}=\pi_{i}^{\prime\prime}$ , where $\pi_{i}^{\prime}$ or $\pi_{i}^{\prime\prime}$ are as described in the construction of ${\mathcal{N}}_{2}$ from ${\mathcal{N}}$ . Clearly, there is a phylogenetic tree ${\mathcal{T}}_{p}$ with leaf set $\{t_{i},t_{i}^{\prime},t_{i}^{\prime\prime}:i\in\{1,2,\ldots,p\}\}$ for which there exists an embedding in ${\mathcal{N}}_{2}$ that contains all edges in $E_{1}^{\prime}\cup E_{2}^{\prime}\cup\cdots\cup E_{p}^{\prime}$ . Observe that ${\mathcal{T}}_{p}$ can be obtained from the caterpillar $(\ell_{1},\ell_{2},\ldots,\ell_{p})$ by replacing each $\ell_{i}\in\{\ell_{1},\ell_{2},\ldots,\ell_{p}\}$ with the caterpillar $(t_{i},t_{i}^{\prime},t_{i}^{\prime\prime})$ if $\pi_{i}=\pi_{i}^{\prime}$ and with the caterpillar $(t_{i},t_{i}^{\prime\prime},t_{i}^{\prime})$ if $\pi_{i}=\pi_{i}^{\prime\prime}$ . By construction, it now follows that ${\mathcal{N}}_{1}$ displays ${\mathcal{T}}_{p}$ . Let ${\mathcal{T}}$ be the unique phylogenetic $X^{\prime}$ -tree that is displayed by ${\mathcal{N}}_{1}$ such that ${\mathcal{T}}|\{t_{i},t_{i}^{\prime},t_{i}^{\prime\prime}:i\in\{1,2,\ldots,p\}\}={\mathcal{T}}_{p}$ . We complete the argument by showing that ${\mathcal{T}}$ is not displayed by ${\mathcal{N}}_{2}$ . Towards a contradiction, assume that ${\mathcal{T}}$ is displayed by ${\mathcal{N}}_{2}$ . Let $E_{\mathcal{T}}^{\prime}$ be an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}_{2}$ . Then, since ${\mathcal{T}}$ contains $(t_{i},t_{i}^{\prime},t_{i}^{\prime\prime})$ or $(t_{i},t_{i}^{\prime\prime},t_{i}^{\prime})$ for each $i\in\{1,2,\ldots,p\}$ and ${\mathcal{N}}$ satisfies the two-path property relative to $p$ , it follows from the construction of ${\mathcal{N}}_{2}$ that $E_{\mathcal{T}}^{\prime}$ contains all edges in $E_{1}^{\prime}\cup E_{2}^{\prime}\cup\cdots\cup E_{p}^{\prime}$ . 
Furthermore, observe that there is a unique directed path from the root, say $\rho$ , of ${\mathcal{N}}_{2}$ to $t_{0}$ , and so the edges on this path are elements of $E_{\mathcal{T}}^{\prime}$ . For each pair $i$ and $i^{\prime}$ of distinct elements in $\{1,2,\ldots,k\}$ , it therefore follows that the directed path from $\rho$ to $t_{i}$ in $E_{\mathcal{T}}^{\prime}$ and the directed path from $\rho$ to $t_{i^{\prime}}$ in $E_{\mathcal{T}}^{\prime}$ only intersect in vertices that are ancestors of $t_{0}$ in ${\mathcal{N}}_{2}$ . Hence, as ${\mathcal{N}}_{2}$ is caterpillar-inducing with respect to $S$ , there exist directed paths $\pi_{1}^{*},\pi_{2}^{*},\ldots,\pi_{p}^{*},\pi_{p+1}^{*},\ldots,\pi_{k}^{*}$ in $E_{\mathcal{T}}^{\prime}$ such that the following three properties are fulfilled.

(i) For each $i\in\{1,2,\ldots,p\}$ , $\pi_{i}^{*}$ is the unique directed path from $s_{i}$ to $t_{i}$ in ${\mathcal{N}}_{2}$ that contains $v_{i}^{\prime}$ if $\pi_{i}=\pi_{i}^{\prime}$ and that contains $v_{i}^{\prime\prime}$ if $\pi_{i}=\pi_{i}^{\prime\prime}$ .

(ii) For each $i\in\{p+1,p+2,\ldots,k\}$ , $\pi_{i}^{*}$ is a directed path from $s_{i}$ to $t_{i}$ in ${\mathcal{N}}_{2}$ .

(iii) The elements in $\Pi^{*}=\{\pi_{1}^{*},\pi_{2}^{*},\ldots,\pi_{k}^{*}\}$ are mutually vertex disjoint.

Now, by construction, observe that $\pi_{i}^{*}$ is also a directed path from $s_{i}$ to $t_{i}$ in ${\mathcal{N}}$ for each $i\in\{p+1,p+2,\ldots,k\}$ . As $\Pi^{*}$ is a set of mutually vertex-disjoint directed paths in ${\mathcal{N}}_{2}$ , it now follows that $\Pi^{\forall}\cup\{\pi_{p+1}^{*},\pi_{p+2}^{*},\ldots,\pi_{k}^{*}\}$ is a set of mutually vertex-disjoint directed paths in ${\mathcal{N}}$ . In turn, this implies that $I$ is a yes-instance; a contradiction. Hence, ${\mathcal{T}}\notin T({\mathcal{N}}_{2})$ , and so $T({\mathcal{N}}_{1})\nsubseteq T({\mathcal{N}}_{2})$ . ∎

This establishes Theorem 4.3. ∎

We end this section with a brief discussion of the structural properties of the phylogenetic network ${\mathcal{N}}_{1}$ that is constructed in the proof of Theorem 4.3. These properties will play an important role in the next section when we establish $\Pi_{2}^{P}$ -completeness of Display-Set-Equivalence. Let ${\mathcal{N}}$ be a phylogenetic network on $X$ . We say that ${\mathcal{N}}$ is a caterpillar network if it can be obtained from a caterpillar $(\ell_{1},\ell_{2},\ldots,\ell_{k})$ with $2\leq k\leq|X|$ by replacing each $\ell_{i}$ with a phylogenetic network ${\mathcal{N}}_{i}$ on $X_{i}$ such that the elements in $\{{\mathcal{N}}_{1},{\mathcal{N}}_{2},\ldots,{\mathcal{N}}_{k}\}$ are pairwise vertex disjoint and

$$X_{1}\cup X_{2}\cup\cdots\cup X_{k}=X.$$

By construction, ${\mathcal{N}}_{1}$ is a caterpillar network. Moreover, it is easily seen that ${\mathcal{N}}_{1}$ is temporal and tree-child.
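The definition of a caterpillar network can be illustrated with a small Python sketch. It assumes the usual convention that in the caterpillar $(\ell_{1},\ell_{2},\ldots,\ell_{k})$ the leaves $\ell_{1}$ and $\ell_{2}$ share a parent and every later leaf hangs off the next vertex up. The function name `caterpillar_network`, the edge-list encoding, and the backbone vertex names `u2`, `u3`, ... are hypothetical choices made for this example only.

```python
def caterpillar_network(components):
    """Build the edge list of a caterpillar network.

    components: list of (edges, root) pairs, one per caterpillar leaf, with
    pairwise disjoint vertex names; a single leaf is encoded as ([], leaf).
    Assumes k = len(components) >= 2 and that no component uses a name "u<j>".
    """
    k = len(components)
    edges = []
    # Backbone path u_k -> u_{k-1} -> ... -> u_2 of the underlying caterpillar.
    for j in range(k, 2, -1):
        edges.append((f"u{j}", f"u{j - 1}"))
    # Component j replaces leaf l_j: components 1 and 2 hang from u_2,
    # component j >= 3 hangs from u_j.
    for j, (comp_edges, root) in enumerate(components, start=1):
        edges.append((f"u{max(j, 2)}", root))
        edges.extend(comp_edges)
    return edges


# Toy input: two single leaves and one cherry on {c, d}.
toy = caterpillar_network([
    ([], "a"),
    ([], "b"),
    ([("x", "c"), ("x", "d")], "x"),
])
```

Attaching each component by a new edge into its root is equivalent to identifying the caterpillar leaf with that root; the sketch does not check that the components are vertex disjoint or that their leaf sets cover $X$.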

The next corollary now immediately follows from Theorem 4.3.

Corollary 4.4.

Let ${\mathcal{N}}_{1}$ be a temporal tree-child caterpillar network on $X$ , and let ${\mathcal{N}}_{2}$ be a phylogenetic network on $X$ . Then deciding whether $T({\mathcal{N}}_{1})\subseteq T({\mathcal{N}}_{2})$ is $\Pi_{2}^{P}$ -complete.

4.3. Display-Set-Equivalence is $\Pi_{2}^{P}$ -complete

With the result of Corollary 4.4 in hand, we are now in a position to establish the main result of Section 4, which is the following theorem.

Theorem 4.5.

Display-Set-Equivalence is $\Pi_{2}^{P}$ -complete.

Proof.

Let ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ be two phylogenetic networks on $X$ . By Theorem 4.3, the problem of deciding whether or not $T({\mathcal{N}})\subseteq T({\mathcal{N}}^{\prime})$ is in $\Pi_{2}^{P}$ . Similarly, the problem of deciding whether or not $T({\mathcal{N}}^{\prime})\subseteq T({\mathcal{N}})$ is in $\Pi_{2}^{P}$ . Hence, Display-Set-Equivalence is in $\Pi_{2}^{P}$ .
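The "for every tree of one network there is a matching tree in the other" shape of this membership argument can be made concrete with a brute-force Python sketch. It is an illustration only, not part of the proof and not a polynomial-time method. It enumerates every switching of a network (one retained parent per reticulation), records the tree that each switching yields as its set of clusters, and then compares the two collections. The function name `display_set`, the edge-list encoding, and the toy inputs are assumptions made for this example.

```python
from itertools import product


def display_set(edges, leaves):
    """Brute-force display set of a rooted network given as (parent, child) pairs.

    Each displayed tree is encoded by its set of clusters (the leaf sets below
    its vertices), which determines a rooted phylogenetic tree uniquely.
    The running time is exponential in the number of reticulations.
    """
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    root = next(u for u, _ in edges if u not in parents)
    retics = [v for v, ps in parents.items() if len(ps) > 1]

    trees = set()
    # A switching keeps exactly one incoming edge for every reticulation.
    for choice in product(*(parents[r] for r in retics)):
        keep = dict(zip(retics, choice))
        children = {}
        for u, v in edges:
            if keep.get(v, u) == u:  # tree edges are always kept
                children.setdefault(u, []).append(v)

        clusters = set()

        def below(v):
            if v in leaves:
                cluster = frozenset([v])
            else:
                cluster = frozenset().union(*(below(w) for w in children.get(v, [])))
            if cluster:  # vertices without leaf descendants are pruned
                clusters.add(cluster)
            return cluster

        below(root)
        trees.add(frozenset(clusters))
    return trees


X = {"a", "b", "c"}
# Toy network with one reticulation h whose parents are p and q.
N = [("r", "p"), ("r", "q"), ("p", "a"), ("p", "h"),
     ("q", "h"), ("q", "c"), ("h", "b")]
# The tree ((a,b),c), viewed as a network without reticulations.
T = [("r", "x"), ("r", "c"), ("x", "a"), ("x", "b")]

contained = display_set(T, X) <= display_set(N, X)   # is T(T) a subset of T(N)?
equivalent = display_set(T, X) == display_set(N, X)  # do the display sets coincide?
```

In this toy input the containment holds but the equivalence does not, because the network also displays a second tree in which $b$ and $c$ form a cherry.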

We next establish a polynomial-time reduction from Display-Set-Containment to Display-Set-Equivalence. Let ${\mathcal{N}}_{1}$ and ${\mathcal{N}}_{2}$ be two phylogenetic networks on $X=\{\ell_{1},\ell_{2},\ldots,\ell_{n}\}$ that form the input to an instance of Display-Set-Containment that asks if $T({\mathcal{N}}_{1})\subseteq T({\mathcal{N}}_{2})$ . By Corollary 4.4, we may assume that ${\mathcal{N}}_{1}$ is a caterpillar network. Then there exist two vertex-disjoint phylogenetic networks ${\mathcal{M}}_{1}$ and ${\mathcal{M}}_{1^{\prime}}$ with leaf sets $W_{1}$ and $W_{1^{\prime}}$ , respectively, such that $W_{1}\cup W_{1^{\prime}}=X$ , and ${\mathcal{N}}_{1}$ can be obtained from the caterpillar $\{x_{1},x_{2}\}$ by replacing $x_{1}$ with ${\mathcal{M}}_{1}$ and $x_{2}$ with ${\mathcal{M}}_{1^{\prime}}$ . To ease reading, let ${\mathcal{N}}_{1}^{\prime}$ and ${\mathcal{N}}_{2}^{\prime}$ be the two phylogenetic networks on $X^{\prime}=\{\ell_{1}^{\prime},\ell_{2}^{\prime},\ldots,\ell_{n}^{\prime}\}$ that are obtained from ${\mathcal{N}}_{1}$ and ${\mathcal{N}}_{2}$ , respectively, by replacing $\ell_{i}$ with $\ell_{i}^{\prime}$ in both networks for each $i\in\{1,2,\ldots,n\}$ . Similarly, let ${\mathcal{M}}_{1}^{\prime}$ and ${\mathcal{M}}_{1^{\prime}}^{\prime}$ be the two phylogenetic networks obtained from ${\mathcal{M}}_{1}$ and ${\mathcal{M}}_{1^{\prime}}$ , respectively, by replacing $\ell_{i}$ with $\ell_{i}^{\prime}$ in exactly one of ${\mathcal{M}}_{1}$ and ${\mathcal{M}}_{1^{\prime}}$ for each $i\in\{1,2,\ldots,n\}$ . If $W_{1}^{\prime}$ (resp. $W_{1^{\prime}}^{\prime}$ ) denotes the leaf set of ${\mathcal{M}}_{1}^{\prime}$ (resp. ${\mathcal{M}}_{1^{\prime}}^{\prime}$ ), then $W_{1}^{\prime}\cup W_{1^{\prime}}^{\prime}=X^{\prime}$ .

Set ${\mathcal{T}}$ as well as ${\mathcal{T}}^{\prime}$ to be the caterpillar $(w_{1},w_{2},\ldots,w_{2n+3})$ . Furthermore, let $u_{2n+3},u_{2n+2},\ldots,u_{2}$ be the directed path in ${\mathcal{T}}$ (and ${\mathcal{T}}^{\prime}$ ) such that, for all $j\in\{2,3,\ldots,2n+3\}$ , $u_{j}$ is the parent of $w_{j}$ . Now, let $G_{1}^{*}$ and $G_{2}^{*}$ be the two directed acyclic graphs that are obtained from ${\mathcal{T}}$ and ${\mathcal{T}}^{\prime}$ , respectively, by applying the following six-step process.

(1) For all $j\in\{1,2,\ldots,n\}$ , replace $w_{j}$ with ${\mathcal{N}}_{2}$ in ${\mathcal{T}}$ and ${\mathcal{T}}^{\prime}$ by identifying $w_{j}$ with the root of ${\mathcal{N}}_{2}$ .

(2) Replace $w_{n+1}$ with ${\mathcal{N}}_{1}$ in ${\mathcal{T}}$ by identifying $w_{n+1}$ with the root of ${\mathcal{N}}_{1}$ , and replace $w_{n+1}$ with ${\mathcal{M}}_{1}$ in ${\mathcal{T}}^{\prime}$ by identifying $w_{n+1}$ with the root of ${\mathcal{M}}_{1}$ .

(3) Replace $w_{n+2}$ with ${\mathcal{M}}_{1}^{\prime}$ in ${\mathcal{T}}$ by identifying $w_{n+2}$ with the root of ${\mathcal{M}}_{1}^{\prime}$ , and replace $w_{n+2}$ with ${\mathcal{M}}_{1^{\prime}}$ in ${\mathcal{T}}^{\prime}$ by identifying $w_{n+2}$ with the root of ${\mathcal{M}}_{1^{\prime}}$ .

(4) Replace $w_{n+3}$ with ${\mathcal{M}}_{1^{\prime}}^{\prime}$ in ${\mathcal{T}}$ by identifying $w_{n+3}$ with the root of ${\mathcal{M}}_{1^{\prime}}^{\prime}$ , and replace $w_{n+3}$ with ${\mathcal{N}}_{1}^{\prime}$ in ${\mathcal{T}}^{\prime}$ by identifying $w_{n+3}$ with the root of ${\mathcal{N}}_{1}^{\prime}$ .

(5) For all $j\in\{n+4,n+5,\ldots,2n+3\}$ , replace $w_{j}$ with ${\mathcal{N}}_{2}^{\prime}$ in ${\mathcal{T}}$ and ${\mathcal{T}}^{\prime}$ by identifying $w_{j}$ with the root of ${\mathcal{N}}_{2}^{\prime}$ .

(6) For each $i\in\{1,2,\ldots,n\}$ , identify all leaves labeled $\ell_{i}$ (resp. $\ell_{i}^{\prime}$ ) in ${\mathcal{T}}$ with a new vertex $v_{i}$ (resp. $v_{i}^{\prime}$ ), and add a new edge $(v_{i},\ell_{i})$ (resp. $(v_{i}^{\prime},\ell_{i}^{\prime})$ ). Do the same for all leaves labeled $\ell_{i}$ (resp. $\ell_{i}^{\prime}$ ) in ${\mathcal{T}}^{\prime}$ .

To complete the construction, let ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ be two phylogenetic networks such that $G_{1}^{*}$ and $G_{2}^{*}$ can be obtained from ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ , respectively, by contracting edges. Clearly, the leaf set of ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ is $X\cup X^{\prime}$ . Moreover, the directed path $u_{2n+3},u_{2n+2},\ldots,u_{2}$ of ${\mathcal{T}}$ and ${\mathcal{T}}^{\prime}$ is also a directed path of ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ . We refer to this path as the backbone of ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ . The phylogenetic networks ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ are shown in Figures 7 and 8, respectively. Lastly, observe that the size of both ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ is $O(n(|E_{1}|+|E_{2}|))$ , where $E_{1}$ and $E_{2}$ are the edge sets of ${\mathcal{N}}_{1}$ and ${\mathcal{N}}_{2}$ , respectively. Hence, the construction of ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ takes polynomial time.
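The size bound can be checked by a rough edge count. The following is a sketch under our own bookkeeping rather than a derivation taken from the construction itself: it counts the edges of $G_{1}^{*}$ before the final refinement into ${\mathcal{N}}_{1}^{*}$ , uses that ${\mathcal{M}}_{1}^{\prime}$ and ${\mathcal{M}}_{1^{\prime}}^{\prime}$ are vertex-disjoint parts of a relabeled copy of ${\mathcal{N}}_{1}$ (so together they have at most $|E_{1}|$ edges), and assumes $n\geq 2$ so that $|E_{2}|\geq n$ .

```latex
\begin{align*}
|E(G_{1}^{*})|
  &\le \underbrace{(4n+4)}_{\text{caterpillar on } 2n+3 \text{ leaves}}
     + \underbrace{2n\,|E_{2}|}_{n \text{ copies each of } \mathcal{N}_{2} \text{ and } \mathcal{N}_{2}^{\prime}}
     + \underbrace{2\,|E_{1}|}_{\mathcal{N}_{1},\ \mathcal{M}_{1}^{\prime},\ \mathcal{M}_{1^{\prime}}^{\prime}}
     + \underbrace{2n}_{\text{edges } (v_{i},\ell_{i}) \text{ and } (v_{i}^{\prime},\ell_{i}^{\prime})} \\
  &= O\bigl(n\,(|E_{1}|+|E_{2}|)\bigr).
\end{align*}
```

The same count applies to $G_{2}^{*}$ with ${\mathcal{M}}_{1}$ , ${\mathcal{M}}_{1^{\prime}}$ , and ${\mathcal{N}}_{1}^{\prime}$ in place of ${\mathcal{N}}_{1}$ , ${\mathcal{M}}_{1}^{\prime}$ , and ${\mathcal{M}}_{1^{\prime}}^{\prime}$ . Assuming the refinement only resolves the vertices $v_{i}$ and $v_{i}^{\prime}$ , each of which has at most $n+1$ incoming edges, it adds $O(n^{2})$ further edges, which stays within the stated bound because $|E_{2}|\geq n$ .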

4.5.1.

$T({\mathcal{N}}_{1})\subseteq T({\mathcal{N}}_{2})$ if and only if $T({\mathcal{N}}_{1}^{*})=T({\mathcal{N}}_{2}^{*})$ .

Proof.

Throughout this proof, let $U=\{u_{2},u_{3},\ldots,u_{2n+3}\}$ be the vertex set of the backbone of ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ , and let

[TABLE]

be the set of edges in ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ that are directed from a vertex in $U$ to a vertex not in $U$ . Furthermore, for a vertex $v$ and an embedding $E$ , we say that $v$ is in $E$ if there exists an edge in $E$ that is incident with $v$ . If $v$ is in $E$ , then we denote this by $v\in E$ .

First, suppose that $T({\mathcal{N}}_{1})\nsubseteq T({\mathcal{N}}_{2})$ . Let ${\mathcal{T}}_{1}$ be a phylogenetic $X$ -tree such that ${\mathcal{T}}_{1}\in T({\mathcal{N}}_{1})$ and ${\mathcal{T}}_{1}\notin T({\mathcal{N}}_{2})$ . Let ${\mathcal{T}}_{1}^{\prime}$ be the phylogenetic $X^{\prime}$ -tree obtained from ${\mathcal{T}}_{1}$ by replacing $\ell_{i}$ with $\ell_{i}^{\prime}$ for each $i\in\{1,2,\ldots,n\}$ . Furthermore, let ${\mathcal{T}}$ be the phylogenetic $(X\cup X^{\prime})$ -tree obtained from ${\mathcal{T}}_{1}$ and ${\mathcal{T}}_{1}^{\prime}$ by creating a new vertex $\rho$ , adding an edge that joins $\rho$ with the root of ${\mathcal{T}}_{1}$ , and adding an edge that joins $\rho$ with the root of ${\mathcal{T}}_{1}^{\prime}$ . As ${\mathcal{N}}_{1}$ displays ${\mathcal{T}}_{1}$ and ${\mathcal{N}}_{1}^{\prime}$ displays ${\mathcal{T}}_{1}^{\prime}$ , it is easy to check that an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}_{2}^{*}$ can be obtained by adding edges of ${\mathcal{N}}_{2}^{*}$ to

[TABLE]

such that each element in $X$ is a descendant of $u_{n+2}$ and each element in $X^{\prime}$ is a descendant of $w_{n+3}$ . Hence, ${\mathcal{T}}$ is displayed by ${\mathcal{N}}_{2}^{*}$ .

We next show that ${\mathcal{T}}$ is not displayed by ${\mathcal{N}}_{1}^{*}$ . Towards a contradiction, assume that ${\mathcal{T}}$ is displayed by ${\mathcal{N}}_{1}^{*}$ . Let $E_{1}$ be an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}_{1}^{*}$ . Furthermore, let $k$ be the maximum element in $\{1,2,\ldots,2n+3\}$ such that $w_{k}\in E_{1}$ . By construction of ${\mathcal{T}}$ , either each element in $X$ is a descendant of $w_{k}$ in $E_{1}$ or each element in $X^{\prime}$ is a descendant of $w_{k}$ in $E_{1}$ . Thus, as ${\mathcal{N}}_{2}$ does not display ${\mathcal{T}}_{1}$ and ${\mathcal{N}}_{2}^{\prime}$ does not display ${\mathcal{T}}_{1}^{\prime}$ , we have $k=n+1$ . In particular, each element in $X$ is a descendant of $w_{k}$ in $E_{1}$ . But no element in $X^{\prime}$ is a descendant of $u_{k}$ in $E_{1}$ ; a contradiction. Hence, ${\mathcal{T}}$ is not displayed by ${\mathcal{N}}^{*}_{1}$ , and so $T({\mathcal{N}}^{*}_{1})\neq T({\mathcal{N}}^{*}_{2})$ .

Second, suppose that $T({\mathcal{N}}_{1})\subseteq T({\mathcal{N}}_{2})$ . Let ${\mathcal{T}}$ be a phylogenetic $(X\cup X^{\prime})$ -tree that is displayed by ${\mathcal{N}}_{1}^{*}$ , and let $E_{1}$ be an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}_{1}^{*}$ . For each $j\in\{1,2,\ldots,2n+3\}$ with $w_{j}\in E_{1}$ , let $Y_{j}$ be the set that consists of all leaves that are descendants of $w_{j}$ in $E_{1}$ , and let ${\mathcal{T}}_{j}$ be the phylogenetic tree obtained from the minimal rooted subtree of $E_{1}$ that connects all leaves in $Y_{j}$ by suppressing all vertices with in-degree one and out-degree one. If $w_{n+1}\in E_{1}$ , then, by the pigeonhole principle, there exists an element $j\in\{1,2,\ldots,n\}$ such that $w_{j}\notin E_{1}$ . Similarly, if $w_{n+3}\in E_{1}$ , then there exists an element $j^{\prime}\in\{n+4,n+5,\ldots,2n+3\}$ such that $w_{j^{\prime}}\notin E_{1}$ . Without loss of generality, we may therefore assume by the construction of ${\mathcal{N}}_{1}^{*}$ that $E_{1}$ satisfies the following property.

(P) If $w_{n+1}\in E_{1}$ , then $w_{n}\notin E_{1}$ and, if $w_{n+3}\in E_{1}$ , then $w_{n+4}\notin E_{1}$ .

Recall that each tree in $T({\mathcal{N}}_{1})$ is displayed by ${\mathcal{N}}_{2}$ , each tree in $T({\mathcal{M}}_{1}^{\prime})$ is displayed by ${\mathcal{N}}_{1}^{\prime}$ , and each tree in $T({\mathcal{M}}_{1^{\prime}}^{\prime})$ is displayed by ${\mathcal{N}}_{2}^{\prime}$ . Hence, there exists a set $E_{2}$ of edges of ${\mathcal{N}}_{2}^{*}$ such that the following conditions are satisfied.

(i) For each $j\in\{1,2,\ldots,n,n+4,n+5,\ldots,2n+3\}$ , if $w_{j}\in E_{1}$ , then $w_{j}$ is the root of a subtree in $E_{2}$ that is a subdivision of ${\mathcal{T}}_{j}$ .

(ii) If $w_{n+1}\in E_{1}$ , then $w_{n}$ is the root of a subtree in $E_{2}$ that is a subdivision of ${\mathcal{T}}_{n+1}$ .

(iii) If $w_{n+3}\in E_{1}$ , then $w_{n+4}$ is the root of a subtree in $E_{2}$ that is a subdivision of ${\mathcal{T}}_{n+3}$ .

(iv) If $w_{n+2}\in E_{1}$ , then $w_{n+3}$ is the root of a subtree in $E_{2}$ that is a subdivision of ${\mathcal{T}}_{n+2}$ .

Since $E_{1}$ satisfies (P), $E_{2}$ is well defined. Moreover, as ${\mathcal{T}}$ is displayed by ${\mathcal{N}}_{1}^{*}$ , it now follows that there exists an embedding of ${\mathcal{T}}$ in ${\mathcal{N}}_{2}^{*}$ that contains all edges in $E_{2}$ . Thus $T({\mathcal{N}}_{1}^{*})\subseteq T({\mathcal{N}}_{2}^{*})$ .

Now, let ${\mathcal{T}}$ be a phylogenetic $(X\cup X^{\prime})$ -tree that is displayed by ${\mathcal{N}}_{2}^{*}$ . To see that ${\mathcal{T}}$ is displayed by ${\mathcal{N}}_{1}^{*}$ , we can use the same argument as the one to show that $T({\mathcal{N}}_{1}^{*})\subseteq T({\mathcal{N}}_{2}^{*})$ even though the assumption that $T({\mathcal{N}}_{1})\subseteq T({\mathcal{N}}_{2})$ is not symmetric. In particular, we interchange the roles of ${\mathcal{N}}_{1}^{*}$ and ${\mathcal{N}}_{2}^{*}$ (and, consequently, the roles of $E_{1}$ and $E_{2}$ ). Moreover, as each tree in $T({\mathcal{M}}_{1})$ is displayed by ${\mathcal{N}}_{2}$ , each tree in $T({\mathcal{M}}_{1^{\prime}})$ is displayed by ${\mathcal{N}}_{1}$ , and each tree in $T({\mathcal{N}}_{1}^{\prime})$ is displayed by ${\mathcal{N}}_{2}^{\prime}$ , only Condition (iv) above needs to be rewritten as follows.

(iv*) If $w_{n+2}\in E_{2}$ , then $w_{n+1}$ is the root of a subtree in $E_{1}$ that is a subdivision of ${\mathcal{T}}_{n+2}$ .

It is now straightforward to check that ${\mathcal{T}}$ is displayed by ${\mathcal{N}}_{1}^{*}$ , and so $T({\mathcal{N}}_{2}^{*})\subseteq T({\mathcal{N}}_{1}^{*})$ . Combining both cases establishes that $T({\mathcal{N}}_{2}^{*})=T({\mathcal{N}}_{1}^{*})$ . ∎

This completes the proof of Theorem 4.5. ∎

5. Conclusion

We end this paper with three corollaries, which are implied by the results presented in Section 3, and an open problem.

For two temporal networks ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ on $X$ , the authors of [9] showed that counting the number of elements in $T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})$ is #P-complete. Since Common-Tree-Containment is the decision version of computing $|T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})|$ and computational hardness of a decision problem implies computational hardness of the associated counting problem, the next corollary follows from Theorem 3.2.

Corollary 5.1.

Let ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ be two temporal normal networks on $X$ . Then counting the number of elements in $T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})$ is #P-complete.

In 2015, Francis and Steel [4] introduced tree-based networks. A phylogenetic network ${\mathcal{N}}$ on $X$ is tree-based if, up to suppressing vertices of in-degree one and out-degree one, ${\mathcal{N}}$ displays a phylogenetic $X$ -tree ${\mathcal{T}}$ that can be obtained by only deleting reticulation edges, in which case, ${\mathcal{T}}$ is a base tree of ${\mathcal{N}}$ . If ${\mathcal{N}}$ is tree-based, it is well known that not every phylogenetic $X$ -tree displayed by ${\mathcal{N}}$ is a base tree. However, noting that each tree-child network is also a tree-based network, it is shown in [13] that a phylogenetic tree ${\mathcal{T}}$ is displayed by a tree-child network ${\mathcal{N}}$ if and only if ${\mathcal{T}}$ is a base tree of ${\mathcal{N}}$ . Hence, for two tree-child networks ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ , the problem of deciding whether or not $T({\mathcal{N}})\cap T({\mathcal{N}}^{\prime})\neq\emptyset$ is equivalent to deciding whether or not ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ have a common base tree.

Corollary 5.2.

Let ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ be two tree-based networks on $X$ . Then deciding if ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ have a common base tree is NP-complete.

Proof.

Let $S$ be a switching of ${\mathcal{N}}$ , and let ${\mathcal{T}}$ be a phylogenetic $X$ -tree. We say that $S$ is a base-tree switching if, for each non-leaf vertex $u$ in ${\mathcal{N}}$ that is the parent of only reticulations, there exists an edge $(u,v)$ in $S$ . By the definition of a tree-based network it follows that ${\mathcal{T}}$ is a base tree of ${\mathcal{N}}$ if and only if there exists a base-tree switching $S$ of ${\mathcal{N}}$ that yields ${\mathcal{T}}$ . Now, let $S$ be a switching of ${\mathcal{N}}$ , and let $S^{\prime}$ be a switching of ${\mathcal{N}}^{\prime}$ . If $S$ is a base-tree switching of ${\mathcal{N}}$ and $S^{\prime}$ is a base-tree switching of ${\mathcal{N}}^{\prime}$ , and $S$ and $S^{\prime}$ yield the same tree, then ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ have a common base tree. Since it can be checked in polynomial time if $S$ (resp. $S^{\prime}$ ) is a base-tree switching of ${\mathcal{N}}$ (resp. ${\mathcal{N}}^{\prime}$ ), and if $S$ and $S^{\prime}$ yield the same tree, it follows that deciding whether or not ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ have a common base tree is in NP. The corollary now follows from Theorem 3.2. ∎
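The certificate check behind this NP-membership argument can be sketched in Python. This is a minimal illustration under stated assumptions, not an algorithm for finding a common base tree: a switching is encoded as a dictionary that maps each reticulation to the single parent whose edge is kept, trees are compared through their cluster sets, and all function names and the toy inputs are hypothetical.

```python
def _maps(edges):
    parents, children = {}, {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        children.setdefault(u, set()).add(v)
    return parents, children


def is_base_tree_switching(edges, switching):
    """Every non-leaf vertex whose children are all reticulations keeps an out-edge."""
    parents, children = _maps(edges)
    retics = {v for v, ps in parents.items() if len(ps) > 1}
    for u, kids in children.items():  # the keys of children are the non-leaf vertices
        if kids <= retics and not any(switching[v] == u for v in kids):
            return False
    return True


def yielded_clusters(edges, switching, leaves):
    """Cluster set of the tree yielded by a switching (dead ends are pruned)."""
    parents, _ = _maps(edges)
    root = next(u for u, _ in edges if u not in parents)
    kept = {}
    for u, v in edges:
        if switching.get(v, u) == u:  # tree edges are always kept
            kept.setdefault(u, []).append(v)
    clusters = set()

    def below(v):
        if v in leaves:
            cluster = frozenset([v])
        else:
            cluster = frozenset().union(*(below(w) for w in kept.get(v, [])))
        if cluster:
            clusters.add(cluster)
        return cluster

    below(root)
    return frozenset(clusters)


def verify_certificate(edges1, s1, edges2, s2, leaves):
    """Polynomial-time check that (s1, s2) witnesses a common base tree."""
    return (is_base_tree_switching(edges1, s1)
            and is_base_tree_switching(edges2, s2)
            and yielded_clusters(edges1, s1, leaves) == yielded_clusters(edges2, s2, leaves))


leaves = {"a", "b", "c", "d"}
# Toy network: u is a parent of the two reticulations h1 and h2 only.
net = [("r", "s"), ("r", "m"), ("m", "u"), ("m", "t"), ("s", "a"), ("s", "h1"),
       ("t", "h2"), ("t", "c"), ("u", "h1"), ("u", "h2"), ("h1", "b"), ("h2", "d")]
# The caterpillar (c, d, b, a), viewed as a network without reticulations.
tree = [("r", "a"), ("r", "x"), ("x", "b"), ("x", "y"), ("y", "c"), ("y", "d")]

accepted = verify_certificate(net, {"h1": "u", "h2": "t"}, tree, {}, leaves)
rejected = verify_certificate(net, {"h1": "s", "h2": "t"}, tree, {}, leaves)
```

The second certificate is rejected because it removes both outgoing edges of `u`, so it is not a base-tree switching under the definition used in the proof.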

Lastly, using (ordinary) switchings instead of base-tree switchings, ideas analogous to the ones described in the proof of Corollary 5.2 can be used to show that Common-Tree-Containment is in NP for two arbitrary phylogenetic networks. The next corollary is now an immediate consequence of Theorem 3.2.

Corollary 5.3.

Common-Tree-Containment is NP-complete for two arbitrary phylogenetic networks.

Now, let $C$ be a class of phylogenetic networks for which Tree-Containment is solvable in polynomial time, such as tree-child or, more generally, reticulation-visible networks [1, 6, 15]. Furthermore, let ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ be two networks in $C$ . Then deciding if $T({\mathcal{N}})=T({\mathcal{N}}^{\prime})$ is in co-NP because, given a tree ${\mathcal{T}}$ that is displayed by ${\mathcal{N}}$ or ${\mathcal{N}}^{\prime}$ , it can be checked in polynomial time if ${\mathcal{T}}$ is also displayed by the other network. If this is not the case, then ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ form a no-instance of Display-Set-Equivalence. Whether Display-Set-Equivalence for ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ is co-NP-complete remains an open problem. Nevertheless, it is unlikely that Display-Set-Equivalence for ${\mathcal{N}}$ and ${\mathcal{N}}^{\prime}$ is $\Pi_{2}^{P}$ -complete since a problem that is $\Pi_{2}^{P}$ -complete and in co-NP would imply that co-NP $=\Pi_{2}^{P}$ which, in turn, would result in a collapse of the polynomial hierarchy to the first level.

Bibliography


[1] M. Bordewich and C. Semple, Reticulation-visible networks, Advances in Applied Mathematics, 76 (2016), pp. 114–141.
[2] G. Cardona, F. Rosselló, and G. Valiente, Comparison of tree-child phylogenetic networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6 (2009), pp. 552–569.
[3] J. Döcker, S. Linz, and C. Semple, Display sets of normal and tree-child networks, submitted.
[4] A. Francis and M. Steel, Which phylogenetic networks are merely trees with additional arcs? Systematic Biology, 64 (2015), pp. 768–777.
[5] M. R. Garey and D. S. Johnson, Computers and intractability: a guide to the theory of NP-completeness, W. H. Freeman and Company, 1979.
[6] A. D. M. Gunawan, B. Das Gupta, and L. Zhang, A decomposition theorem and two algorithms for reticulation-visible networks, Information and Computation, 252 (2017), pp. 161–175.
[7] I. A. Kanj, L. Nakhleh, C. Than, and G. Xia, Seeing the trees and their branches in the network is hard, Theoretical Computer Science, 401 (2008), pp. 153–164.
[8] S. Khuller, Design and analysis of algorithms: course notes, available at https://drum.lib.umd.edu/bitstream/handle/1903/592/CS-TR-3113.ps?sequence=1 , 1994.
