Rearrangement operations on unrooted phylogenetic networks
Remie Janssen, Jonathan Klawitter

TL;DR
This paper explores the properties of spaces of unrooted phylogenetic networks under rearrangement operations like NNI, SPR, and TBR, including connectivity, diameter bounds, and computational complexity of distance measures.
Contribution
It extends known rearrangement operations from trees to networks, analyzing their properties and computational complexity in this broader context.
Findings
Proved connectedness of network spaces under these operations
Established asymptotic bounds on the diameters of network spaces
Showed computing TBR and PR distances is NP-hard
| class | NNI | PR | TBR |
|---|---|---|---|
| | [LTZ96] | [DGH11] | [DGH11] |
| | T. 5.2 | [FHM18, JJE+18] | T. 5.6 |
| | ✓Corollary 5.3 | ✓Corollary 5.4 | ✓Corollary 5.4 |
| | ✓Proposition 5.7 | ✓Proposition 5.7 | ✓Proposition 5.7 |
| | Theorem 5.9 | Theorem 5.9 | Theorem 5.9 |
| | T. 5.10 | Theorem 5.10 | T. 5.10 |
| | ✓Theorem 5.10 | ✓Theorem 5.10 | ✓Theorem 5.10 |
| | ✓Theorem 5.10 | ✓Theorem 5.10 | ✓Theorem 5.10 |
| | ✓Theorem 5.12 | ✓Theorem 5.11 | ✓Theorem 5.11 |
Remie Janssen (https://orcid.org/0000-0002-5192-1470)
Delft Institute of Applied Mathematics, Delft University of Technology, Netherlands
Jonathan Klawitter (https://orcid.org/0000-0001-8917-5269)
School of Computer Science, University of Auckland, New Zealand
Abstract
Rearrangement operations transform a phylogenetic tree into another one and hence induce a metric on the space of phylogenetic trees. Popular operations for unrooted phylogenetic trees are NNI (nearest neighbour interchange), SPR (subtree prune and regraft), and TBR (tree bisection and reconnection). Recently, these operations have been extended to unrooted phylogenetic networks, generalisations of phylogenetic trees that can model reticulated evolutionary relationships, where they are called NNI, PR, and TBR moves. Here, we study global and local properties of spaces of phylogenetic networks under these three operations. In particular, we prove connectedness and asymptotic bounds on the diameters of spaces of different classes of phylogenetic networks, including tree-based and level-k networks. We also examine the behaviour of shortest TBR-sequences between two phylogenetic networks in a class, and whether the TBR-distance changes if intermediate networks from other classes are allowed: for example, the space of phylogenetic trees is an isometric subgraph of the space of phylogenetic networks under TBR. Lastly, we show that computing the TBR-distance and the PR-distance of two phylogenetic networks is NP-hard.
1 Introduction
Phylogenetic trees and networks are leaf-labelled graphs that are used to visualise and study the evolutionary history of taxa like species, genes, or languages. While phylogenetic trees are used to model tree-like evolutionary histories, the more general phylogenetic networks can be used for taxa whose past includes reticulate events like hybridisation or horizontal gene transfer [SS03, HRS10, Ste16]. Such reticulate events arise in all domains of life [TN05, RW07, MMM+17, WWK+17]. In some cases, it can be useful to distinguish between rooted and unrooted phylogenetic networks. In a rooted phylogenetic network, the edges are directed from a designated root towards the leaves. Hence, it models evolution along the passing of time. An unrooted phylogenetic network, on the other hand, has undirected edges and thus represents evolutionary relatedness of the taxa. In some cases, unrooted phylogenetic networks can be thought of as rooted phylogenetic networks in which the orientation of the edges has been disregarded. Such unrooted phylogenetic networks are called proper [JJE+18, FHM18]. Here we focus on unrooted, binary, proper phylogenetic networks, where binary means that all vertices except for the leaves have degree three. The set of phylogenetic networks on the same taxa can be partitioned into tiers that contain all networks of the same size.
A rearrangement operation transforms a phylogenetic tree into another tree by making a small graph theoretical change. An operation that works locally within the tree is the NNI (nearest neighbour interchange) operation, which changes the order of the four edges incident to an edge . See for example the NNI from to in Figure 1. Two further popular rearrangement operations are the SPR (subtree prune and regraft) operation, which as the name suggests prunes (cuts) an edge and then regrafts (attaches) the resulting half edge again, and the TBR (tree bisection and reconnection) operation, which first removes an edge and then adds a new one to reconnect the resulting two smaller trees. See, for example, the SPR from to and the TBR from to in Figure 1.
The set of phylogenetic trees on a fixed set of taxa together with a rearrangement operation yields a graph where the vertices are the trees and two trees are adjacent if they can be transformed into each other with the operation. We call this a space of phylogenetic trees. This construction also induces a metric on phylogenetic trees as the distance of two trees is then given as the distance in this space, that is, the minimum number of applications of the operation that are necessary to transform one tree into the other [SOW96]. However, computing the distance of two trees under NNI, SPR, and TBR is NP-hard [DHJ+97, HDRCB08, AS01]. Nevertheless, both the space of phylogenetic trees and a metric on them are of importance for the many inference methods for phylogenetic trees that rely on local search strategies [Gus14, SJ17].
Recently, these rearrangement operations have been generalised to phylogenetic networks, both for unrooted networks [HLMW16, HMW16, FHMW18] and for rooted networks [BLS17, FHMW18, GvIJ+17, Kla19]. For unrooted networks, Huber et al. [HLMW16] first generalised NNI to level-1 networks, which are phylogenetic networks where all cycles are vertex disjoint. This generalisation includes a horizontal move that changes the topology of the network, like an NNI on a tree, and vertical moves that add or remove a triangle to change the size of the network. Among other results, they then showed that the space of level-1 networks and its tiers are connected under NNI [HLMW16, Theorem 2]. Note that connectedness implies that the distance between any two networks in such a space is finite and that NNI thus induces a metric. This NNI operation was then extended by Huber et al. [HMW16] to work for general unrooted phylogenetic networks. Again, connectedness of the space was proven. Later, Francis et al. [FHMW18] gave lower and upper bounds on the diameter (the maximum distance) of the space of unrooted phylogenetic networks of a fixed size under NNI. They also showed that SPR and TBR can straightforwardly be generalised to phylogenetic networks, that the connectedness under NNI implies connectedness under SPR and TBR, and they gave bounds on the diameters. These bounds for SPR were made asymptotically tight by Janssen et al. [JJE+18]. Here, we improve these bounds on the diameter under TBR.
There are several generalisations of SPR on rooted phylogenetic trees to rooted phylogenetic networks for which connectedness and diameters have been obtained [BLS17, FHMW18, GvIJ+17, JJE+18, Jan18]. For example, Bordewich et al. [BLS17] introduced SNPR (subnet prune and regraft), a generalisation of SPR that includes vertical moves, which add or remove an edge. They then proved connectedness under SNPR for the space of rooted phylogenetic networks and for special classes of phylogenetic networks including tree-based networks. Roughly speaking, these are networks that have a spanning tree that is the subdivision of a phylogenetic tree on the same taxa [FS15, FHM18]. Furthermore, Bordewich et al. [BLS17] gave several bounds on the SNPR-distance of two phylogenetic networks. Further bounds and a characterisation of the SNPR-distance of a tree and a network were recently proven by Klawitter and Linz [KL19]. Here, we show that analogous bounds and an analogous characterisation hold for the TBR-distance of two unrooted phylogenetic networks.
In this paper, we study spaces of unrooted phylogenetic networks under NNI, PR (prune and regraft), and TBR. Here, the PR and the TBR operations are the generalisations of SPR and TBR on trees, respectively, where vertical moves add or remove an edge like the vertical moves of the SNPR operation in the rooted case. After the preliminary section, we examine the relation of NNI, PR, and TBR; in particular, how a sequence using one of these operations can be transformed into a sequence using another operation (Section 3). We then study properties of shortest paths under TBR in Section 4. This includes the translation of the results from Bordewich et al. [BLS17] and Klawitter and Linz [KL19] on the SNPR-distance of rooted phylogenetic networks to the TBR-distance of unrooted phylogenetic networks. Next, we consider the connectedness and diameters of spaces of phylogenetic networks for different classes of phylogenetic networks, including tree-based networks and level-k networks (Section 5). A subspace of phylogenetic networks (e.g., the space of tree-based networks) is an isometric subgraph of a larger space of phylogenetic networks if, roughly speaking, the distance of two networks is the same in the smaller and the larger space. In Section 6 we study such isometric relations and answer a question by Francis et al. [FHMW18] by showing that the space of phylogenetic trees is an isometric subgraph of the space of phylogenetic networks under TBR. We use this result in Section 7 to show that computing the TBR-distance is NP-hard. In the same section, we also show that computing the PR-distance is NP-hard.
2 Preliminaries
This section provides notation and terminology used in the remainder of the paper. In particular, we define phylogenetic networks and special classes thereof, and rearrangement operations and how they induce distances. Throughout this paper, denotes a finite set of taxa.
Phylogenetic networks.
An unrooted, binary phylogenetic network on a set of taxa is an undirected multigraph such that the leaves are bijectively labelled with and all non-leaf vertices have degree three. It is called proper if every cut-edge separates two labelled leaves [FHM18], and improper otherwise. This property implies that every edge lies on a path that connects two leaves. More importantly, a network can be rooted at any leaf if and only if it is proper [JJE+18, Lemma 4.13]. If not mentioned otherwise, we assume that a phylogenetic network is proper. Furthermore, note that our definition of a phylogenetic network permits the existence of parallel edges in , i.e., we allow that two distinct edges join the same pair of vertices. An unrooted, binary phylogenetic tree on is an unrooted, binary phylogenetic network on that is a tree.
Let denote the set of all unrooted, binary proper phylogenetic networks on and let denote the set of all unrooted, binary phylogenetic trees on , where . To ease reading, we refer to an unrooted, binary proper phylogenetic network (resp. unrooted, binary phylogenetic tree) on simply as phylogenetic network or network (resp. phylogenetic tree or tree). Figure 2 shows an example of a tree , a network in , and an improper network .
An edge of a network is an external edge if it is incident to a leaf, and an internal edge otherwise. A cherry of is a pair of leaves and in that are adjacent to the same vertex. For example, each network in Figure 2 contains the cherry .
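As an illustration of the definition of a cherry, the following minimal sketch finds all cherries of a network stored as a plain edge list. The edge-list representation, the function name `cherries`, and the vertex labels are assumptions made for this example and do not come from the paper.

```python
from itertools import combinations


def cherries(edges, leaves):
    """Return all pairs of leaves that are adjacent to the same vertex."""
    # Each leaf has exactly one incident edge, hence exactly one neighbour.
    neighbour = {}
    for u, v in edges:
        if u in leaves:
            neighbour[u] = v
        if v in leaves:
            neighbour[v] = u
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(leaves), 2)
        if neighbour[a] == neighbour[b]
    }


# Hypothetical tree on the leaves 1..4 with internal vertices "a" and "b".
quartet = [(1, "a"), (2, "a"), ("a", "b"), (3, "b"), (4, "b")]
found = cherries(quartet, {1, 2, 3, 4})
```

In this toy tree, leaves 1 and 2 share the neighbour "a" and leaves 3 and 4 share the neighbour "b", so both pairs are reported.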
Tiers.
We say a network has reticulation number for , that is, the number of edges that have to be deleted from to obtain a spanning tree of . In graph theory, the value of a connected graph is also called the cyclomatic number of the graph [Die17]. For example, the network in Figure 2 has reticulation number three. Note that a phylogenetic tree is a phylogenetic network with reticulation number zero. Let denote tier of , the set of networks in that have reticulation number .
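The number of edges that must be deleted to reach a spanning tree of a connected graph is its cyclomatic number, |E| - (|V| - 1). A minimal sketch of this count follows, assuming the network is connected and stored as an edge list in which parallel edges appear as repeated pairs. The function name and the two example graphs are hypothetical.

```python
def reticulation_number(edges):
    """Return |E| - (|V| - 1) for a connected multigraph given as an edge list."""
    vertices = {x for edge in edges for x in edge}
    return len(edges) - (len(vertices) - 1)


# Hypothetical tree on leaves 1, 2, 3 with a single internal vertex "a".
star_tree = [(1, "a"), (2, "a"), (3, "a")]

# Hypothetical network on the same leaves whose internal vertices form a triangle.
triangle_network = [(1, "a"), (2, "b"), (3, "c"), ("a", "b"), ("b", "c"), ("a", "c")]

tree_value = reticulation_number(star_tree)
network_value = reticulation_number(triangle_network)
```

The tree has three edges and four vertices, which gives zero. The triangle network has six edges and six vertices, which gives one.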
Embedding.
Let be an undirected graph. Subdividing an edge of consists of replacing by a path from to that contains at least one edge. A subdivision of is a graph that can be obtained from by subdividing edges of . If has no degree two vertices, there exists a canonical embedding of vertices of to vertices of and of edges of to paths of . Let . We say has an embedding into if there exists a subdivision of that is a subgraph of such that the embedding maps each labelled vertex of to a labelled vertex of with the same label.
Displaying.
Let and . We say displays if has an embedding into . For example, in Figure 2 the tree is displayed by both networks and . Let be the set of trees in that are displayed by . This notion can be extended to trees with fewer leaves, and to networks. For this, let be a phylogenetic network on . We say displays if has an embedding into . Let be a set of phylogenetic networks on . Then let denote the subset of networks in that display each network in .
Tree-based networks.
A phylogenetic network is a tree-based network if there is a tree that has an embedding into as a spanning tree. In other words, there exists a subdivision of that is a spanning tree of . The tree is then called a base tree of . Let denote the set of tree-based networks in . For , let denote the set of tree-based networks in with base tree .
Level-k networks.
A blob of a network is a nontrivial two-connected component of . The level of a blob is the minimum number of edges that have to be removed from the blob to make it acyclic. The level of a network is the maximum level of all of its blobs. If the level of a network is at most k, then it is called a level-k network. Let denote the set of level-k networks in .
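To make the difference between level and reticulation number concrete, the sketch below computes the level of a small network. It assumes an edge-list representation with parallel edges as repeated pairs, and it finds two-connected components with a standard depth-first search that keeps a stack of edges. This is a generic textbook technique, not a procedure taken from the paper, and the function name and example networks are hypothetical. The recursion is only meant for small examples.

```python
def level(edges):
    """Return the maximum of |E_B| - |V_B| + 1 over all two-connected components B."""
    adjacency = {}
    for edge_id, (u, v) in enumerate(edges):
        adjacency.setdefault(u, []).append((v, edge_id))
        adjacency.setdefault(v, []).append((u, edge_id))

    depth, low = {}, {}
    edge_stack = []
    best = 0

    def visit(vertex, parent_edge, d):
        nonlocal best
        depth[vertex] = low[vertex] = d
        for neighbour, edge_id in adjacency[vertex]:
            if edge_id == parent_edge:
                continue  # skip only the edge we arrived by, so parallel edges still count
            if neighbour not in depth:
                edge_stack.append(edge_id)
                visit(neighbour, edge_id, d + 1)
                low[vertex] = min(low[vertex], low[neighbour])
                if low[neighbour] >= depth[vertex]:
                    # All edges pushed since edge_id form one two-connected component.
                    component = []
                    while True:
                        popped = edge_stack.pop()
                        component.append(popped)
                        if popped == edge_id:
                            break
                    touched = {x for i in component for x in edges[i]}
                    best = max(best, len(component) - len(touched) + 1)
            elif depth[neighbour] < depth[vertex]:
                # Back edge to an ancestor.
                edge_stack.append(edge_id)
                low[vertex] = min(low[vertex], depth[neighbour])

    visit(edges[0][0], None, 0)
    return best


# Two vertex-disjoint pairs of parallel edges: reticulation number 2, level 1.
two_cycles = [(1, "a"), ("a", "b"), ("a", "b"), ("b", "c"),
              ("c", "d"), ("c", "d"), ("d", 2)]

# A pair of parallel edges placed on one of two parallel edges: reticulation number 2, level 2.
nested = [(1, "a"), (2, "b"), ("a", "b"), ("a", "c"),
          ("c", "d"), ("c", "d"), ("d", "b")]
```

Both example networks have seven edges and six vertices, so their reticulation number is two, but only the second one concentrates both reticulations in a single blob.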
-Burl.
An -burl is a specific type of blob that we define recursively: a -burl is the blob consisting of a pair of parallel edges; an -burl is the blob obtained by placing a pair of parallel edges on one of the parallel edges of an -burl for all . See for example the network in Figure 3.
-Handcuffed trees and caterpillars.
Let and let and be two leaves of . Let and be the edges incident to and , respectively. Subdivide and with vertices and , respectively, and add the edges . The resulting network is an -handcuffed tree with base tree on the handcuffed leaves . Note that has reticulation number . If the tree is a caterpillar and and form a cherry of , then the resulting network is an -handcuffed caterpillar. Furthermore, we call an -handcuffed caterpillar sorted if it is handcuffed on the leaves 1 and 2 and the leaves from 3 to have a non-decreasing distance to leaf 1. See Figure 3 for an example.
Suboperations.
To define rearrangement operations on phylogenetic networks, we first define several suboperations. Let be an undirected graph. A degree-two vertex of with adjacent vertices and gets suppressed by deleting and its incident edges, and adding the edge . The reverse of this suppression is the subdivision of with vertex .
Let be a network, and an edge of . Then gets removed by deleting from and suppressing any resulting degree-two vertices. We say gets pruned at by transforming it into the half edge and suppressing if it becomes a degree-two vertex. Note that otherwise is a leaf. In reverse, we say that a half edge gets regrafted to an edge by transforming it into the edge where is a new vertex subdividing .
TBR.
A TBR operation is the rearrangement operation that transforms a network into another network in one of the four ways listed below. (The TBR operation is known on unrooted phylogenetic trees as tree bisection and reconnection. Since networks are in general not trees and a TBR on a network does not necessarily bisect it, we use TBR as a word on its own. For the reader who would nevertheless like an expansion of TBR, we suggest "total branch relocation". We welcome other suggestions.) The four ways are:
- (TBR0) Remove an internal edge of , subdivide an edge of the resulting graph with a new vertex , subdivide an edge of the resulting graph with a new vertex , and add the edge ; or, prune an external edge of that is incident to leaf at , and regraft to an edge of the resulting graph.
- (TBR+) Subdivide an edge of with a new vertex , subdivide an edge of the resulting graph with a new vertex , and add the edge .
- (TBR-) Remove an edge of .
Note that a TBR0 can also be seen as the operation that prunes the edge at both and and then regrafts both ends. Hence, we say that a TBR0 moves the edge . Furthermore, we say that a TBR+ adds the edge and that a TBR- removes the edge . These operations are illustrated in Figure 4. Note that a TBR0 has an inverse TBR0 and that a TBR+ has an inverse TBR-, and that furthermore a TBR+ increases the reticulation number by one and a TBR- decreases it by one.
Since a TBR operation has to yield a phylogenetic network, there are some restrictions on the edges that can be moved or removed. Firstly, if removing an edge by a TBR0 yields a disconnected graph, then in order to obtain a phylogenetic network an edge has to be added between the two connected components. Similarly, a TBR- cannot remove a cut-edge. Secondly, the suppression of a vertex when removing an edge may not yield a loop . Thirdly, removing or moving an edge cannot create a cut-edge that does not separate two leaves. Otherwise the network would not be proper.
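The two vertical moves can be sketched on an edge list as follows. This is a minimal illustration under stated assumptions: the edge-list representation, all function names, and the vertex labels are hypothetical, and the sketch does not check the restrictions just described (connectedness and properness), apart from refusing a suppression that involves a loop.

```python
from collections import Counter


def reticulation_number(edges):
    vertices = {x for edge in edges for x in edge}
    return len(edges) - (len(vertices) - 1)


def subdivide(edges, index, new_vertex):
    """Replace edge number `index` by a path of two edges through `new_vertex`."""
    a, b = edges[index]
    return edges[:index] + edges[index + 1:] + [(a, new_vertex), (new_vertex, b)]


def tbr_plus(edges, first, second, u, v):
    """Subdivide edge `first` with u, then edge `second` of the resulting graph with v, and add (u, v)."""
    step = subdivide(edges, first, u)
    step = subdivide(step, second, v)
    return step + [(u, v)]


def suppress_degree_two(edges):
    """Repeatedly replace a degree-two vertex and its two incident edges by a single edge."""
    edges = list(edges)
    while True:
        degree = Counter(x for edge in edges for x in edge)
        vertex = next((x for x, d in degree.items() if d == 2), None)
        if vertex is None:
            return edges
        others = [a if b == vertex else b for a, b in edges if vertex in (a, b)]
        if len(others) != 2 or others[0] == others[1]:
            raise ValueError("suppression would involve a loop")
        edges = [e for e in edges if vertex not in e] + [tuple(others)]


def tbr_minus(edges, index):
    """Remove edge number `index` and suppress the resulting degree-two vertices."""
    return suppress_degree_two(edges[:index] + edges[index + 1:])


# Hypothetical tree on leaves "1", "2", "3" with one internal vertex "a".
tree = [("1", "a"), ("2", "a"), ("3", "a")]
network = tbr_plus(tree, 0, 0, "u", "v")        # adds the edge (u, v)
back = tbr_minus(network, len(network) - 1)     # removes the edge (u, v) again
```

In this example the added edge closes a triangle on "u", "v", and "a", so the reticulation number goes from zero to one, and removing the same edge returns the original tree.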
The TBR0 operation equals the well known TBR (tree bisection and reconnection) operation on unrooted phylogenetic trees [AS01]. The TBR operation on trees has recently been generalised to TBR0 on improper unrooted phylogenetic networks by Francis et al. [FHMW18].
PR.
A PR (prune and regraft) operation is the rearrangement operation that transforms a network into another network with a PR+ (which equals a TBR+), a PR- (which equals a TBR-), or a PR0 that prunes and regrafts an edge only at one endpoint, instead of at both like a TBR0. As for TBR, we say that the PR0/+/- moves/adds/removes the edge in . The PR operation is a generalisation of the well-known SPR (subtree prune and regraft) operation on unrooted phylogenetic trees [AS01]. As for TBR, the generalisation of SPR to PR0 for networks has been introduced by Francis et al. [FHMW18].
NNI.
An NNI (nearest neighbour interchange) operation is a rearrangement operation that transforms a network into another network in one of the following three ways:
- (NNI0) Let be an internal edge of . Prune an edge () at , and regraft it to an edge () that is incident to .
- (NNI+) Subdivide two adjacent edges with new vertices and , respectively, and add the edge .
- (NNI-) If contains a triangle, remove an edge of the triangle.
These operations are illustrated in Figure 5. We say that an NNI0 moves the edge . Alternatively, we call the edge of an NNI0 the axis of the operation, as the operation can also be defined as pruning at , and at , and regrafting at and at . The NNI operation has been introduced on trees by Robinson [Rob71] and generalised to networks by Huber et al. [HLMW16, HMW16].
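An NNI- may only remove an edge of a triangle, so a removal move is an NNI- only when the removed edge lies on a triangle. The following minimal sketch tests this condition on an edge list. The representation, the function name `in_triangle`, and the example network are assumptions made for illustration.

```python
def in_triangle(edges, index):
    """Return True if edge number `index` lies on a triangle of the graph."""
    u, v = edges[index]
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # The edge (u, v) lies on a triangle exactly when u and v share a third vertex.
    return bool((neighbours[u] & neighbours[v]) - {u, v})


# Hypothetical network on leaves 1, 2, 3 whose internal vertices form a triangle.
triangle_network = [(1, "a"), (2, "b"), (3, "c"), ("a", "b"), ("b", "c"), ("a", "c")]

internal_ok = in_triangle(triangle_network, 3)   # edge (a, b)
external_ok = in_triangle(triangle_network, 0)   # edge (1, a)
```

The edge between "a" and "b" qualifies because "c" is adjacent to both, whereas the external edge at leaf 1 does not.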
Sequences and distances.
Let be two networks. A TBR-sequence from to is a sequence
[TABLE]
of phylogenetic networks such that can be obtained from by a single TBR for each . The length of is . The TBR-distance between and is the length of a shortest TBR-sequence from to , or infinite if no such sequence exists.
Let be a class of phylogenetic networks. The TBR-distance on is defined like on but with the restriction that every network in a shortest TBR-sequence has to be in . The class is connected under TBR if, for all pairs , there exists a TBR-sequence from to such that each network in is in . Hence, for the TBR-distance to be a metric on , the class has to be connected under TBR and the TBR operation has to be reversible. We already noted above that the latter holds for TBR (and NNI and PR). For a connected class , the diameter is the maximum distance between two of its networks under its metric. The definitions for NNI and PR are analogous.
Let be a subclass of . Then is an isometric subgraph of a under, say, TBR if for every the TBR-distance of and in equals the TBR-distance of and in .
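Since a distance is the length of a shortest sequence, it is a shortest-path length in the graph whose vertices are networks and whose edges are single moves. The sketch below uses breadth-first search on a toy graph to show how restricting the intermediate networks to a class can increase the distance, in which case the class is not an isometric subgraph. The graph, its labels "A", "B", "C", "D", "X", and the function names are hypothetical stand-ins and not actual networks from the paper.

```python
from collections import deque


def distance(start, goal, neighbours):
    """Length of a shortest path from start to goal, or infinity if none exists."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, steps = queue.popleft()
        if current == goal:
            return steps
        for following in neighbours(current):
            if following not in seen:
                seen.add(following)
                queue.append((following, steps + 1))
    return float("inf")


# Toy space: each key stands for a network, each list for its neighbours under one move.
space = {
    "A": ["X", "B"],
    "X": ["A", "C"],
    "B": ["A", "D"],
    "D": ["B", "C"],
    "C": ["X", "D"],
}


def within(allowed):
    """Neighbour function that only allows intermediate networks from `allowed`."""
    return lambda network: [m for m in space[network] if m in allowed]


in_whole_space = distance("A", "C", within(set(space)))
in_class = distance("A", "C", within({"A", "B", "C", "D"}))
```

In the whole toy space the shortest sequence from "A" to "C" passes through "X" and has length two. Inside the class that excludes "X" the shortest sequence has length three.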
3 Relations of rearrangement operations
On trees, it is well known that every NNI is also an SPR, which, in turn, is also a TBR. We observe that the same holds for the generalisations of these operations as defined above.
Observation 3.1.
Let . Then, on , every NNI is a PR and every PR is a TBR.
For the reverse direction, we first show that every TBR can be mimicked by at most two PR like in . Then we show how to substitute a PR with an NNI-sequence.
Lemma 3.2.
Let such that . Then , where a TBR0 may be replaced by two PR0.
Proof.
If can be obtained from by a TBR+ or TBR-, then by the definition of PR+ and PR- it follows that . If can be obtained from by a TBR0 that is also a PR0, the statement follows. Assume therefore that can be obtained from by a TBR0 that moves the edge of to of . Let be the graph obtained from by removing , or equivalently the graph obtained from by removing . If is a cut-edge, then so is , and without loss of generality and as well as and subdivide an edge in the same connected components of . Furthermore, if subdivides an edge of a pendant blob in , then so does . Otherwise would not be proper. Therefore, the PR0 that prunes at and regrafts it to obtain yields a phylogenetic network . The choices of and ensure that is connected and proper. There is then a PR0 from to that prunes at and regrafts it at to obtain . Hence, . ∎
Corollary 3.3.
Let . Then .
Lemma 3.4.
*Let such that there is a PR0 that transforms into . Let be the edge of pruned by this PR0.
Then there exists an NNI0-sequence from to that only moves and whose length is in . Moreover, if neither nor contains parallel edges, then neither does any intermediate network in the NNI-sequence.*
Proof.
Assume that can be transformed into by pruning the edge at and regrafting it to . Note that there is then a (shortest) path from to in , since otherwise would be disconnected. Without loss of generality, assume that does not contain . Furthermore, assume for now that does not contain . The idea is now to move along to with NNI0. In particular, we show how to construct a sequence such that either can be obtained from by an NNI0 or , and such that contains the edge . This process is illustrated in Figure 6. Assume we have constructed the sequence up to . Let with be the edge incident to that is not on . Obtain from by swapping and with an NNI0 on the axis . Note that this preserves the path and that may only contain a parallel edge if or contains parallel edges. As a result, we get .
It remains to show that every network in is proper. Assume otherwise and let be the first improper network in . Then contains a cut-edge that separates a blob from all leaves. We claim that is part of . Indeed, the pruning of the NNI0 from to has to create and the regrafting cannot be to , so it has to pass along (Figure 7). However, as is a path, the moving edge cannot pass again, so all networks for including are improper; a contradiction. Hence, all intermediate networks are proper and thus is an NNI0-sequence from to .
Next, assume that contains . Then first apply the process above to move of along to . In the resulting network, apply the process above to move of along to . The process again avoids the creation of a network with parallel edges, if neither nor contains parallel edges. Furthermore, from Figure 7 we get that if contained an improper network, then would be contained in the blob . However, then and would be edges from to the rest of the network; again a contradiction.
Lastly, note that the length of is in since contains only edges. Hence, the length of is also in . ∎
Lemma 3.5.
*Let . Let such that there is a PR- that transforms into . Let be the edge of removed by this PR-. Let have reticulation number .
Then, there is an NNI0-sequence followed by one NNI- that transforms and by only moving and removing and whose length is in . Moreover, if neither nor contains parallel edges, then neither do the intermediate networks in the NNI-sequence.*
Proof.
Assume the PR- removes from to obtain . If is part of a triangle, the PR- move is an NNI- move. If is a parallel edge, then move either or with an NNI0 to obtain a network with a triangle that contains . Then the previous case applies. So assume otherwise, namely that is not part of a triangle or a pair of parallel edges. Then move with an NNI0-sequence closer to to form a triangle as follows.
Because removing in yields the proper network , it follows that contains a shortest path from to . Since is not part of a triangle, this path must contain at least two nodes other than and . Let and be the last two edges on . Consider the PR0 that prunes at and regrafts it to . Note that this creates a triangle on the vertices , and . By Lemma 3.4 we can replace this PR0 with an NNI0-sequence. Lastly, we can remove with an NNI- to obtain . The bound on the length of the NNI-sequence as well as the second statement follow from Lemma 3.4. ∎
To conclude this section, we note that all previous results combined show that we can replace a TBR-sequence with a PR-sequence, which we can further replace with an NNI-sequence. For several connectedness results in Section 5 this allows us to focus on TBR and then derive results for NNI and PR.
4 Shortest paths
In this section, we focus on bounds on the distance between two specified networks. We restrict to the TBR-distance in and in , and study the structure of shortest sequences of moves. We make several observations about these sequences in general, and some about shortest sequences between two networks that have certain structure in common, e.g., common displayed networks. Hence, we get bounds on the TBR-distance between two networks, and we uncover properties of the spaces of phylogenetic networks which allow for reductions of the search space. For example, if and have reticulation number , no shortest path from to contains a network with a reticulation number less than . The proof of this statement relies on the following observation about the order in which TBR0 and TBR+ operations can occur in a shortest path.
Observation 4.1.
Let such that there exists a TBR-sequence that uses a TBR+ and a TBR-. Then there is a TBR0 that transforms into .
Rephrasing Observation 4.1, a TBR+ followed by a TBR-, or vice versa, can be replaced by a TBR0. This case can thus not occur in a shortest TBR-sequence. Next, we look at a TBR0 followed by a TBR+.
Lemma 4.2.
*Let with reticulation number and such that there exists a shortest TBR-sequence that starts with a TBR0.
Then there is a TBR-sequence that starts with a TBR+.*
Proof.
Note that the TBR0 from to of can be replaced with a sequence consisting of a TBR+ followed by a TBR-. This TBR- and the TBR+ from to can now be combined to a TBR0, which gives us a sequence . ∎
Let and consider a shortest TBR-sequence from to that contains TBR+ and TBR- operations. If the reverse statement of Lemma 4.2 also held, then we could shuffle the sequence such that consecutive TBR+ and TBR- can be replaced with a TBR0. This would imply that is an isometric subgraph of under TBR. However, we now show that the reverse statement of Lemma 4.2 does not hold in general, and, hence, adjacent operations of different types in a shortest TBR-sequence cannot always be swapped.
Lemma 4.3.
*Let and . Let with reticulation number and such that there exists a shortest TBR-sequence that starts with a TBR+.
Then it is not guaranteed that there is a TBR-sequence that starts with a TBR0.*
Proof.
We claim that the networks and in Figure 8 are a pair of networks for which no TBR-sequence exists that starts with a TBR0. The two networks and in Figure 8 are the only two TBR- neighbours of . However, it is easy to check that the TBR0-distance of and , , is at least two. Hence, a shortest TBR sequence from to that starts with a TBR0 has length three and so cannot exist. Note that we can add an edge to each of the pair of parallel edges to obtain an example without parallel edges. Moreover, the example can be extended to higher and by adding extra leaves between leaf 3 and 4, and replacing a pair of parallel edges by a chain of parallel edges in each network. ∎
Note that the TBR0 used in Figure 8 to prove Lemma 4.3 is a PR0. Hence, the statement of Lemma 4.3 also holds for PR. On the positive side, if one of the two networks is a tree, then we can swap the TBR+ with the TBR0.
Lemma 4.4.
*Let and with reticulation number one such that there exists a shortest TBR-sequence that starts with a TBR+.
Then there is a TBR-sequence that starts with a TBR0.*
Proof.
We show how to obtain from . Suppose that is obtained from by adding the edge and that is obtained from by removing and adding . Note that is an edge of the cycle in . Furthermore, and are distinct. Indeed, otherwise there would be a shorter TBR-sequence from to that simply adds to .
Assume for now that is an edge of in . Then, can be removed with a TBR- from to obtain a tree . Hence, the TBR+ from to and the TBR- from to can be merged into a TBR0 from to . Furthermore, the edge can then be added to with a TBR+ to obtain . This yields the sequence .
Next, assume that is not an edge of in . Then, is a cut-edge in and is a cut-edge in . Let be the edge of that equals , if it exists, or the edge that gets subdivided by into and another edge. Let be the edge of defined as follows: it is equal to itself if is not touched by the TBR0 move from to ; it is the extension of if one of its endpoints is suppressed by this move; it is one of the two edges obtained by subdividing . Now let be a tree obtained by removing from . Then, there is a TBR0 from to that moves to and furthermore a TBR+ that adds to and yields . We obtain again . An example is given in Figure 9. ∎
Next, we look at shortest paths between a tree and a network. First, we show that if a network displays a tree, then there is a simple TBR--sequence from the network to the tree. Recall that is the set of trees in displayed by . This result is the unrooted analogue of Lemma 7.4 by Bordewich et al. [BLS17] on rooted phylogenetic networks.
Lemma 4.5.
*Let and .
Then if and only if , that is, iff there exists a TBR--sequence of length from to .*
Proof.
Note that , since a TBR can reduce the reticulation number by at most one. Furthermore, if we apply a sequence of TBR- moves on , we arrive at a tree that is displayed by . Hence, if , then .
We now use induction on to show that if . If , then and the inequality holds. Now suppose that and that the statement holds whenever a network with a reticulation number less than displays . Fix an embedding of into and colour all edges of not covered by this embedding green. Note that removing a green edge with a TBR- might result in an improper network or a loop. Therefore, we have to show that there is always at least one edge that can be removed such that the resulting graph is a phylogenetic network. For this, consider the subgraph of induced by the green edges. If contains a component consisting of a single green edge , then removing from with a TBR- yields a network . If contains a tree component , then it is easy to see that removing an external edge of from with a TBR- yields a network . Otherwise, as is proper, a component displays a tree whose external edges cover exactly the external edges of . We can then apply the same case distinction to the edges of not covered by and either directly find an edge to remove or find further trees that cover the smaller remaining components. Since is finite, we eventually find an edge to remove. The induction hypothesis then applies to . This concludes the proof. ∎
Note that the proof of Lemma 4.5 also works if is a network displayed by . Hence, we get the following corollary.
Corollary 4.6**.**
*Let and let such that is displayed by .
Then , that is, there exists a TBR--sequence of length from to .*
Lemma 4.5 and Corollary 4.6 now allow us to construct TBR-sequences between networks that go down tiers and then come up again. In fact, for rooted networks this can sometimes be necessary as Klawitter and Linz have shown [KL19, Lemma 13]. However, we now show that this is never necessary for TBR on unrooted networks.
Lemma 4.7**.**
*Let .
Then in no shortest TBR-sequence from to does a TBR- precede a TBR+.*
Proof.
Consider a minimal counterexample with such that there exists a shortest TBR-sequence from to that uses exactly one TBR- and TBR+ and that starts with this TBR-. If uses TBR0 operations between the TBR- and the TBR+, then, by Lemma 4.2, we can swap the TBR+ forward until it directly follows the TBR-. However, then we can obtain a TBR-sequence shorter than by combining the TBR- and the TBR+ into a TBR0 by 4.1; a contradiction. ∎
Combining Lemma 4.5, Corollary 4.6, and Lemma 4.2, we easily derive the following two corollaries about short sequences that do not go down tiers before going back up again.
Corollary 4.8**.**
Let with reticulation number and , with . Then
[TABLE]
Corollary 4.9**.**
Let with reticulation number and , and . Let such that . Then
[TABLE]
Both Corollaries 4.8 and 4.9 can easily be proven by first finding a sequence that goes down to tier 0 and back up to tier , and then combining the TBR- with TBR+ into TBR0 using Lemma 4.2.
The following lemma is the unrooted analogue of Proposition 7.7 by Bordewich et al. [BLS17]. We closely follow their proof.
Lemma 4.10**.**
*Let such that . Let .
Then there exists a such that*
[TABLE]
Proof.
The proof is by induction on . If , then the statement trivially holds. Suppose that . If , then set , and we have . So assume otherwise, namely that . Note that if has been obtained from by a TBR+, then displays . Therefore, distinguish whether has been obtained from by a TBR0 or TBR-.
Suppose that has been obtained from by a TBR0 that moves the edge of . Fix an embedding of into . Since does not display , the edge is covered by . Let be the edge of that gets mapped to the path of that covers . Let and be the subgraphs of . Note that have embeddings into and . Now, if in there exists a path from the embedding of to the embedding of that avoids , then the graph consisting of , , and is a tree displayed by . Otherwise, is a cut-edge of and the TBR0 moves to an edge connecting the two components of . Then in there is a path from the embedding of to the embedding of in . Together they form an embedding of a tree displayed by . In both cases can also be obtained from by moving to where attaches to and . If is obtained from by a TBR-, then the first case has to apply.
Now suppose that and that the hypothesis holds for any two networks with TBR-distance at most . Let such that and . Thus by induction there are trees and such that with and with . It follows that , thereby completing the proof of the lemma. ∎
By setting one of the two networks in the previous lemma to be a phylogenetic tree and noting that the roles of and are interchangeable, the next two corollaries are immediate consequences of Lemmas 4.5 and 4.10.
Corollary 4.11**.**
Let , such that . Then for every
[TABLE]
Corollary 4.12**.**
Let and let . Then
[TABLE]
The following theorem is the unrooted analogue of Theorem 7 by Klawitter and Linz [KL19]. Their proof carries over directly by replacing SNPR and rooted networks with TBR and unrooted networks, and by using Lemmas 4.5 and 4.10 and Theorem 6.1.
Theorem 4.13**.**
Let and let . Then
[TABLE]
5 Connectedness and diameters
Whereas in the previous section we studied the maximum distance between two given networks, here, we focus on global connectivity properties of several classes of phylogenetic networks under NNI, PR, and TBR. These results imply that these operations induce metrics on these spaces. For each connected metric space, we can ask about its diameter. Since a class of phylogenetic networks that contains networks with unbounded reticulation number naturally has an unbounded diameter, this question is mainly of interest for the tiers of a class. First, we recall some known results from unrooted phylogenetic trees.
Theorem 5.1** (Li et al. [LTZ96], Ding et al. [DGH11]).**
The space is connected under
NNI0* with the diameter in ,*
PR0* with the diameter in , and*
TBR0* with the diameter in .*
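To make the NNI0 operation on trees concrete, the following Python sketch enumerates the NNI neighbours of an unrooted binary tree. For every internal edge it swaps one subtree on one side with either of the two subtrees on the other side, which gives 2(n - 3) neighbours for a tree on n leaves. The adjacency-dictionary representation, the use of string vertex names, and the function name are assumptions for illustration.

```python
def nni_neighbours(adj):
    """Return all trees one NNI away from an unrooted binary tree.

    adj maps each vertex (a string) to the set of its neighbours;
    internal vertices have degree three and leaves have degree one.
    """
    neighbours = []
    for u in adj:
        if len(adj[u]) != 3:
            continue
        for v in adj[u]:
            # Visit each internal edge {u, v} exactly once.
            if len(adj[v]) != 3 or not u < v:
                continue
            _, b = sorted(adj[u] - {v})
            for x in sorted(adj[v] - {u}):
                new = {w: set(ns) for w, ns in adj.items()}
                # Swap the subtree hanging at b (on u) with the one at x (on v).
                new[u].remove(b)
                new[b].remove(u)
                new[v].remove(x)
                new[x].remove(v)
                new[u].add(x)
                new[x].add(u)
                new[v].add(b)
                new[b].add(v)
                neighbours.append(new)
    return neighbours


# Hypothetical example: the quartet 12|34 with internal edge {u, v}.
quartet = {
    "1": {"u"}, "2": {"u"}, "3": {"v"}, "4": {"v"},
    "u": {"1", "2", "v"}, "v": {"3", "4", "u"},
}
quartet_neighbours = nni_neighbours(quartet)
```

For the quartet, the two neighbours are the trees with splits 13|24 and 14|23, so the space of quartets on four leaves is a triangle under NNI.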
5.1 Network space
Huber et al. [HMW16, Theorem 5] proved that the space of phylogenetic networks that includes improper networks is connected under NNI. We reprove this for our definition of , but first look at the tiers of this space.
Theorem 5.2**.**
*Let , , and .
Then is connected under NNI with the diameter in .*
Proof.
Let and let be a tree displayed by . We show that can be transformed into a sorted -handcuffed caterpillar with NNI. Our process is as follows and illustrated in Figure 10.
Step 1.
Transform into a network that is tree-based on .
Step 2.
Transform into a handcuffed tree on the leaves 1 and 2.
Step 3.
Transform into a sorted handcuffed caterpillar .
We now describe this process in detail. For Step 1, we show how to construct an NNI0-sequence from to , and we give a bound on the length of . Let be an embedding of into , that is, is a subdivision of and a subgraph of . Colour all edges of used by black and all other edges green. Note that this yields green, connected subgraphs of ; more precisely, the are the connected components of the graph induced by the green edges of . Note that each has at least two vertices in , since otherwise would not be proper. Furthermore, if each consists of a single edge, then is tree-based on . Assuming otherwise, we show how to break the apart.
First, if there is a triangle on vertices where and are adjacent vertices in and is their neighbour in , then change the embedding of (and ) so that it takes the path instead of (see Figure 11a). Otherwise, there is an edge where is in and the other vertices adjacent to are not adjacent to . Let and be the other edges incident to . Apply an NNI0 to move to as in Figure 11b. Note that each such NNI0 decreases the number of vertices in green subgraphs and increases the number of vertices in . Furthermore, the resulting network is clearly proper. Therefore, repeat these cases until all consist of single edges. Let the resulting graph be . Since there are at most vertices in all green subgraphs that are not in , the number of required NNI0 for Step 1 is at most
[TABLE]
In Step 2 we transform into a handcuffed tree on the leaves 1 and 2. Let be the set of green edges in , that is, the edges that are not in the embedding of into . Without loss of generality, assume that for the distance between and leaf in is at most the distance of to leaf in . The idea is to sweep along the edges of to move the towards leaf and then do the same for the towards leaf .
For an edge of , let be the path of corresponding to . Let be the edge of incident to leaf . Impose directions on the edges of towards leaf . Do the same for the edges of accordingly. This gives a partial order on the edges of with as maximum. Let be a linear extension of on the edges of .
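One way to obtain such a linear extension is sketched below in Python. The edges of a tree are directed towards a chosen leaf by a breadth-first search, and sorting them by decreasing depth yields an order in which every edge comes before the edges on its path to that leaf, so the edge incident to the chosen leaf is last. The adjacency-dictionary representation and the function name are assumptions for illustration, and any other linear extension would serve equally well.

```python
from collections import deque


def edges_towards_leaf(adj, leaf):
    """Return the edges as (child, parent) pairs, ordered towards the leaf.

    An edge appears before every edge on its path to the given leaf, so
    the edge incident to the leaf is the last (maximum) element.
    """
    depth = {leaf: 0}
    parent = {}
    queue = deque([leaf])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in depth:
                depth[y] = depth[x] + 1
                parent[y] = x
                queue.append(y)
    directed = list(parent.items())
    # Deeper edges come first; ties may be broken arbitrarily.
    return sorted(directed, key=lambda edge: -depth[edge[0]])


# Hypothetical example: the quartet 12|34, with edges directed to leaf 1.
quartet = {
    "1": {"u"}, "2": {"u"}, "3": {"v"}, "4": {"v"},
    "u": {"1", "2", "v"}, "v": {"3", "4", "u"},
}
order = edges_towards_leaf(quartet, "1")
```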
Let be the minimum of . Let be the corresponding path in . From to along , proceed as follows.
- (i) If there is an edge in , then swap and with an NNI0.
- (ii) If there is an edge in then move the endpoint of the green edge incident to onto the green edge incident to with an NNI0.
- (iii) Otherwise, if there is an edge in , then move beyond .
This is illustrated in Figure 12. Informally speaking, we stack onto so they can move together towards . Repeat this process for each edge in the order given by . For the last edge , ignore case (iii). Next “unpack” the stacked ’s on .
We now count the number of NNI0 needed. Firstly, each is swapped at most once with a . Secondly, each is moving to and from a green edge at most once. Furthermore, each vertex of corresponding to a vertex of is swapped at most twice. Hence, the total number of NNI0 required is at most
[TABLE]
Repeat this process for the towards leaf . Since the do not have to be swapped with , the total number of NNI0 required for this is at most
[TABLE]
Note that the resulting network may not yet be a handcuffed tree as the order of the and may be different. Hence, lastly in Step 2, to obtain sort the edges with the mergesort-like algorithm by Li et al. [LTZ96, Lemma 2]. They show that the required number of NNI0 for this is at most
[TABLE]
For Step 3, consider the path in from leaf to . If contains only one pendant subtree, then is handcuffed on the cherry . Otherwise, use NNI0 to reduce it to one pendant subtree. This takes at most NNI0. Next, transform the pendant subtree of into a caterpillar to obtain a handcuffed caterpillar, again with at most NNI0. Lastly, sort the leaves with the algorithm from Li et al. [LTZ96, Lemma 2] to obtain the sorted handcuffed caterpillar . The required number of NNI0 to get from to is at most
[TABLE]
Since we can transform any network into , it follows that is connected under NNI. Furthermore, adding Equations 1 to 5 up and multiplying the result by two shows that the diameter of under NNI0 is at most
[TABLE]
Francis et al. [FHMW18, Theorem 2] gave the lower bound on the diameter of tier of the space that allows improper networks under NNI (NNI0 without the properness condition). Their proof consists of two parts: a lower bound on the total number of networks in a tier , and upper bounds on the number of networks that can be reached from one network for each fixed number of NNI. The diameter of is at least the smallest number of moves for which the previously mentioned upper bound is greater than the lower bound on .
Our version of NNI0 is stricter than theirs as we do not allow improper networks. Hence, the number of networks that can be reached with a fixed number of NNI0 is at most the number of networks that can be reached with the same number of NNI. Furthermore, their lower bound on is found by counting the number of Echidna networks, a class of networks only containing proper networks. Combining these two observations, we see that their lower bound for the diameter of under NNI is also a lower bound for under NNI0. ∎
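The counting argument behind this lower bound can be sketched generically in Python. Given a lower bound on the number of networks in a tier and an upper bound on the number of networks reachable within m moves, the diameter is at least the smallest m for which the reachability bound is no longer below the number of networks. The concrete numbers and the geometric reachability bound with a hypothetical neighbourhood size `c` are assumptions for illustration; they are not the bounds used by Francis et al.

```python
def diameter_lower_bound(total_networks, reachable_upper_bound):
    """Smallest m such that reachable_upper_bound(m) >= total_networks.

    If fewer networks than exist can be reached within m moves, then
    some network is further than m moves away from the start.
    """
    moves = 0
    while reachable_upper_bound(moves) < total_networks:
        moves += 1
    return moves


def geometric_ball_bound(c):
    """Bound on the ball of radius m if each network has at most c neighbours."""
    return lambda m: sum(c ** i for i in range(m + 1))


# Hypothetical numbers: 1000 networks, at most 10 neighbours per network.
example_bound = diameter_lower_bound(1000, geometric_ball_bound(10))
```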
From Theorem 5.2 we get the following corollary.
Corollary 5.3**.**
The space is connected under NNI with unbounded diameter.
Since, by 3.1, every NNI is also a PR and TBR, the statements in Theorem 5.2 and Corollary 5.3 also hold for PR and TBR. This observation has been made before by Francis et al. [FHMW18] for tiers of the space of networks that allow improper networks.
Corollary 5.4**.**
The spaces and are connected under the PR and TBR operation.
We now look at the diameters of under PR and TBR.
Theorem 5.5**.**
*Let , .
Then the diameter of under PR0 is in with the upper bound .*
Proof.
The asymptotic lower bound was proven by Francis et al. [FHMW18, Proposition 4]. Concerning an upper bound, Janssen et al. [JJE+18, Theorem 4.22] showed that the distance of two improper networks and under PR is at most , of which PR0 moves are used to transform and into proper networks and . Hence, the PR-distance of and is at most . ∎
Theorem 5.6**.**
*Let , .
Then the diameter of under TBR is in with the upper bound*
[TABLE]
Proof.
Like for PR, the lower bound was proven by Francis et al. [FHMW18, Proposition 4]. In Corollary 4.8 we show that the TBR-distance of two networks and that display a tree and , respectively, is at most . Since by Theorem 1.1 of Ding et al. [DGH11] it follows that . ∎
5.2 Networks displaying networks
Bordewich [Bor03, Proposition 2.9] and Mark et al. [MMS16] showed that the space of rooted phylogenetic trees that display a set of triplets (trees on three leaves) is connected under NNI. Furthermore, Bordewich et al. [BLS17] showed that the space of rooted phylogenetic networks that display a set of rooted phylogenetic trees is connected. We give a general result for unrooted phylogenetic networks that display a set of networks. For this, we will use Lemma 4.5, which, as we recall, guarantees that if a network displays a tree , then there is a sequence of TBR- from to .
Proposition 5.7**.**
*Let be a set of phylogenetic networks on .
Then is connected under NNI, PR, and TBR.*
Proof.
Define the network as follows. Let be the caterpillar where the leaves are ordered from to ; that is, contains a path such that leaf is incident to , leaf is incident to , and leaf is incident to . Let be the edge incident to leaf in . Subdivide with vertices . Now, for , , identify leaf of with of and remove its label . Finally, in the resulting network suppress any degree two vertex. This is necessary if one or more of the have fewer than leaves. The resulting network now displays all networks in . An example is given in Figure 13.
Let . Construct a TBR-sequence from to by, roughly speaking, building a copy of attached to , and then removing the original parts of . First, add to by adding an edge from the edge incident to leaf 1 to the edge incident to leaf 2 with a TBR+. Then add another edge from to the edge incident to leaf 3, and so on up to leaf . Colour all newly added edges and the edges incident to the leaves blue, and all other edges red. Note that the blue edges now give an embedding of into the current network. Now, ignoring all red edges, it is straightforward to add the , one after the other with TBR+ such that the resulting network displays . For example, one could start by adding a tree displayed by and then adding any other edges. The first part works similarly to the construction of and the second part is possible by Lemma 4.5. Lastly, remove all red edges with TBR- such that every intermediate network is proper. This is again possible by Lemma 4.5 and yields the network . Note that in the first two stages the red edges (plus external edges) display and in the last phase the non-red edges display .
Since we only used TBR+ and TBR- operations, the statement also holds for PR. For NNI, by Lemma 3.5 we can replace each of these operations that add or remove an edge by NNI-sequences that only move and remove or add the edge . Hence, the statement also holds for NNI. ∎
For the following corollary, note that a quartet is an unrooted binary tree on four leaves and a quarnet is an unrooted binary, level-1 network on four leaves [HMSW18].
Corollary 5.8**.**
Let . Let be a set of phylogenetic trees on , a set of quartets on , or a set of quarnets on . Then is connected under NNI, PR, and TBR.
5.3 Tree-based networks
A related but more restrictive concept to displaying a tree is being tree-based. So, next, we consider the class of tree-based networks. We start with the tiers of , which is the set of tree-based networks that have the tree as base tree.
Theorem 5.9**.**
Let . Then the space is connected under
TBR* with the diameter being between and ,*
PR* with the diameter being between and , and*
NNI* with the diameter being in .*
Proof.
We start with the proof for TBR. Let . Consider embeddings of into and . Let and be the set of all edges not covered by this embedding of in and in . Since is tree-based, and consist of vertex-disjoint edges. Following the embeddings of into and , it is straightforward to move each edge with a TBR0 from to where is in . In total, this requires at most TBR0. Since every intermediate network is clearly in , this gives connectedness of and an upper bound of on the diameter. For the lower bound, consider a network with pairs of parallel edges and without any. Observe that a TBR0 can break at most three pairs of parallel edges and that only if a pair of parallel edges is removed and attached to two other pairs of parallel edges. Hence, for these particular and we have that .
The constructed TBR0-sequence for to above can be converted straightforwardly into a PR0-sequence from to of length at most . For the lower bound, let and be as above and note that a PR can break at most two pairs of parallel edges. Hence, .
By Lemma 3.4, the PR-sequence can be used to construct an NNI-sequence from to that only moves the edges along paths of the embedding of . Since the PR-sequence has length at most and each PR can be replaced by an NNI sequence of length at most , this gives the upper bound of on the diameter of under NNI. ∎
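The counting step in the two lower-bound arguments can be sketched in Python as follows. The sketch counts the pairs of parallel edges of a network given as an edge list and derives how many moves are needed at least to reach a network without such pairs, using that one TBR0 breaks at most three pairs and one PR at most two. The edge-list representation, the example network, and the function names are assumptions for illustration.

```python
from collections import Counter
from math import ceil


def parallel_pairs(edges):
    """Count the pairs of parallel edges in a multigraph edge list."""
    multiplicity = Counter(frozenset(edge) for edge in edges)
    return sum(1 for m in multiplicity.values() if m >= 2)


def tbr_lower_bound(pairs):
    """Moves needed if each TBR0 breaks at most three pairs."""
    return ceil(pairs / 3)


def pr_lower_bound(pairs):
    """Moves needed if each PR breaks at most two pairs."""
    return ceil(pairs / 2)


# Hypothetical example: a star tree on leaves 1, 2, 3 with centre c, where
# the pendant edges of leaves 1 and 2 each carry a pair of parallel edges.
example = [
    ("1", "a1"), ("a1", "b1"), ("a1", "b1"), ("b1", "c"),
    ("2", "a2"), ("a2", "b2"), ("a2", "b2"), ("b2", "c"),
    ("3", "c"),
]
pairs_in_example = parallel_pairs(example)
```

With seven pairs, for instance, the sketch gives at least three TBR0 and at least four PR, which shows how the two operations lead to different constants in the lower bounds.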
We use Theorem 5.9 to prove connectedness of other spaces of tree-based networks.
Theorem 5.10**.**
*Let .
Then the spaces , , and are each connected under TBR, PR, and NNI. Moreover, the diameter of is in under TBR and PR and in under NNI.*
Proof.
Assume without loss of generality that has the cherry . First, let and be in tiers and of , respectively, such that they are - and -handcuffed on the cherry . Then , as we can decrease the number of handcuffs with NNI-. Since, by Theorem 5.9, the tiers of are connected, the connectedness of follows.
Second, let be tree-based networks on and respectively, and with an -burl on the edge incident to leaf . Ignoring the burls, by Theorem 5.1, can be transformed into by transforming into with NNI0 or with PR0 or TBR0. With Theorem 5.9, the connectedness of and the upper bounds on the diameter follow. The lower bound on the diameter under PR and TBR also follows from Theorem 5.1 and Theorem 5.9.
Lastly, the connectedness of follows similarly from the connectedness of and . ∎
5.4 Level- networks
To conclude this section, we prove the connectedness of the space of level- networks.
Theorem 5.11**.**
*Let and .
Then, the space is connected under TBR and PR with unbounded diameter.*
Proof.
Let and . We show that can be transformed into the network that can be obtained from by adding a -burl to the edge incident to leaf . First, create a -burl in on the edge incident to leaf . This can be done using PR+. Next, using Lemma 4.5 remove all other blobs. This gives a network which consists of a tree with a -burl at leaf . There is a PR0-sequence from to , which is easily converted into a sequence from to . This proves the connectedness of under PR and also TBR. Lastly, note that the diameter is unbounded because the number of possible reticulations in a level- network is unbounded. ∎
Note that an NNI+ cannot directly create a pair of parallel edges. We may instead add a triangle with an NNI+ and then use an NNI0 to transform it into a pair of parallel edges. However, if the triangle is added within a level- blob of a level- network, then this would increase the level. Therefore, to prove connectedness of level- networks under NNI we use the same idea as for PR but are more careful to not increase the level.
Theorem 5.12**.**
*Let and .
Then, the space is connected under NNI with unbounded diameter.*
Proof.
Let and let . Like in the proof of Theorem 5.11, we want to transform into a network obtained from by adding a -burl to the edge incident to leaf .
Let be a level- blob of . Assume that contains another blob . By Lemma 4.5 there is a PR+-sequence that can remove . Use Lemma 3.5 to substitute this sequence with an NNI-sequence that reduces to a level-1 blob. Note that this can be done locally within blob and its incident edges. Therefore, this process does not increase the level of a network along this sequence. If is now a cycle of size at least three, then we can shrink it to a triangle, if necessary, and remove it with an NNI-. If is a pair of parallel edges and one of its vertices is incident to a degree three vertex that is not part of a level- blob, then use an NNI0 to increase the size of into a triangle by including or merge it with the blob containing . Next, either remove the resulting triangle, or repeat the process above to remove the new blob. Otherwise, ignore for now and continue with another blob of the current network that is neither nor . When this process terminates, we arrive at a network that has only blob , and, potentially, pairs of parallel edges that are incident to both and a leaf. That is the case since a pair of parallel edges incident to a degree three vertex not in could be removed with an NNI0 and an NNI-.
If the edge incident to leaf contains a pair of parallel edges or is incident to a degree three vertex not in , then use NNI+ and NNI0 (or in the latter case) to create a -burl next to leaf . Otherwise, if is incident to three or more cut-edges, then one of them is not incident to leaf and can be moved to the edge incident to leaf with an NNI0-sequence. If is incident to two or fewer cut-edges, there is a vertex incident to three cut edges (since ) and one of them can be moved to the edge incident to leaf with an NNI0-sequence. Then apply the first case again to create a -burl. Finally, remove and any remaining pair of parallel edges. This gives a network which consists of a tree with a -burl at leaf . There is an NNI0-sequence from to , which is easily converted into a sequence from to . Lastly, note that the diameter is unbounded because for each , there is a level- network with reticulations. ∎
6 Isometric relations between spaces
Recall that a space is an isometric subgraph of under a rearrangement operation, say TBR, if the TBR-distance of two networks in is the same as their TBR-distance in . In this section, we investigate this question for under TBR, and for tree-based networks and level-k networks under TBR and PR.
We start with . The proof of the following theorem follows the proof by Bordewich et al. [BLS17, Proposition 7.1] for their equivalent statement for SNPR on rooted phylogenetic trees and networks closely.
Theorem 6.1**.**
The space is an isometric subgraph of under TBR. Moreover, every shortest TBR-sequence from to only uses TBR0.
Proof.
Let and be the TBR-distance in and respectively. To prove the statement, it suffices to show that for every pair . Note that holds by definition. To prove the converse, let be a shortest TBR-sequence from to . Consider the following colouring of the edges of each , for . Colour all edges of blue. For preserve the colouring of to a colouring of for all edges except those affected by the TBR. In particular, an edge that gets added or moved is coloured red, an edge resulting from a vertex suppression is coloured blue if the two merged edges were blue and red otherwise, and the edges resulting from an edge subdivision are coloured like the subdivided edge.
Let be the graph obtained from by removing all red edges. We claim that is a forest with at most components. Since , the statement holds for . If is obtained from by a TBR+, then . If is obtained from by a TBR0 or TBR-, then at most one component gets split. Note that is a so-called agreement forest for and and thus by Theorem 2.13 of Allen and Steel [AS01]. Furthermore, if used a TBR+, then the forest would contain at most components. However, then ; a contradiction. ∎
Francis et al. [FHMW18] gave the example in Figure 14 to show that the tiers for and are not isometric subgraphs of under NNI. Their question of whether tier zero, , is an isometric subgraph of under NNI remains open.
Lemma 6.2**.**
Let and . Then the space is not an isometric subgraph of under NNI.
Lemma 6.3**.**
For and the space is not an isometric subgraph of under PR.
Proof.
For the networks and in shown in Figure 15 there is a length three PR-sequence that traverses tier , for example, like the depicted sequence . To prove the statement we show that every PR0-sequence from to has length at least four.
The networks and contain the highlighted (sub)blobs , , (resp. and ), , and . Observe that the edges between and and between and may only be pruned from a blob by a PR0 if they get regrafted to the same blob again. Otherwise the resulting network is improper. Note that to derive from an edge has to be regrafted to the “top” of and the edge to has to be pruned. By the first observation, combining these into one PR0 cannot build the connection to . The same applies for the transformation of into and its connection to . Therefore, we either need four PR0 to derive and or two PR0 plus two PR0 to build the connections to and . In conclusion, at least four PR0 are required to transform into , which concludes this proof. ∎
By replacing a leaf with a tree, and adding more pairs of parallel edges to the edge leading to , this example can be made to work for and .
Theorem 6.4**.**
For the space is not an isometric subgraph of under TBR and PR.
Proof.
Let be the network in Figure 16. Let be the network derived from by swapping the labels and . Note that , since, from to , we can move leaf 2 next to leaf 1 and then move leaf 1 to where leaf 2 was. However, then the network in the middle is not tree-based, since the blob derived from the Petersen graph has no Hamiltonian path if the two pendent edges of the blob are next to each other [FHM18]. We claim that there is no other length two TBR-sequence from to . For this proof we call a blob derived from the Petersen graph a Petersen blob.
First, note that the TBR-distance of and is at least two and there is thus no TBR-sequence that consists of a TBR- and a TBR+. Otherwise, these two operations could be merged into a single TBR0 by 4.1. Note that we can only move leaf 1 or 2 by pruning an incident edge if we do not affect the split 1 versus 2, 3 or break the tree-based property. Therefore, they either have to be swapped using edges of the Petersen blobs or the -chain has to be reversed and leaf 3 moved to the other Petersen blob. However, it is straightforward to check that neither can be done with two TBR0. In particular, we can look at what edge the first TBR0 might move and then check whether a second TBR0 can arrive at . If the first TBR0 breaks a Petersen blob, the problem is that the second TBR0 has to restore it. We then find that this does not allow us to make the initially planned changes to arrive at . On the other hand, if we avoid breaking the Petersen blob and reverse the -chain, then leaf 3 is still on the wrong side; and if we move leaf 3 to the other Petersen blob, then not enough TBR0 moves remain to reverse the chain.
Since there is no other length two TBR0-sequence there is also no other length two PR-sequence. ∎
Theorem 6.5**.**
For and large enough , the space is not an isometric subgraph of under TBR and PR.
Proof.
For even , the networks and in Figure 17 have TBR- and PR-distance two via the network . However, note that in the blobs of size a are merged into a blob of size . Therefore, is not a level- network. We claim that there is no TBR- or PR-sequence of length two that does not go through a level- network like . An example for odd can be derived from this.
It is easy to see that the TBR-distance of and is at least two and there is thus no TBR-sequence that consists of a TBR- and a TBR+. Otherwise, these two operations could be merged into a single TBR0 by 4.1. We thus have to prove that there is no length two TBR0-sequence from to that avoids a level- network. Note that it requires two TBR0 (or PR0) to connect and into . Similarly, it requires either two prunings from the upper five-cycle of to obtain the triangle or one pruning within that cycle. However, in the latter option this would not contribute to connecting and and hence overall at least three operations would be needed. Therefore we have to combine the two operations necessary to create and to create , which however gives us a sequence like the one shown in Figure 17. ∎
Note that the results of this section that show that the spaces of tree-based networks and level- networks are not isometric subgraphs of the space of all networks also hold if we restrict these spaces to a particular tier (for large enough ).
7 Computational complexity
In this section, we consider the computational complexity of computing the TBR-distance and the PR-distance. First, we recall the known results on phylogenetic trees.
Theorem 7.1** ([DHJ+97, HDRCB08, AS01]).**
Computing the distance of two trees in is NP-hard for the NNI-distance, the SPR-distance, and the TBR-distance.
In Theorem 6.1, we have shown that is an isometric subgraph of under TBR. Hence, with Theorem 7.1, we get the following corollary.
Corollary 7.2**.**
Computing the TBR-distance of two arbitrary networks in is NP-hard.
We can use the same two theorems to prove that computing the TBR-distance in tiers is also hard.
Theorem 7.3**.**
Computing the TBR-distance of two arbitrary networks in is NP-hard.
Proof.
We (linear-time) reduce the NP-hard problem of computing the TBR-distance of two trees in to computing the TBR-distance of two networks in . For this, let . Let be the edge incident to leaf of . Obtain from by subdividing with a new vertex and adding the edge where is a new vertex labelled . Next, add handcuffs to the cherry to obtain the network . Analogously obtain from .
The equality follows from Lemma 4.10, and the fact that networks handcuffed at a cherry display exactly one tree. More precisely, a TBR-sequence between and induces a TBR-sequence of the same length between and , hence . Conversely, by Lemma 4.10 and the fact that and , it follows that . Since computing the TBR-distance in is NP-hard, the statement follows. ∎
To prove that computing the PR-distance is hard, we use a different reduction. Van Iersel et al. proved that deciding whether a tree is displayed by a (not necessarily proper) phylogenetic network (Unrooted Tree Containment; UTC) is NP-hard [VIKS+18]. Combining this with Lemma 4.5, we arrive at our result.
Theorem 7.4**.**
Computing the PR-distance of two arbitrary networks in is NP-hard.
Proof.
We reduce from UTC to the problem of computing the PR-distance of two networks in . Let with a (not necessarily proper) network and be an arbitrary instance of UTC. We obtain an instance of the PR-distance decision problem as follows: remove all cut-edges of that do not separate two labelled leaves, and let be the connected component containing all the leaves; now, let be the proper network obtained from by suppressing all degree two nodes. The instance of the PR-distance decision problem consists of , , and the reticulation number of . As we can compute in polynomial time whether a cut edge separates two labelled leaves, the reduction is polynomial time. Because a displayed tree uses only cut-edges that separate two labelled leaves, is displayed by if and only if it is displayed by . By Lemma 4.5, is a displayed tree of , if and only if , which concludes the proof. ∎
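The preprocessing step of this reduction can be sketched in Python as follows. For each edge, the sketch tests whether it is a cut-edge and, if so, whether labelled leaves lie on both sides; cut-edges that fail the second test are removed, and the component containing the leaves is kept. The edge-list representation and the function names are assumptions for illustration. The quadratic-time connectivity test is chosen for brevity rather than efficiency, and the final suppression of degree-two vertices is omitted.

```python
def component(edges, start):
    """Return the vertex set of the connected component containing start."""
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for u, v in edges:
            if u == x and v not in seen:
                seen.add(v)
                stack.append(v)
            elif v == x and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def prune_leafless_parts(edges, leaves):
    """Drop cut-edges that do not separate two leaves; keep the leaf component."""
    leaves = set(leaves)
    kept = []
    for i, (u, v) in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        side = component(rest, u)
        is_cut_edge = v not in side
        separates_leaves = bool(leaves & side) and bool(leaves - side)
        if is_cut_edge and not separates_leaves:
            continue
        kept.append((u, v))
    main = component(kept, next(iter(leaves)))
    return [(u, v) for (u, v) in kept if u in main]


# Hypothetical improper network: leaves 1 and 2 joined through vertex a,
# with a leafless blob on b, c, d attached to a by the cut-edge (a, b).
improper = [("1", "a"), ("a", "2"), ("a", "b"),
            ("b", "c"), ("b", "d"), ("c", "d"), ("c", "d")]
pruned = prune_leafless_parts(improper, {"1", "2"})
```

In the example, only the path between the two leaves remains, since the blob behind the cut-edge from `a` to `b` contains no labelled leaf.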
Unlike for the hardness proof of TBR-distance, we cannot readily adapt this proof to the PR-distance in . For this purpose, we need to learn more about the structure of PR-space.
8 Concluding remarks
In this paper, we investigated basic properties of spaces of unrooted phylogenetic networks and their metrics under the rearrangement operations NNI, PR, and TBR. We have proven connectedness and bounds on diameters for different classes of phylogenetic networks, including networks that display a particular set of trees, tree-based networks, and level- networks. Although these parameters have been studied before for classes of rooted phylogenetic network [BLS17], this is the first paper that studies these properties for classes of unrooted phylogenetic networks besides the space of all networks. A summary of our results is shown in Table 1.
To see the improvements in diameter bounds, we compare our results to previously found bounds: For the space of phylogenetic trees it was known that the diameter is asymptotically linearithmic and linear in the size of the trees under NNI and SPR/TBR [LTZ96, DGH11], respectively. Here, we have shown that the diameter under NNI is also asymptotically linearithmic for higher tiers of phylogenetic networks. Whether this also holds in the rooted case is still open. We have further (re)proven the asymptotic linear diameter for PR and TBR of these tiers and, in particular, improved the upper bound on the diameter under TBR to from the previously best bound [JJE*+*18].
To uncover local structures of network spaces, we looked at properties of shortest sequences of moves between two networks. Here we found that shortest TBR-sequences between networks in the same tier never traverse lower tiers, and that shortest TBR-sequences between trees never traverse higher tiers. This implies that is an isometric subgraph of , and that computing the TBR-distance between two networks in is NP-hard. This answers a question by Francis et al. [FHMW18]. We have attempted to prove similar results for other subspaces and rearrangement moves. However, for higher tiers, we have not been able to prove that shortest TBR-sequences never traverse higher tiers. To answer this question, we may need to utilise agreement graphs such as those frequently used for phylogenetic trees [AS01, BS05] and, more recently, also for rooted phylogenetic networks [KL19, Kla19]. Concerning NNI and PR, we gave counterexamples to prove that higher tiers are not isometric subgraphs of . The questions whether is isometrically embedded in under PR and NNI remain open. Answering these questions positively would also provide an answer to the question whether computing the NNI-distance between two networks is NP-hard, and clues toward proving whether computing the PR-distance between two networks in the same tier is NP-hard. Further negative results that we have shown are that the spaces of tree-based networks and level-k networks are not isometric subgraphs of the space of all phylogenetic networks.
Throughout this paper, we have restricted our attention to proper networks. We could also have chosen to use unrooted networks without the properness condition. This definition, which is mathematically more elegant, is used in most other papers, so it seems to be the obvious choice. However, it is not natural to have cut-edges that do not separate leaves: such networks carry no biological meaning. It is desirable that networks are rootable and thus have an evolutionary interpretation. Unrooted phylogenetic networks are rootable if they have at most one blob with one cut-edge. While using this in the definition of an unrooted phylogenetic network could therefore be sufficient, we go one step further, and ask that there is no such blob. This makes a network rootable at any leaf (i.e., with any taxon as out-group), which gives a stronger biological interpretation and usability.
The fact that our definition of unrooted phylogenetic networks is mathematically more restrictive means that any positive result we have proven is likely also true under a less restrictive definition. That is, connectedness for those definitions follows easily by finding sequences to proper networks, as done by Janssen et al. [JJE+18]. As we may be able to find short sequences for this purpose, the diameter results will likely also still hold. This means that whatever definitions may be used in practice, with minor additional arguments, our results provide the theoretical background necessary to justify local search operations.
Acknowledgements
The first author was supported by the Netherlands Organization for Scientific Research (NWO) Vidi grant 639.072.602. The second author thanks the New Zealand Marsden Fund for their financial support.
References
- [AS01] B. L. Allen and M. Steel, “Subtree transfer operations and their induced metrics on evolutionary trees,” Annals of Combinatorics, vol. 5, no. 1, pp. 1–15, 2001. doi:10.1007/s00026-001-8006-8
- [BLS17] M. Bordewich, S. Linz, and C. Semple, “Lost in space? Generalising subtree prune and regraft to spaces of phylogenetic networks,” Journal of Theoretical Biology, vol. 423, pp. 1–12, 2017. doi:10.1016/j.jtbi.2017.03.032
- [Bor03] M. Bordewich, “The complexity of counting and randomised approximation,” Ph.D. dissertation, University of Oxford, 2003. http://community.dur.ac.uk/m.j.r.bordewich/papers/Bordewich2003-a.pdf
- [BS05] M. Bordewich and C. Semple, “On the computational complexity of the rooted subtree prune and regraft distance,” Annals of Combinatorics, vol. 8, no. 4, pp. 409–423, 2005. doi:10.1007/s00026-004-0229-z
- [DGH11] Y. Ding, S. Grünewald, and P. J. Humphries, “On agreement forests,” Journal of Combinatorial Theory, Series A, vol. 118, no. 7, pp. 2059–2065, 2011. doi:10.1016/j.jcta.2011.04.013
- [DHJ+97] B. Das Gupta, X. He, T. Jiang, M. Li, J. Tromp, and L. Zhang, “On distances between phylogenetic trees,” in Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, 1997, pp. 427–436.
- [Die17] R. Diestel, Graph Theory, 5th ed. Springer Berlin Heidelberg, 2017. doi:10.1007/978-3-662-53622-3
- [FHM18] A. Francis, K. T. Huber, and V. Moulton, “Tree-based unrooted phylogenetic networks,” Bulletin of Mathematical Biology, vol. 80, no. 2, pp. 404–416, 2018. doi:10.1007/s11538-017-0381-3
