Binets: fundamental building blocks for phylogenetic networks
Leo van Iersel, Vincent Moulton, Eveline de Swart, Taoyang Wu

TL;DR
This paper investigates the properties of binets, simple building blocks of phylogenetic networks, providing structural insights, complexity results, and algorithms for their compatibility and construction.
Contribution
It offers new structural results on binets, proves complexity bounds for compatibility problems, and develops polynomial-time algorithms for specific cases.
Findings
Compatibility of level-1 binets with binary networks
Binets determine the number of reticulations in a network
Deciding binet compatibility is as hard as Graph Isomorphism
Leo van Iersel, Eveline de Swart
Delft Institute of Applied Mathematics, Delft University of Technology, The Netherlands

Vincent Moulton, Taoyang Wu
School of Computing Sciences, University of East Anglia, Norwich, United Kingdom
Part of this work was conducted while Vincent Moulton was visiting Leo van Iersel on a visitors grant funded by the Netherlands Organization for Scientific Research (NWO). Leo van Iersel was partially supported by NWO, including Vidi grant 639.072.602, and partially by the 4TU Applied Mathematics Institute. We thank the editor and the two anonymous referees for their constructive comments.
Abstract
Phylogenetic networks are a generalization of evolutionary trees that are used by biologists to represent the evolution of organisms which have undergone reticulate evolution. Essentially, a phylogenetic network is a directed acyclic graph having a unique root in which the leaves are labelled by a given set of species. Recently, some approaches have been developed to construct phylogenetic networks from collections of 2- and 3-leaved networks, which are known as binets and trinets, respectively. Here we study in more depth properties of collections of binets, one of the simplest possible types of networks into which a phylogenetic network can be decomposed. More specifically, we show that if a collection of level-1 binets is compatible with some binary network, then it is also compatible with a binary level-1 network. Our proofs are based on useful structural results concerning lowest stable ancestors in networks. In addition, we show that, although the binets do not determine the topology of the network, they do determine the number of reticulations in the network, which is one of its most important parameters. We also consider algorithmic questions concerning binets. We show that deciding whether an arbitrary set of binets is compatible with some network is at least as hard as the well-known Graph Isomorphism problem. However, if we restrict to level-1 binets, it is possible to decide in polynomial time whether there exists a binary network that displays all the binets. We also show that finding a network that displays a maximum number of the binets is NP-hard, but that there exists a simple polynomial-time 1/3-approximation algorithm for this problem. It is hoped that these results will eventually assist in the development of new methods for constructing phylogenetic networks from collections of smaller networks.
Keywords:
reticulate evolution · phylogenetic network · subnetwork · binet · algorithm
1 Introduction
Phylogenetic networks are a generalization of evolutionary trees which biologists use to represent the evolution of species that have undergone reticulate evolution. Such networks are essentially directed acyclic graphs having a unique root in which the leaves are labelled by a set of species [hrs]. In contrast to evolutionary trees, which can only represent speciation events, phylogenetic networks permit the representation of evolutionary events such as gene transfer and hybridization, which are known to occur in organisms such as bacteria and plants, respectively. Although theoretical properties of evolutionary trees have been studied since at least the 1970s, phylogenetic networks have been considered from this perspective only more recently, especially the rooted variants which we will focus on in this paper.
One of the most important open questions concerning phylogenetic networks is how to construct them for biological datasets [bapteste2013networks]. It is now common practice for biologists to construct evolutionary trees from molecular data, and several computer programs are available for this purpose [felsenstein2004inferring]. However, the problem of constructing networks from such data is an active area of research, and there are only a limited number of programs available for biologists to perform this task. A survey of some of these methods and the theory underpinning phylogenetic networks may be found in [gus14; hrs; M11].
One approach that has been recently developed for constructing phylogenetic networks involves building them up from smaller networks, using what can be thought of as a divide-and-conquer approach [oldman2016trilonet]. In particular, for a set of species, a network is constructed for every subset of size 3 (called a trinet), and then the trinets are puzzled together to build a network (see Figure 1 for an example of a trinet). This approach is based on, and constructs, level-1 networks, which are slightly more general than evolutionary trees (see Section 2 for the definition of such networks).
At first sight, it might appear that trinets are the simplest possible networks that could be considered for building up networks from smaller ones. However, trinets contain even simpler networks called binets, networks with 2 leaves (see e.g. Figure 1 for a level-1 trinet and the binets that it displays). Note that whereas binets are the smallest informative building blocks for phylogenetic networks, for rooted phylogenetic trees, these are 3-leaf trees (see e.g. [byrka2010new]). Interestingly, even though binets are in themselves very simple, the collection of binets displayed by a network can still contain some useful information concerning the network. Indeed, in the aforementioned approach for building level-1 networks from trinets, binets are used in the process of puzzling together the trinets.
In light of these considerations some obvious questions immediately arise concerning binets. For example, when is a collection of binets displayed by some phylogenetic network (the compatibility problem), and how much information might we expect to extract concerning a phylogenetic network by just looking at the collection of binets that it displays? In this paper, we shall address these and related algorithmic questions concerning binets. It is hoped that these results will be useful in future for developing improved methods for constructing phylogenetic networks from smaller networks.
We now present a summary of the rest of the paper. After introducing some preliminaries concerning phylogenetic networks in the next section, we derive a key structural result for networks (Corollary 1) which is useful in identifying which of the two possible types of binet is displayed on two leaves within a binary phylogenetic network (that is, a network in which all internal vertices have degree 3). Using this result, in Section 4 we show that the collection of level-1 binets displayed by any binary phylogenetic network can always be displayed by some binary level-1 network (Theorem 4.2). This reduces the problem of understanding binets displayed by arbitrary binary networks to level-1 networks. To prove this result, we develop a framework which also implies that there is an algorithm, polynomial in $|X|$, for deciding whether or not a collection of level-1 binets with combined leaf-set $X$ can be displayed by some network with leaf-set $X$ and which, if so, returns a level-1 network that does this (see Section 6). Note that this is related to an algorithm presented in [himsw].
In Section 5, we turn to the question of what can be deduced about the features of a phylogenetic network just by considering the collection of binets that it displays. Note that, as might be expected, there are networks, even trinets, that display the same set of binets but that are not equivalent. For example, the two trinets in Figure 1 both display the same set of binets, but they are not equivalent. Even so, we will show in Theorem 5.1 that if two level-1 networks both display exactly the same collection of binets, then they must have the same number of reticulation vertices (indegree-2 vertices). Note that the number of such vertices corresponds to the number of reticulate evolutionary events, such as hybridization, that took place in the evolutionary history of the species labelling the leaves of the network. Consequently, the binets displayed by a network can at least capture a useful coarse-grained feature of the network in question.
In Sections 6 and 7, we consider some algorithmic questions concerning binets. As we have mentioned above, it can be decided in time polynomial in $|X|$ whether a collection of binets with combined leaf-set $X$ is displayed by some level-1 network on $X$. However, we show that if we consider arbitrary binets (i.e. not necessarily binary or level-1) then this decision problem becomes at least as hard as the graph-isomorphism problem (see Theorem 6.1), one of the most famous problems whose complexity is still unknown. In addition, in Section 7 we consider a related problem which, for a given collection of binary level-1 binets, asks for a network which displays the maximum number of binets in this collection. This is closely related to the maximum rooted triplet consistency problem for evolutionary trees [byrka2010new]. We show that the binet problem is NP-complete (Theorem 7.1), by giving a reduction from the feedback-arc set problem. However, we also show that the problem is 1/3-approximable. In fact, given any collection of binary level-1 binets we can always find some network that displays at least 1/3 of the binets (see Theorem 7.2). We conclude in Section 8 with a discussion of some possible future research directions and of a potential application of our results.
2 Preliminaries
Throughout this paper, $X$ is a non-empty finite set (which usually represents a set of species or organisms).
2.1 Digraphs
A directed graph, or digraph for short, $G=(V,A)$ consists of a finite set $V=V(G)$ of vertices and a set $A=A(G)$ of arcs, where each arc $(u,v)$ is an ordered pair of vertices in $V$ in which $u$ is said to be a parent of $v$ and $v$ a child of $u$. All digraphs studied here contain no loops, that is, vertices that are children of themselves. The in-degree of a vertex $v$ is the number of vertices $u$ in $V$ such that $(u,v)$ is an arc, and the out-degree of $v$ is the number of vertices $w$ with $(v,w)$ being an arc. A root is a vertex with in-degree 0. A leaf is a vertex of out-degree 0 and the set of leaves is denoted by $L(G)$. Any vertex in $V$ that is neither a root nor a leaf is referred to as an interior vertex. In addition, an interior vertex is a tree vertex if it has in-degree 1, and a reticulation vertex if it has in-degree greater than 1.
A directed path or dipath in a digraph $G$ is a sequence $u_0,u_1,\dots,u_k$ ($k\geq 1$) of vertices such that $(u_{i-1},u_i)$ is an arc for $1\leq i\leq k$. An acyclic digraph is a digraph that does not contain any directed path starting and ending at the same vertex. If an acyclic digraph contains a unique root, which is usually designated by $\rho$, then it will be referred to as a rooted acyclic digraph.
An acyclic digraph $G$ induces a canonical partial order $\prec_G$ on its vertex set $V(G)$, that is, $v \prec_G u$ if there exists a directed path from $u$ to $v$. In this case, we shall say that $v$ is below $u$. When the digraph $G$ is clear from the context, $\prec_G$ will be written as $\prec$. In addition, we write $v \preceq u$ if $v = u$ or $v \prec u$. Given a subset $U$ of the vertex set of an acyclic digraph, we say that $v\in U$ is a lowest vertex in $U$ if there is no $u\in U$ with $u \prec v$.
Let $U(G)$ be the undirected graph obtained from digraph $G$ by ignoring the direction of the arcs in $G$. Then $G$ is connected if $U(G)$ is connected, that is, there exists an undirected path between every pair of distinct vertices in $U(G)$. Note that a rooted acyclic digraph is necessarily connected (since each connected component of an acyclic digraph has at least one root). A cut vertex is a vertex of $G$ whose removal disconnects $G$. Similarly, a cut arc is an arc of $G$ whose removal disconnects $G$. A directed graph is biconnected if it contains no cut vertex, and a biconnected component of $G$ is a maximal biconnected subgraph, which is called trivial if it contains precisely one arc (which is necessarily a cut arc), and non-trivial otherwise.
2.2 Phylogenetic networks
A phylogenetic network $N$ on $X$ is a rooted acyclic digraph whose leaves are bijectively labeled by the elements in $X$ and which does not contain any vertex with in-degree one and out-degree one. For simplicity, we will just write $L(N)=X$ in case there is no confusion about the labeling. To simplify the argument, throughout this paper we will also assume that all leaves in a phylogenetic network have in-degree one. In addition, a phylogenetic network is binary if each tree vertex, as well as the root, has out-degree 2, and each reticulation vertex has in-degree 2 and out-degree 1. Finally, we say a binary phylogenetic network is level-$k$ ($k\geq 0$) if each of its biconnected components contains at most $k$ reticulation vertices. To some extent, the concept of the level of a phylogenetic network can be regarded as a measure of its ‘distance’ to being a phylogenetic tree. In particular, a binary phylogenetic network is a phylogenetic tree if and only if it is level-0. A phylogenetic network is called simple if it contains precisely one non-trivial biconnected component $B$ and no cut arcs other than the ones leaving $B$.
Two networks $N$ and $N'$ on $X$ are said to be isomorphic if there exists a bijection $f: V(N)\to V(N')$ such that $f(x)=x$ for all $x\in X$, and $(u,v)$ is an arc in $N$ if and only if $(f(u),f(v))$ is an arc in $N'$.
Finally, the cluster of a vertex $v$, denoted by $C(v)$, is defined as the subset of $X$ consisting of the leaves below $v$. Here we will use the convention that $C(v)=\{v\}$ if $v$ is a leaf.
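As a small illustration of the degree conditions in the definition of a binary phylogenetic network, the following sketch checks them for a digraph given as a list of arcs. The function name `is_binary_network` and the arc-list representation are illustrative assumptions, not notation from the paper; the check also assumes that the input is acyclic and has no parallel arcs, and it does not verify the leaf labelling.

```python
from collections import defaultdict


def is_binary_network(arcs):
    """Check the degree conditions of a binary phylogenetic network.

    `arcs` is a list of (parent, child) pairs (hypothetical representation).
    Acyclicity and the absence of parallel arcs are assumed, not checked.
    """
    indeg, outdeg = defaultdict(int), defaultdict(int)
    vertices = set()
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
        vertices.update((u, v))

    # A rooted digraph has exactly one vertex of in-degree 0.
    if sum(1 for v in vertices if indeg[v] == 0) != 1:
        return False

    # Allowed (in-degree, out-degree) pairs:
    # root (0, 2), tree vertex (1, 2), reticulation (2, 1), leaf (1, 0).
    allowed = {(0, 2), (1, 2), (2, 1), (1, 0)}
    return all((indeg[v], outdeg[v]) in allowed for v in vertices)


# The tree-type binet and a reticulate-type binet on leaves x and y.
tree_binet = [("r", "x"), ("r", "y")]
reticulate_binet = [("r", "a"), ("r", "h"), ("a", "h"), ("a", "x"), ("h", "y")]
```

Both example binets pass the check, whereas a digraph containing a vertex with in-degree one and out-degree one does not.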
2.3 Stable ancestors and binets
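Given a phylogenetic network $N$ on $X$ and a subset $U\subseteq V(N)$, a stable ancestor of $U$ in $N$ is a vertex $v$ in $V(N)\setminus U$ such that every path in $N$ from the root to a vertex in $U$ contains $v$. Note that for two stable ancestors $u$ and $v$ of $U$, we have either $u\preceq v$ or $v\preceq u$. Therefore, there exists a unique lowest vertex in the set of stable ancestors of $U$, which will be referred to as the lowest stable ancestor of $U$ in $N$ and denoted by $\mathrm{LSA}_N(U)$, or $\mathrm{LSA}(U)$ when $N$ is clear from the context. Note that for a subset $U$ of $X$ with $|U|\geq 2$, there exist two elements $x$ and $y$ in $U$ such that $\mathrm{LSA}(U)=\mathrm{LSA}(\{x,y\})$. For simplicity, we also write $\mathrm{LSA}(\{x,y\})$ as $\mathrm{LSA}(x,y)$.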
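To make the notion of a lowest stable ancestor concrete, here is a minimal brute-force sketch. It assumes that a stable ancestor of a vertex set is a vertex outside that set lying on every root-to-set path, and it tests this by deleting each candidate vertex and checking reachability from the root. The names `reachable` and `lowest_stable_ancestor` and the arc-list representation are illustrative assumptions; the root is assumed not to belong to the target set.

```python
def reachable(children, start, removed=None):
    """Vertices reachable from `start` when vertex `removed` is deleted."""
    seen, stack = set(), [start]
    while stack:
        v = stack.pop()
        if v in seen or v == removed:
            continue
        seen.add(v)
        stack.extend(children.get(v, ()))
    return seen


def lowest_stable_ancestor(arcs, root, targets):
    """Brute-force lowest stable ancestor of `targets` (root not in targets)."""
    children = {}
    for u, v in arcs:
        children.setdefault(u, []).append(v)
    targets = set(targets)
    vertices = {root} | {w for arc in arcs for w in arc}

    # v is stable if deleting v cuts every root-to-target path.
    stable = [v for v in vertices - targets
              if not (reachable(children, root, removed=v) & targets)]

    # Stable ancestors are totally ordered; return the one with no other
    # stable ancestor strictly below it.
    for s in stable:
        below = reachable(children, s) - {s}
        if not any(t in below for t in stable):
            return s
    return None


# Reticulate binet: leaf y hangs below the reticulation h.
arcs = [("r", "a"), ("r", "h"), ("a", "h"), ("a", "x"), ("h", "y")]
```

On this example the lowest stable ancestor of `{"x", "y"}` is the root `r`, while that of `{"x"}` alone is `a`. The quadratic running time is a deliberate simplification; dominator-tree algorithms would be the natural choice for large networks.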
The following property of lowest stable ancestors will be useful.
Lemma 1
Suppose that and are two vertices in a phylogenetic network such that , then we have .
Proof
Since , we know that there exists a dipath from to that contains . By the definition of lowest stable ancestor, we know that and are contained in . Hence, either or . If , then we have . Then there exists a dipath from to that does not contain (otherwise would be a stable ancestor of that is below ). Using that , it follows that there exists a dipath from to that does not contain , a contradiction. Therefore, . ∎
For $A\subseteq X$, the subnet of $N$ on $A$, denoted by $N|_A$, is defined as the subgraph obtained from $N$ by deleting all vertices that are not on any path from $\mathrm{LSA}(A)$ to elements in $A$ and subsequently suppressing all in-degree 1 and out-degree 1 vertices and parallel arcs until no such vertices or arcs exist. A network $N'$ is said to be displayed by network $N$ if $N'=N|_A$ for some $A\subseteq X$.
Note that, by definition, $N=N|_X$ if and only if $\mathrm{LSA}(X)=\rho$. In this case, $N$ is referred to as a recoverable network. Note that every subnet of $N$ is necessarily recoverable. Moreover, a collection of subnets is displayed by some network if and only if it is displayed by some recoverable network. Therefore, we assume all networks in this paper to be recoverable.
A binet is a phylogenetic network with precisely two leaves, while a trinet is a phylogenetic network with precisely three leaves. Let
$$\mathcal{B}(N)=\{\,N|_{\{x,y\}} \;:\; x,y\in X,\ x\neq y\,\}$$
be the collection of binets displayed by $N$. Note that there are precisely three binary level-1 binets on a set $\{x,y\}$, and they can be grouped into two types: the “tree type”, $T(x,y)$, and the “reticulate type”, $R(x;y)$ and $R(y;x)$ (see Figure 2). A collection of binets on $X$ is a collection $\mathcal{B}$ of binets such that the union of the leaf-sets of the binets in $\mathcal{B}$ is equal to $X$.
3 A structure theorem
In this section we present a key result (Corollary 1) concerning the structure of the non-trivial biconnected component of a simple network. Note that a similar result has been obtained for a special collection of (non-binary) phylogenetic networks in [hmw16].
Let $G$ be a directed acyclic graph and let $P=(v_0,v_1,\dots,v_k)$ be an undirected path in the underlying undirected graph of $G$. Then a vertex $v_i$ (with $0<i<k$) is called alternating (with respect to $P$) if we have either $\{(v_{i-1},v_i),(v_{i+1},v_i)\}\subseteq A(G)$ or $\{(v_i,v_{i-1}),(v_i,v_{i+1})\}\subseteq A(G)$. The number of alternating vertices contained in $P$ is denoted by $\mathrm{alt}(P)$. Using this concept, we now prove the following theorem. See Figure 3 for an example.
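The following sketch counts alternating vertices along an undirected path. It assumes the reading of the definition in which an internal vertex of the path is alternating when both of its incident path arcs point into it, or both point out of it, so that the arc direction changes at that vertex. The function name `count_alternating` and the arc-list representation are illustrative assumptions.

```python
def count_alternating(arcs, path):
    """Count internal vertices of `path` at which the arc direction flips.

    `arcs` is a collection of (parent, child) pairs of the digraph and
    `path` is a vertex sequence that is a path in the underlying
    undirected graph (this is assumed, not checked).
    """
    arc_set = set(arcs)
    count = 0
    for i in range(1, len(path) - 1):
        prev, v, nxt = path[i - 1], path[i], path[i + 1]
        both_in = (prev, v) in arc_set and (nxt, v) in arc_set
        both_out = (v, prev) in arc_set and (v, nxt) in arc_set
        if both_in or both_out:
            count += 1
    return count


# Reticulate binet: r is the root, h the reticulation.
arcs = [("r", "a"), ("r", "h"), ("a", "h"), ("a", "x"), ("h", "y")]
```

A directed path has no alternating vertices, whereas an undirected path that climbs to the root and descends again, such as `x, a, r, h, y`, has exactly one.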
Theorem 3.1
Let $N$ be a binary phylogenetic network on $X$ whose root $\rho$ is in some non-trivial biconnected component $B$. Then there exists a lowest vertex $v$ in $B$ with $\mathrm{LSA}(v)=\rho$.
Proof
Let be the set of reticulation vertices in for which the distance (length of a shortest directed path) between and is minimum over all reticulation vertices in . Note that .
We first show that for all . Suppose this were not the case. Then there exists a vertex such that . Note that necessarily has outdegree 2 and therefore has indegree 1 since is binary and . Denote the parent of by . Since is biconnected, there exists some undirected path from to that does not contain the edge . Let , where and , be such an undirected path for which is minimum.
We claim that . To see this, note first that since , and , we know that and are arcs of . Hence, is odd and strictly positive. Assume for the sake of contradiction that , then we have . Let () be the second alternating vertex contained in (when travelling from to ).
Now fix a directed path in from to .
If the arc is not contained in , then, we can find an undirected path from to that does not contain and has fewer alternating vertices than by following until we reach a vertex in and then following to . This gives a contradiction.
Now assume that the arc is contained in . Then we can find an undirected path from to that does not contain and has only one alternating vertex as follows. Follow up to and then follow backward from to . Since this path has fewer alternating vertices than , we again obtain a contradiction.
We have thus shown that . Denoting this alternating vertex in by , then is necessarily a reticulation by the choice of . Hence, consists of two directed paths: a directed path from to that does not contain and a directed path from to . However, this means that , a contradiction to the assumption that .
Hence, we know that is the set of reticulation vertices of such that and that is not empty.
Now fix a vertex in that is lowest over all vertices of , that is, there does not exist a vertex in such that . It remains to show that is lowest over all vertices of . Assume that this is not the case. Then the child of is also in . If were a reticulation then, by Lemma 1, . However, this would imply that , contradicting the choice of . Hence, is a tree vertex.
Since is biconnected, there exists some undirected path from to that does not contain . Let be such a path such that is minimum. Note that we have and .
Since is a tree vertex and does not contain its parent , is an arc of . Together with being an arc in , we know that is odd and strictly positive. We now show, using a similar proof as above, that . If this were not the case, then we would have . Let () be the second alternating vertex contained in . We know that and are two arcs contained in . Now fix a directed path in from to .
If the vertex is not contained in , then we can find an undirected path from to that does not contain and has fewer alternating vertices than by following from it reaches a vertex from and then following up to . If is contained in , then we follow from to and then follow from to and obtain an undirected path from to that does not contain and has one alternating vertices, which is less than the number of alternating vertices in . In either case, we obtain a contradiction.
We have thus shown that . Denoting this alternating vertex in by , then is necessarily a reticulation by the choice of . Hence, consists of two directed paths: a directed path from to that does not contain and a directed path from to . However, this means that , and hence in view of Lemma 1. This implies that , a contradiction to the assumption that is lowest among . ∎
The following is a direct consequence of the above theorem.
Corollary 1
Suppose that $N$ is a simple binary phylogenetic network. Let $B$ be the unique non-trivial biconnected component of $N$. Then there exists a lowest vertex $v$ of $B$ such that there exist two arc-disjoint directed paths from the root of $N$ to $v$.
4 Displaying binets by binary networks
A collection of binary level-1 binets is compatible if there exists some binary network that displays all binets from the collection. In this section, we study the compatibility of binets. Our main result in this section (Theorem 4.2) shows that when studying the compatibility of binets, we can restrict to binary level-1 networks.
We will restrict ourselves throughout this section to thin collections of binets, i.e. collections containing at most one binet on $x$ and $y$ for all distinct $x,y\in X$. Clearly, any collection of binets that is not thin is not compatible.
First, we need some new definitions. Given a digraph $D$, a sink set of $D$ is a proper subset $U\subset V(D)$ such that there is no arc leaving $U$, that is, there exists no arc $(u,v)$ with $u\in U$ and $v\notin U$. A bipartition (or split) of $V(D)$ into nonempty sets $A$ and $B$, denoted $A|B$, is called
- Type I if both $A$ and $B$ are sink sets (i.e. there is no arc from any element in $A$ to any element in $B$ or vice versa);
- Type II if either $A$ or $B$ (but not both) is a sink set; and
- Type III if for all $x\in A$ and $y\in B$, $(x,y)$ is an arc in $D$ if and only if $(y,x)$ is an arc in $D$.
We say that $A|B$ is a typed split of $D$ if it is a split of Type I, II or III.
For a collection $\mathcal{B}$ of binary level-1 binets on $X$, we introduce the digraph $D(\mathcal{B})$ with vertex set $X$ and $(x,y)$ being an arc in $D(\mathcal{B})$ if $T(x,y)\in\mathcal{B}$ or $R(x;y)\in\mathcal{B}$. See Figure 4 for an example.
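The digraph built from a binet collection and the three split types can be sketched as follows. Binets are encoded as hypothetical triples `("T", x, y)` and `("R", x, y)`, under the assumption that a tree-type binet contributes arcs in both directions and a reticulate-type binet `("R", x, y)` contributes only the arc from `x` to `y`. The names `binet_digraph` and `split_type` are illustrative, not from the paper.

```python
def binet_digraph(binets):
    """Arc set of the digraph associated with a collection of binets.

    Assumed encoding: ("T", x, y) gives arcs x->y and y->x,
    ("R", x, y) gives the single arc x->y.
    """
    arcs = set()
    for kind, x, y in binets:
        arcs.add((x, y))
        if kind == "T":
            arcs.add((y, x))
    return arcs


def split_type(arcs, part_a, part_b):
    """Return "I", "II", "III" or None for the split part_a | part_b."""
    a_to_b = {(u, v) for (u, v) in arcs if u in part_a and v in part_b}
    b_to_a = {(u, v) for (u, v) in arcs if u in part_b and v in part_a}
    if not a_to_b and not b_to_a:
        return "I"    # both parts are sink sets
    if not a_to_b or not b_to_a:
        return "II"   # exactly one part is a sink set
    symmetric = (all((v, u) in b_to_a for (u, v) in a_to_b)
                 and all((v, u) in a_to_b for (u, v) in b_to_a))
    return "III" if symmetric else None


binets = [("T", "a", "b"), ("R", "a", "c"), ("R", "b", "c")]
arcs = binet_digraph(binets)
```

Here `split_type(arcs, {"a", "b"}, {"c"})` reports a Type II split, since `{"c"}` has no outgoing arcs. A Type I split also satisfies the Type III condition vacuously; the sketch reports it as Type I.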
The following two lemmas show important properties of typed splits that will be used to establish Theorems 4.1 and 4.2.
Lemma 2
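Suppose that $\mathcal{B}$ and $\mathcal{B}'$ are two thin collections of binary level-1 binets on $X$ with $\mathcal{B}\subseteq\mathcal{B}'$. Then each typed split of $D(\mathcal{B}')$ is a typed split of $D(\mathcal{B})$.
Proof
Suppose that $A|B$ is a typed split of $D(\mathcal{B}')$. If $A|B$ is of Type I in $D(\mathcal{B}')$, then it is of Type I in $D(\mathcal{B})$ since $D(\mathcal{B})$ is a subgraph of $D(\mathcal{B}')$. Similarly, if $A|B$ is of Type II in $D(\mathcal{B}')$, then it is of Type I or II in $D(\mathcal{B})$. If $A|B$ is of Type III in $D(\mathcal{B}')$ then (since $\mathcal{B}'$ is thin) any binet in $\mathcal{B}'$ on $x$ and $y$ with $x\in A$ and $y\in B$ is $T(x,y)$. Therefore, $A|B$ is of Type I or III in $D(\mathcal{B})$. ∎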
Lemma 3
Suppose that $\mathcal{B}$ is a thin collection of binary level-1 binets on $X$. If $\mathcal{B}$ is displayed by a binary network, then $D(\mathcal{B})$ has a typed split.
Proof
Suppose that is displayed by a binary network. Then is displayed by a binary recoverable network . Let be the set of binary level-1 binets contained in . Then we have . By Lemma 2, it suffices to show that has a typed split.
Consider the root of , which is equal to since is recoverable. Denote the two children of by and . We consider two cases.
The first case is that at least one arc incident with is a cut arc. Then the other arc incident with is also a cut arc. Then let and . Note that is a split because neither nor is empty. In addition, for all we have and hence is a Type III split with respect to .
In the second case, both arcs incident with are not cut arcs. Hence, the root is contained in a non-trivial biconnected component containing and . By Corollary 1, there exists a lowest vertex in with two arc-disjoint paths from to . Since is a lowest vertex in , we know that is a reticulation vertex and the arc leaving is a cut arc. Let and . Then is clearly nonempty. In addition, is nonempty, as otherwise , a contradiction to the fact that (as is recoverable). Therefore, is a split.
Consider and and the subnetwork . There is at least one directed path from to , and each such path contains at least one arc of or . Hence, in the process of obtaining from , the paths do not become parallel arcs. Therefore, contains two arc-disjoint paths from to and we can conclude that . Therefore, if , that is, is level-1, then . This implies that there is no arc . Therefore, is a Type I or Type II split of . ∎
Note that the condition that $\mathcal{B}$ is displayed by a binary network in the above lemma cannot be weakened to the condition that $\mathcal{B}$ is displayed by a network. For example, consider the binet collection $\mathcal{B}$ and network $N$ in Figure 4. Although network $N$ displays $\mathcal{B}$, the digraph $D(\mathcal{B})$ has no typed split (as can be easily checked).
We now introduce two operations, which can be used to combine two phylogenetic networks into a new one. Suppose that and are two phylogenetic networks with disjoint leaf sets. Let be the phylogenetic network obtained from and by adding a new vertex and two arcs from to the roots of and . In addition, the network is obtained by taking a binet , with , and replacing by the root of , for . See Figure 5 for examples.
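For a binet set $\mathcal{B}$ on $X$ and a subset $A\subseteq X$, we define
$$\mathcal{B}|_A=\{\,b\in\mathcal{B} \;:\; \text{both leaves of } b \text{ are contained in } A\,\}.$$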
The next theorem can be used to determine in polynomial time whether a collection of binary level-1 binets is displayed by some binary level-1 network. See Section 6 for more details.
Theorem 4.1
Suppose that $\mathcal{B}$ is a thin collection of binary level-1 binets on $X$. If there exists a typed split $A|B$ of $D(\mathcal{B})$ such that $\mathcal{B}|_A$ and $\mathcal{B}|_B$ are both displayed by some binary level-1 network, then $\mathcal{B}$ is displayed by a binary level-1 network. Moreover, if $\mathcal{B}$ is displayed by a binary level-1 network, then there exists at least one typed split of $D(\mathcal{B})$ and, for each typed split $A|B$ of $D(\mathcal{B})$, $\mathcal{B}|_A$ and $\mathcal{B}|_B$ are both displayed by some binary level-1 network.
Proof
First suppose that there exists a typed split of such that and are displayed by binary level-1 networks and , respectively.
If is a Type I or Type III split of , then consider the network . Then is a binary level-1 phylogenetic network on and
[TABLE]
and so is displayed by .
If is a Type II split of , then without loss of generality we may assume that is a sink set in . Now consider the network . Then is a binary level-1 phylogenetic network on and
[TABLE]
and so is displayed by .
Now suppose that is displayed by a binary level-1 network . By Lemma 3, there exists a typed split of . Then and . ∎
We now prove the main result of this section.
Theorem 4.2
Suppose that $\mathcal{B}$ is a thin collection of binary level-1 binets on $X$. Then $\mathcal{B}$ is displayed by a binary level-1 network if and only if it is displayed by a binary network.
Proof
Suppose that is displayed by a binary network. We claim that is also displayed by a binary level-1 network. We shall establish this claim by induction on .
If , then contains at most one binet, which has leaf set . Therefore we know that is displayed by a binary level-1 network.
Now assume that , and the claim holds for all sets with . Let be a binary network on with . By Lemma 3, there exists a typed split of . Note that and . Therefore, by induction, each of and is displayed by a binary level-1 network. By Theorem 4.1, it follows that is displayed by a binary level-1 network. ∎
5 Binets determine the number of reticulations of a binary level-1 network
In this section we show that, although the collection of binets displayed by a level-1 network does not necessarily determine the network (see Figure 1), it does in fact determine the number of reticulations in the network. We begin by showing that it suffices to consider level-1 networks in which all cycles (in the underlying undirected graph) have length 3.
First, we introduce some further notation. A semi-cycle of an acyclic directed graph $G$ is the union of two non-identical, internally-vertex-disjoint, directed paths from $s$ to $t$, with $s$ and $t$ two distinct vertices of $G$ that are referred to as the source and terminal of the semi-cycle, respectively. The length of a semi-cycle is the number of distinct vertices that it contains.
We now show that we may restrict to networks in which all semi-cycles have length 3.
Lemma 4
If $N$ is a binary level-1 network, then there exists a binary level-1 network $N'$ in which every semi-cycle has length 3, such that $\mathcal{B}(N)=\mathcal{B}(N')$ and $N$ and $N'$ have the same number of reticulation vertices.
Proof
Consider a semi-cycle of with source and terminal and length at least 4. Let , ,, be the arcs leaving the semi-cycle. Then . Let be a network obtained from a binary tree on by replacing by the subgraph of rooted at , for . Let be the subgraph of rooted at . Then we construct from by replacing the subgraph of rooted at by the network . It is straightforward to see that is a binary level-1 network with the required properties.∎
We now establish the main result of this section.
Theorem 5.1
If $N$ and $N'$ are rooted binary level-1 phylogenetic networks on $X$ with $\mathcal{B}(N)=\mathcal{B}(N')$, then $N$ and $N'$ have the same number of reticulation vertices.
Proof
The proof is by induction on the number of leaves . The induction basis for is clear. Now suppose that and are two non-isomorphic rooted binary level-1 phylogenetic networks on with but with different numbers of reticulation vertices. We add an outdegree-1 root to each of and with an arc to the original root. By Lemma 4, we may assume that all semi-cycles in and have length 3.
Choose an arbitrary leaf and let . Let and be the networks obtained from and , respectively, by adding an outdegree-1 root with an arc to the original root. Then and have the same number of reticulation vertices by induction.
Since all semi-cycles in and are assumed to have length 3, there are three cases for the location of in each of the networks , illustrated in Figure 6.
If the parent of is in a semi-cycle in , let be the source of this semi-cycle, and let be the parent of otherwise. Let and (recall that denotes the cluster of ).
We now consider the different ways in which we could add to both networks. Since and have different numbers of reticulation vertices, there are two cases to consider (after eliminating symmetric cases), as illustrated in Figure 7.
The first case is that the parent of is not in a semi-cycle in but is the terminal of a semi-cycle in . First suppose that . Then choose an arbitrary vertex . Then while , a contradiction. Hence, we may assume that . Then and . Clearly, . Take and . Then and hence , from which we can deduce that . In addition, and hence , from which we can deduce that . This leads to a contradiction since .
The second case is that the parent of is not in a semi-cycle in but is the non-terminal non-source vertex of a semi-cycle in . First suppose that . Then choose an arbitrary vertex . Then while , a contradiction. Hence, we may assume that . Then, as in the previous case, and . Take and . Then, similar to the previous case, and hence , from which we can deduce that . In addition, and hence , from which we can deduce that . This again leads to a contradiction since . ∎
6 Complexity of Binet Compatibility
A direct consequence of Theorem 4.1 is that there exists a simple polynomial-time algorithm to decide whether there exists a binary level-1 network displaying a given collection of binary level-1 binets (see himsw for a related algorithm). In particular, a sink set of can be found in polynomial time by computing the strongly connected components of tarjan1972depth and checking for each of them whether it is a sink set. This can be used to find a typed split, if it exists. If such a split does not exist, then is not compatible. Otherwise, we can try to construct networks for and recursively, and combine them as described in the proof of Theorem 4.1. This algorithm is similar to the Aho algorithm for deciding whether a set of rooted trees can be displayed by some rooted tree aho1981inferring.
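The sink-set step of this procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the digraph is passed as a hypothetical adjacency dictionary `arcs`, a sink set is assumed here to be a strongly connected component with no arc leaving it, and Kosaraju's two-pass method is swapped in for Tarjan's algorithm only because it is shorter to write down. The function names are ours.

```python
from collections import defaultdict


def strongly_connected_components(vertices, arcs):
    """Return the strongly connected components as a list of frozensets.

    `arcs` maps each vertex to an iterable of its out-neighbours
    (a hypothetical input format chosen for this sketch).
    """
    # First pass: iterative depth-first search recording finishing order.
    order, seen = [], set()
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(arcs.get(start, ())))]
        while stack:
            v, successors = stack[-1]
            for w in successors:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(arcs.get(w, ()))))
                    break
            else:
                # All successors of v are explored: v is finished.
                order.append(v)
                stack.pop()

    # Build the reversed digraph.
    reverse = defaultdict(list)
    for v in vertices:
        for w in arcs.get(v, ()):
            reverse[w].append(v)

    # Second pass: search the reversed digraph in decreasing finishing order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, todo = set(), [root]
        while todo:
            v = todo.pop()
            component.add(v)
            for w in reverse[v]:
                if w not in assigned:
                    assigned.add(w)
                    todo.append(w)
        components.append(frozenset(component))
    return components


def sink_components(vertices, arcs):
    """Keep only the components that no arc leaves (assumed sink sets)."""
    return [
        c for c in strongly_connected_components(vertices, arcs)
        if all(w in c for v in c for w in arcs.get(v, ()))
    ]
```

Both passes visit every vertex and arc a constant number of times, so the check runs in time linear in the size of the digraph, which is consistent with the polynomial bound claimed in the text.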
From Theorem 4.2, it now follows that the following problem can also be solved in polynomial time.
Binet Compatibility (BC)
Input: a set of binary level-1 binets.
Question: is compatible, i.e., does there exist a binary network with ?
We now show that the assumption that all binets in are binary and level-1 is essential. Indeed, for general binets, the compatibility problem is at least as hard as the well-known graph isomorphism problem (GI) GI1; GI2, which is not known to be solvable in polynomial time. This is true even when the given binet set is thin (contains at most one binet for each pair of leaves).
Theorem 6.1
Deciding whether there exists a phylogenetic network displaying a given thin set of binets is GI-hard.
Proof
We reduce from DAG-isomorphism, which is known to be GI-complete GI2 . Let be two directed acyclic graphs, which form an instance of the DAG-isomorphism problem. For , we add vertices , a new leaf labelled , an arc from to each indegree-0 vertex of and from each outdegree-0 vertex of to and arcs , , , and . In , we add a new leaf labelled and an arc . In , we add a new leaf labelled and an arc . We have thus transformed into a binet and into a binet . The third binet is . See Figure 8 for an illustration.
We claim that and are isomorphic if and only if there exists a network displaying and .
First assume that and are isomorphic. Then we can construct a network displaying and as follows. Take and subdivide the arc by a new vertex and add leaf with an arc . The obtained network clearly displays and and it also displays since and are isomorphic.
Now assume that there exists some network displaying and . Then . Hence, contains a cycle (in the underlying undirected graph) containing a reticulation , such that and the image of are below the arc leaving , while is below some other arc leaving the cycle. Since , leaf is not below in . Therefore, deleting , and the parent of from the subgraph of rooted at gives .
Similarly, contains a cycle containing a reticulation , such that and the image of are below the arc leaving , while is below some other arc leaving the cycle. Since , leaf is not below in . Therefore, deleting , and the parent of from the subgraph of rooted at gives .
Moreover, since . Hence, and are isomorphic. ∎
7 Maximum Binet Compatibility
If a collection of binets is not compatible, the question arises whether a largest compatible subset of the binets can be found in polynomial time. Here we show that this is unlikely to be the case. The decision version of this problem is defined as follows.
Maximum Binet Compatibility (MBC)
Input: a set of binary level-1 binets and an integer .
Question: does there exist a compatible subset of with ?
We now establish the complexity of this problem (see Theorem 7.1). Recall from Section 5 that and denote the source and terminal of a semi-cycle , respectively.
Lemma 5
If the binet is displayed by a binary level-1 network , then is the source of a semi-cycle in . In addition, is below and is not below .
Proof
Let . Note that is not a reticulation vertex, as otherwise the child of would be a stable ancestor of and that is below . Hence, has two children, denoted by and .
Observe that neither nor is a cut arc, since otherwise we would have , while by the assumption of the lemma . Hence, is the source of a semi-cycle . Let be the terminal of . If neither nor is below , then , a contradiction. If both and are below , then is a stable ancestor of and , a contradiction to . Therefore, precisely one of and is below . If is below and is not, then , a contradiction. Therefore, is below and is not. ∎
In view of the last lemma, for each binet , there exists a unique semi-cycle containing .
Lemma 6
If the two binets and are both displayed by a binary level-1 network , then
[TABLE]
Proof
Let and . By Lemma 5, but is not below , from which we know that . Since and are stable ancestors of in view of Lemma 5, we have either or but not both.
Note that if , then and hence , a contradiction. Thus , from which it follows that . ∎
Given a digraph , let be the collection of binets induced by . Note that is a binet set on , i.e., the leaves of the binets in correspond to the vertices of .
Proposition 1
Let be a digraph. Then is acyclic if and only if is compatible.
Proof
Let , with the vertex set of . Suppose first that is acyclic. Then there exists a topological sorting of , that is, the vertices of can be ordered as so that implies . Hence, the network in Figure 9 displays since displays each binet with .
Conversely, suppose that is compatible. By Theorem 4.2, there exists a binary level-1 network with . It remains to show that is acyclic. If not, then there exists a directed cycle for some . Denote . In view of Lemma 5, let be the semi-cycle in containing for . Then Lemma 5 implies and that is not below . On the other hand, by Lemma 6 we have
[TABLE]
Together with , it follows that , a contradiction. ∎
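Proposition 1 reduces compatibility of an induced binet set to an acyclicity test on the digraph. The following is a minimal sketch of that test using Kahn's algorithm; it is our illustration rather than part of the paper. The function name and the input format (a list of vertices and a list of arc pairs) are assumptions. It returns a topological sorting of the kind used in the first half of the proof, or `None` when the digraph contains a directed cycle.

```python
from collections import deque


def topological_order(vertices, arcs):
    """Return a topological sorting of the digraph, or None if it has a cycle.

    `vertices` is a list of vertex names and `arcs` a list of (tail, head)
    pairs; both formats are assumptions made for this sketch.
    """
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for tail, head in arcs:
        successors[tail].append(head)
        indegree[head] += 1

    # Repeatedly output a vertex that has no remaining incoming arc.
    ready = deque(v for v in vertices if indegree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)

    # Vertices on a directed cycle never reach indegree 0.
    return order if len(order) == len(indegree) else None
```

By Proposition 1, a result other than `None` certifies that the induced binet set is compatible, and `None` certifies that it is not.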
A set of binets on is said to be dense if for each pair of distinct elements and in , there exists precisely one binet on in . Hence, a dense set of binets is always thin.
Theorem 7.1
The problem MBC is NP-complete, even if the given set of binets is dense.
Proof
We reduce from the NP-hard problem Feedback Arc Set in Tournaments (FAST) alon2006ranking; charbit2007minimum, which is defined as follows. Given a tournament, i.e. a digraph with either or (but not both) for each pair of distinct elements and in , and given a positive integer , does there exist a subset of at most arcs whose removal makes acyclic? If such an arc set exists, then we call it a feedback arc set of .
The reduction is as follows. For each instance of FAST, consider the corresponding instance of MBC with . Since the set of binets induced by can be constructed in polynomial time, it suffices to show that contains a feedback arc set with size at most if and only if there exists a compatible subset of of size at least .
First assume that there exists a feedback arc set of with size at most . That is, , and the digraph obtained from by deleting the arcs in is acyclic. Consider the set of binets . This set contains at least binets. In addition, since , it follows by Proposition 1 that is compatible.
Now assume that there exists a compatible binet set with . Consider the set of arcs of . Then by Proposition 1, it follows that is a feedback arc set. Moreover, , which completes the proof. ∎
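To make the correspondence used in this reduction concrete, the sketch below computes a smallest feedback arc set of a small tournament by exhaustive search. It is an illustration only: the function names and the arc-pair input format are assumptions, and the search is exponential, as one would expect for an NP-hard problem. It relies on the standard fact that the arcs pointing backwards in a vertex ordering form a feedback arc set, and that some ordering attains the minimum.

```python
from itertools import permutations


def backward_arcs(order, arcs):
    """Arcs (u, v) whose tail u comes after its head v in the ordering."""
    position = {v: i for i, v in enumerate(order)}
    return [(u, v) for u, v in arcs if position[u] > position[v]]


def minimum_feedback_arc_set(vertices, arcs):
    """Try every vertex ordering and keep the fewest backward arcs.

    Brute force over all orderings: suitable only for tiny examples.
    """
    best = None
    for order in permutations(vertices):
        candidate = backward_arcs(order, arcs)
        if best is None or len(candidate) < len(best):
            best = candidate
    return best
```

Deleting the returned arcs leaves an acyclic digraph, so by Proposition 1 the binets induced by the remaining arcs form a compatible subset whose size is the number of arcs minus the size of the feedback arc set.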
We complete the section by showing that there exists a polynomial-time 1/3-approximation algorithm for the MBC problem, which follows directly from the next theorem and its proof.
Theorem 7.2
Suppose that is a set of binary level-1 binets on . Then there exists a binary level-1 network such that .
Proof
If at least a third of the binets in are tree type, then take to be any binary tree on and we are done. Hence we may assume that at least two thirds of the binets are reticulate type.
Impose an arbitrary ordering on the elements in , that is, write . Let and . Without loss of generality, we may assume that (as the other case can be established in a similar way). Since at least two thirds of the binets are reticulate type, and each of those is contained in either or (but not both), we know that . Now consider the network in Figure 9, then clearly we have . Thus we have , from which the theorem follows. ∎
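The case analysis in this proof translates directly into a selection procedure, sketched below. The encoding is hypothetical: a tree-type binet is written as the tuple `("T", x, y)` and a reticulate-type binet as `("N", x, y)`, and we assume that the ordered network of Figure 9 displays `("N", x, y)` whenever `x` precedes `y` in the chosen ordering. The function only decides which of the three candidate networks to build and reports the binets it is guaranteed to display; it does not construct the network itself.

```python
def one_third_choice(leaves, binets):
    """Pick the candidate network guaranteed to display the most binets.

    `leaves` is a list giving an arbitrary ordering of the leaf set.
    `binets` uses the hypothetical tuple encoding described in the text.
    Returns (label, guaranteed_binets).
    """
    position = {x: i for i, x in enumerate(leaves)}

    # Tree-type binets: displayed by any binary tree on the leaf set.
    tree_type = [b for b in binets if b[0] == "T"]
    # Reticulate-type binets that agree with the ordering.
    forward = [b for b in binets
               if b[0] == "N" and position[b[1]] < position[b[2]]]
    # Reticulate-type binets that agree with the reversed ordering.
    backward = [b for b in binets
                if b[0] == "N" and position[b[1]] > position[b[2]]]

    options = [
        ("tree", tree_type),
        ("ordering", forward),
        ("reversed ordering", backward),
    ]
    return max(options, key=lambda option: len(option[1]))
```

Since the three classes partition the input, the largest one contains at least a third of the binets, which is the counting argument of the proof.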
8 Discussion
In this paper we have developed some combinatorial results concerning collections of level-1 binets. Several interesting questions arise from these results. For example, we have shown that the collection of level-1 binets displayed by a binary phylogenetic network can be displayed by some level-1 network, but is there some canonical level-1 network that could be used to display such a collection? In addition, can we count the number of binary level-1 networks that display a dense compatible collection of binets? We have also seen that the collection of binets displayed by a binary level-1 network determines its reticulation number. It is therefore natural to ask which properties of a phylogenetic network are, in general, determined by its binets.
We have also studied some algorithmic questions concerning binets. Concerning the maximum binet compatibility problem, note that the constant 1/3 is sharp in Theorem 7.2. For example, consider the binet collection . However, can a better bound be achieved by restricting to thin collections of binets, and can improved approximation algorithms also be found?
In another direction, it would be interesting to know whether results similar to those proven in this paper hold for higher-level networks. For example, what can be said about properties of collections of level-2 binets, and does Theorem 5.1 also hold for higher-level networks? Also, we could try to generalize some of our results to k-nets, i.e. networks on k leaves, . For example, does Theorem 4.2 hold for trinets? In general, it would be interesting to know what additional information the collection of k-nets displayed by a network might contain for . Note that it has been shown that trinets do not completely determine rooted networks in general huber2015much. However, do they determine properties of networks such as the number of reticulations?
Similarly, it would be interesting to extend some of our algorithmic results to higher-level networks and k-nets. For example, it is known that the compatibility problem is NP-complete for collections of level-1 trinets himsw. However, to date the maximum trinet compatibility problem has not been studied.
Eventually, it is hoped that new results in these directions could be useful for developing novel methods to construct phylogenetic networks from higher-level networks and k-nets. For example, using our results it may be possible to develop approaches to build a consensus network for a collection of phylogenetic trees or networks. Note that consensus networks have already proven themselves useful in the unrooted setting, where they are used to summarize key features displayed by a collection of trees or networks (see e.g. holland2004consensus). A consensus method based on binets could work by breaking each of the given networks down into a collection of binets, and then developing methods to pool together the information contained in the resulting binets so as to construct some consensus network, or at least some constraints that any such network should satisfy. Note that similar approaches have been developed to build consensus trees for a collection of phylogenetic trees by breaking each of the trees down into a collection of triplets (see e.g. bryant2003classification, Section 2). A natural first step would be to consider how to construct a level-1 consensus network for a collection of level-1 networks by breaking each of them down into level-1 binets. This is already likely to be quite challenging in view of our result concerning NP-completeness of Maximum Binet Compatibility.
References
- (1) Aho, A.V., Sagiv, Y., Szymanski, T.G., Ullman, J.D.: Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM Journal on Computing 10(3), 405–421 (1981)
- (2) Alon, N.: Ranking tournaments. SIAM Journal on Discrete Mathematics 20(1), 137–142 (2006)
- (3) Bapteste, E., van Iersel, L.J.J., Janke, A., Kelchner, S., Kelk, S., McInerney, J.O., Morrison, D.A., Nakhleh, L., Steel, M., Stougie, L., Whitfield, J.: Networks: expanding evolutionary thinking. Trends in Genetics 29(8), 439–441 (2013)
- (4) Bryant, D.: A classification of consensus methods for phylogenetics. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 61, 163–184 (2003)
- (5) Byrka, J., Guillemot, S., Jansson, J.: New results on optimizing rooted triplets consistency. Discrete Applied Mathematics 158(11), 1136–1147 (2010)
- (6) Charbit, P., Thomassé, S., Yeo, A.: The minimum feedback arc set problem is NP-hard for tournaments. Combinatorics, Probability and Computing 16(1), 1–4 (2007)
- (7) Felsenstein, J.: Inferring phylogenies. Sinauer Associates, Sunderland (2004)
- (8) Goldberg, M.: The graph isomorphism problem. In: J.L. Gross, J. Yellen (eds.) Handbook of Graph Theory, pp. 68–78. CRC Press (2003)
