A class of phylogenetic networks reconstructable from ancestral profiles
Peter L. Erdos, Charles Semple, Mike Steel

Péter L. Erdős
Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences, Budapest, Hungary
Charles Semple
School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
Mike Steel
School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
Abstract.
Rooted phylogenetic networks provide an explicit representation of the evolutionary history of a set $X$ of sampled species. In contrast to phylogenetic trees, which show only speciation events, networks can also accommodate reticulate processes (for example, hybrid evolution, endosymbiosis, and lateral gene transfer). A major goal in systematic biology is to infer evolutionary relationships, and while phylogenetic trees can be uniquely determined from various simple combinatorial data on $X$, for networks the reconstruction question is much more subtle. Here we ask when a network can be uniquely reconstructed from its ‘ancestral profile’ (the number of paths from each ancestral vertex to each element in $X$). We show that reconstruction holds (even within the class of all networks) for a class of networks we call ‘orchard networks’, and we provide a polynomial-time algorithm for reconstructing any orchard network from its ancestral profile. Our approach relies on establishing a structural theorem for orchard networks, which also provides a fast (polynomial-time) algorithm to test whether any given network is of orchard type. Since the class of orchard networks includes tree-sibling time-consistent networks and tree-child networks, our results generalise reconstruction results from 2008 and 2009. Orchard networks allow for an unbounded number of reticulation vertices, in contrast to tree-sibling time-consistent networks and tree-child networks, for which this number is bounded in terms of $|X|$.
Key words and phrases:
Tree-child networks, orchard networks, accumulation phylogenies, ancestral profiles, path-tuples
1991 Mathematics Subject Classification:
05C85, 92D15
The first author was supported in part by the National Research, Development and Innovation Office (NKFIH grants K 116769 and KH 126853). The second and third authors were supported by the New Zealand Marsden Fund (UOC1709).
1. Introduction
Phylogenetic trees and networks have become a ubiquitous tool for representing evolutionary relationships in systematic biology [7] and other areas of classification (for example, language evolution and epidemiology). Since the early sketches by Charles Darwin and Ernst Haeckel in the 19th century, increasingly complex and detailed trees have revealed the finer details of portions of the ‘tree of life’. Today, biologists routinely build phylogenetic trees on hundreds of species, such as the recent tree of (nearly) all 10,000 species of birds [14]. Phylogenetic trees have a leaf set $X$ that consists of the sampled organisms (typically, a group of present-day species); the root of the tree represents the most recent common ancestor of the species in $X$. Current methods for inferring phylogenetic trees generally use genomic data from the species in $X$ and apply one of several possible reconstruction methods. While many of these methods are statistically based, they are ultimately founded on underlying combinatorial uniqueness results concerning trees [7, 17].
Although phylogenetic trees have proved a convenient representation for many groups of species including, for example, mammals and birds, in other domains of life evolution is not always described as a simple vertical process of speciation (where lineages split in two as new species form) and extinction. Instead, various reticulate processes allow for a ‘horizontal’ component. Two main examples include the formation of hybrid species (such as in certain plant or fish species), and the exchange of genes between species in a process called lateral gene transfer (such as in bacteria). An additional reticulate process relevant to early life on earth is endosymbiosis in which organelles are incorporated into cells.
For these reasons, phylogenetic networks (acyclic directed graphs with a single root vertex and leaves forming the set $X$) have been proposed as a more flexible and accurate representation of evolutionary history [6, 15]. Accordingly, there has been considerable recent interest in extending the mathematical foundation of phylogenetic tree reconstruction to networks [11]. This extension faces a number of mathematical obstacles. In particular, while trees can be encoded and reconstructed in several ways (for example, based on their associated system of clusters, path distances between pairs of leaves, and induced 3-leaf subtrees), none of these approaches extends to networks, except in very special cases [9, 12, 19]. This has led to various approaches being proposed, which usually involve one or more of the following:
- (i) not distinguishing between phylogenetic networks that are similar in a certain way [16];
- (ii) considering reconstruction only within a limited subclass of phylogenetic networks [2]; and
- (iii) allowing types of information on $X$ beyond what is normally used for tree reconstruction [1].
Approach (ii) has received the most attention so far, with some positive results (for example, for reconstructing the subclass of normal networks from their induced trees [20]). In this paper, we focus more on approach (iii) and, although we restrict to a subclass of networks (which we call ‘orchard networks’), our reconstruction result has the additional strength that it can distinguish between any two networks from information on $X$, provided at least one of them is an orchard network. To provide some intuition, a phylogenetic network is, informally, an orchard network if it can be reduced to a single vertex by repeatedly finding a pair of leaves that form either a cherry or a reticulated cherry, and then applying a cherry reduction to that pair of leaves.
The type of information on $X$ we consider is the following. View the interior (non-leaf) vertices of a phylogenetic network as being labelled. In the biological setting, this label could correspond, for example, to the genome of the ancestral species at this vertex (or some sub-genome that is sufficiently detailed to distinguish this ancestral vertex from others). For each species $x$ in the leaf set $X$, suppose we can count the number of directed paths in the network from each ancestral genome (i.e. interior vertex) to $x$. This ‘ancestral profile’ is thus an ordered tuple of numbers, one tuple for each leaf in $X$ (note that current technology does not yet provide this information, so our approach is in the spirit of earlier mathematical results in phylogenetics that preceded the data required for their application). It turns out that such information is not enough to distinguish between an arbitrary pair of networks (we provide an example). However, if the underlying network is an orchard network, our main result shows that no other network (orchard or not) can have the same ancestral profile. Moreover, we present and justify a polynomial-time algorithm for reconstructing any orchard network from its ancestral profile. Our arguments rely on a structural property of orchard networks, which also implies that there is a polynomial-time algorithm for testing whether or not an arbitrary network is an orchard network.
Our results generalise earlier work in [4, 5], which considered the more restricted classes of ‘tree-sibling time-consistent’ networks and ‘tree-child’ networks, respectively. These authors use equivalent information on $X$ for reconstruction; however, their reconstruction results face two limitations that are lifted here. First, the uniqueness results of [4, 5] hold only within the class of tree-sibling time-consistent networks and the class of tree-child networks, respectively, whereas we show that ancestral profiles can distinguish an orchard network from any other network. Second, neither tree-sibling time-consistent networks nor tree-child networks can have too many reticulate vertices (in both classes, the number of reticulations is bounded in terms of $|X|$), whereas orchard networks can have arbitrarily many reticulate vertices (independent of $|X|$).
Our results are also related to (and partly motivated by) earlier work by [1] and [18] on ‘accumulation phylogenies’. This involved a different subclass of networks (called ‘regular’ in these papers, and ‘cluster networks’ in [11]), which neither contains, nor is contained in, the subclass of orchard networks. A limitation of this subclass is that (unlike orchard networks) it does not allow ‘redundant arcs’ (an arc $(u, v)$ for which there is another path in the network from $u$ to $v$). Allowing redundant arcs has a strong biological motivation: even if each reticulation event happens instantaneously between two contemporaneous species, redundant arcs can still appear in the resulting network if not all species at the present are sampled. The results in [1, 18] also assume that any two networks being considered are within this same subclass. In summary, our results are not directly related to this earlier work on accumulation phylogenies, apart from using a related type of information.
The paper is organised as follows. The next section contains some necessary definitions along with the statement of the main result (Theorem 2.2) and deduces, as a consequence, the main result (Theorem 1) in [5]. This section also provides examples to justify various claims. Section 3 describes some preliminary lemmas, which apply more generally than for ancestral profiles, and in Section 4 we state and prove the structural property of orchard networks that allows for an easy test as to whether or not an arbitrary network is of orchard type. The proof of Theorem 2.2 is established in Section 5. We end the paper with a brief discussion in Section 6.
Lastly, just as we completed the write-up of this paper, a manuscript [13] was posted on arXiv that also considers the class of orchard networks (referred to as “cherry-picking networks” in [13]). The focus of that manuscript is quite different to that of this paper; nevertheless, it contains an independent and different proof of the structural property of orchard networks which is needed as a lemma for Theorem 2.2 in this paper.
2. Main Result
Throughout the paper, $X$ denotes a non-empty finite set and, unless otherwise stated, all paths are directed. For vertices $u$ and $v$ of a directed graph $D$, we say $v$ is reachable from $u$ if there is a path in $D$ from $u$ to $v$. Furthermore, for sets $A$ and $B$, we denote the set obtained from $A$ by removing every element in $A$ that is also in $B$ by $A - B$. If $B = \{b\}$, say, we denote this by $A - b$.
Phylogenetic networks. A phylogenetic network $N$ on $X$ is a rooted acyclic directed graph with no arcs in parallel and satisfying the following properties:
- (i) the (unique) root has in-degree zero and out-degree two;
- (ii) a vertex with out-degree zero has in-degree one, and the set of vertices with out-degree zero is $X$; and
- (iii) all other vertices either have in-degree one and out-degree two, or in-degree two and out-degree one.
For technical reasons, if $|X| = 1$, we additionally allow a single vertex to be a phylogenetic network, in which case the root is the vertex in $X$. Phylogenetic networks as defined here are also referred to as ‘binary phylogenetic networks’ in the literature.
Let $N$ be a phylogenetic network on $X$. The vertices with out-degree zero are the leaves of $N$, and so $X$ is called the leaf set of $N$. Furthermore, vertices with in-degree one and out-degree two are tree vertices, while vertices with in-degree two and out-degree one are reticulations. The arcs directed into a reticulation are called reticulation arcs; all other arcs are tree arcs. To illustrate, an example of a phylogenetic network with three reticulations is shown in Fig. 1.
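Lastly, let $N$ and $N'$ be two phylogenetic networks on $X$ with vertex and arc sets $V$ and $A$, and $V'$ and $A'$, respectively. We say $N$ is isomorphic to $N'$ if there exists a bijection $\varphi: V \rightarrow V'$ such that $\varphi(x) = x$ for all $x \in X$, and $(u, v) \in A$ if and only if $(\varphi(u), \varphi(v)) \in A'$ for all $u, v \in V$.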
Ancestral tuples and ancestral profile. Let $N$ be a phylogenetic network on $X$ with vertex set $V$. Let $(v_1, v_2, \ldots, v_t)$ be a fixed (arbitrary) labelling of the vertices in $V - X$. For all $x \in X$, the ancestral tuple of $x$, denoted $\alpha(x)$, is the $t$-tuple whose $i$-th entry is the number of paths in $N$ from $v_i$ to $x$. We call the set

$$\alpha(N) = \{(x, \alpha(x)) : x \in X\}$$

of ordered pairs the ancestral profile of $N$. Furthermore, if $N'$ is a phylogenetic network on $X$ and, up to an ordering of the non-leaf vertices of $N'$, we have $\alpha(N') = \alpha(N)$, we say $N'$ realises $\alpha(N)$. Lastly, although $\alpha(N)$ depends on the ordering of the vertices in $V - X$, the ordering is fixed and so the labelling can be effectively ignored.
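As a concrete illustration of this definition, the following Python sketch computes ancestral tuples by counting paths bottom-up over the network. It is a minimal sketch under our own assumptions: the network is encoded as a dictionary mapping every vertex (leaves included) to the list of its children, and the function names `path_counts` and `ancestral_profile` and the small example network (root `r`, tree vertices `u` and `w`, one reticulation `h`, leaves `a`, `b`, `c`) are hypothetical and not taken from the paper.

```python
from graphlib import TopologicalSorter

def path_counts(children):
    """Return counts[v][x]: the number of directed paths from v to leaf x.

    Assumes `children` maps every vertex (leaves included) to a list of its
    children and describes an acyclic directed graph.
    """
    counts = {}
    # Passing the child lists as "predecessors" makes static_order() emit each
    # vertex only after all of its children, i.e. in bottom-up order.
    for v in TopologicalSorter(children).static_order():
        kids = children[v]
        if not kids:                 # a leaf has one (trivial) path to itself
            counts[v] = {v: 1}
            continue
        total = {}
        for c in kids:               # paths from v = sum of paths from its children
            for leaf, k in counts[c].items():
                total[leaf] = total.get(leaf, 0) + k
        counts[v] = total
    return counts

def ancestral_profile(children, internal_order):
    """Return {x: ancestral tuple of x} for a fixed order of the non-leaf vertices."""
    counts = path_counts(children)
    leaves = [v for v, kids in children.items() if not kids]
    return {x: tuple(counts[v].get(x, 0) for v in internal_order) for x in leaves}

# Hypothetical example: root r, tree vertices u and w, reticulation h.
N = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"],
     "a": [], "b": [], "c": []}
print(ancestral_profile(N, ["r", "u", "w", "h"]))
# {'a': (1, 1, 0, 0), 'b': (2, 1, 1, 1), 'c': (1, 0, 1, 0)}
```

The entry 2 in the tuple of `b` reflects the two paths from the root through the reticulation `h`; this is exactly the information that ancestral sets (which only record whether a path exists) discard.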
Cherries and reticulated cherries. Let $N$ be a phylogenetic network on $X$, and let $\{a, b\}$ be a 2-element subset of $X$. Let $p_a$ and $p_b$ denote the parents of $a$ and $b$, respectively. We say $\{a, b\}$ is a cherry of $N$ if $p_a = p_b$. Furthermore, if one of the parents, say $p_b$, is a reticulation and $(p_a, p_b)$ is an arc in $N$, then $\{a, b\}$ is a reticulated cherry of $N$, in which case $b$ is the reticulation leaf of the reticulated cherry. Observe that $p_a$ is necessarily a tree vertex. For the phylogenetic network shown in Fig. 1, is a cherry, while is a reticulated cherry in which is the reticulation leaf. Furthermore, in Fig. 1, is neither a cherry nor a reticulated cherry.
We next describe two operations associated with cherries and reticulated cherries that are central to this paper. Let $N$ be a phylogenetic network. First suppose that $\{a, b\}$ is a cherry of $N$. Then reducing $b$ is the operation of deleting $b$ and suppressing the resulting vertex of in-degree one and out-degree one. If the parent of $a$ and $b$ is the root of $N$, then reducing $b$ is the operation of deleting $b$ as well as deleting the root of $N$, thus leaving only the isolated vertex $a$. Now suppose that $\{a, b\}$ is a reticulated cherry of $N$ in which $b$ is the reticulation leaf. Then cutting $\{a, b\}$ is the operation of deleting the reticulation arc joining the parents of $a$ and $b$, and suppressing the two resulting vertices of in-degree one and out-degree one. It is easily seen that the operations of reducing a cherry and cutting a reticulated cherry both result in a phylogenetic network. Collectively, we refer to these two operations as cherry reductions. To illustrate, the phylogenetic network shown in Fig. 2(i) (resp. Fig. 2(ii)) has been obtained from the phylogenetic network in Fig. 1 by reducing (resp. cutting ).
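The two cherry reductions can be sketched on a dictionary-of-children encoding of a network. This is an illustration under our own assumptions, not the authors' implementation: every vertex (leaves included) is a key, the input is a valid binary phylogenetic network, and the helper names `parent_map`, `suppress`, `reduce_cherry` and `cut` are hypothetical. As in the definitions above, `b` is the deleted leaf of a cherry and the reticulation leaf of a reticulated cherry.

```python
def parent_map(children):
    """Map each vertex to the list of its parents."""
    parents = {v: [] for v in children}
    for v, kids in children.items():
        for c in kids:
            parents[c].append(v)
    return parents

def suppress(children, v):
    """Remove a vertex v of in-degree one and out-degree one, joining its parent to its child."""
    (q,) = parent_map(children)[v]
    (c,) = children[v]
    children[q] = [c if k == v else k for k in children[q]]
    del children[v]

def reduce_cherry(children, a, b):
    """Reduce b, where {a, b} is a cherry (modifies `children` in place)."""
    par = parent_map(children)
    (p,) = par[b]
    assert par[a] == [p], "{a, b} is not a cherry"
    children[p].remove(b)
    del children[b]
    if par[p]:            # p is not the root: suppress it
        suppress(children, p)
    else:                 # p was the root: only the isolated vertex a remains
        del children[p]

def cut(children, a, b):
    """Cut {a, b}, a reticulated cherry whose reticulation leaf is b."""
    par = parent_map(children)
    (pa,), (pb,) = par[a], par[b]
    assert len(par[pb]) == 2 and pb in children[pa], "not a reticulated cherry"
    children[pa].remove(pb)       # delete the reticulation arc (p_a, p_b)
    suppress(children, pa)
    suppress(children, pb)

# Hypothetical example: {a, b} is a reticulated cherry with reticulation leaf b.
N = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"],
     "a": [], "b": [], "c": []}
cut(N, "a", "b")
print(N)   # {'r': ['a', 'w'], 'w': ['b', 'c'], 'a': [], 'b': [], 'c': []}
reduce_cherry(N, "c", "b")
reduce_cherry(N, "a", "c")
print(N)   # {'a': []}
```

In this example, cutting the reticulated cherry turns the network into a tree, after which two cherry reductions leave a single vertex. So this particular network is an orchard network.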
Orchard networks. For a phylogenetic network $N$, the sequence

$$N = N_0, N_1, N_2, \ldots, N_k \qquad (1)$$

of phylogenetic networks is a cherry-reduction sequence of $N$ if, for all $i \in \{1, 2, \ldots, k\}$, the phylogenetic network $N_i$ is obtained from $N_{i-1}$ by a (single) cherry reduction. The sequence is maximal if $N_k$ has no cherries or reticulated cherries. If $N_k$ consists of a single vertex, the sequence is complete, in which case $N$ is called an orchard network. Observe that if (1) is complete, then the leaf set of $N_{k-1}$ has size two and the parent of each leaf is the root of $N_{k-1}$. It is easily checked that the phylogenetic network shown in Fig. 1 is an orchard network. In Section 4, we show that if $N$ is an orchard network, then every maximal cherry-reduction sequence of $N$ is complete. Thus if we want to construct a complete cherry-reduction sequence for an orchard network, the order in which the reductions are applied does not matter. In turn, this provides an easy test to decide whether or not an arbitrary network is orchard.
One of the most well-studied classes of phylogenetic networks is the class of tree-child networks. Introduced in [5], a phylogenetic network is tree-child if every non-leaf vertex is the parent of a tree vertex or a leaf. Tree-child networks are examples of orchard networks [3], but there exist orchard networks that are not tree-child. Indeed, while the size of the leaf set bounds the total number of vertices of a tree-child network [5], the total number of vertices in an orchard network is not necessarily bounded by the size of its leaf set. For example, the phylogenetic network shown in Fig. 3(i) is an orchard network with exactly three leaves but, by extending it in the obvious way, we can produce an orchard network with an arbitrarily large odd number of vertices and still with exactly three leaves. Furthermore, not all phylogenetic networks are orchard networks as Fig. 3(ii) illustrates.
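The parity in the last example can be checked with a short degree count. The following derivation is our own sketch rather than one taken from the paper; it assumes $|X| \ge 2$ and writes $n = |X|$, with $m$ tree vertices and $r$ reticulations besides the root and the leaves. Equating the sums of out-degrees and in-degrees gives

$$
\begin{aligned}
2 + 2m + r &= n + m + 2r \;\Longrightarrow\; m = n + r - 2,\\
|V| &= 1 + n + m + r = 2n + 2r - 1.
\end{aligned}
$$

So every such network has an odd number of vertices and, with $n = 3$ fixed, $|V| = 5 + 2r$ grows without bound as the number $r$ of reticulations grows.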
For this paper, a second relevant class of phylogenetic networks is the class of tree-sibling time-consistent networks. Let $N$ be a phylogenetic network. We say $N$ is tree-sibling if every reticulation has a parent that is also the parent of a tree vertex or a leaf. Furthermore, $N$ is time-consistent if there is a map $t$ from the vertex set of $N$ to the non-negative integers such that if $(u, v)$ is a reticulation arc of $N$, then $t(u) = t(v)$; otherwise, $t(u) < t(v)$. We refer to such a map as a temporal labelling. In the literature, time-consistent networks are also referred to as temporal networks. Like tree-child networks, the class of tree-sibling time-consistent networks is a proper subclass of orchard networks. For completeness, we include a proof of the containment. To see that the containment is proper, recall that it is shown in [4] that, unlike for orchard networks, the number of reticulations of a tree-sibling time-consistent network is bounded by the size of its leaf set.
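To make the time-consistency condition concrete, here is a minimal Python checker for a candidate temporal labelling. The function name, the encoding of the network as a dictionary mapping every vertex to the list of its children, and the encoding of the labelling as a dictionary of non-negative integers are our own assumptions. An arc $(u, v)$ is treated as a reticulation arc exactly when $v$ has two parents.

```python
def is_temporal_labelling(children, t):
    """Check t(u) == t(v) on reticulation arcs and t(u) < t(v) on tree arcs."""
    indegree = {v: 0 for v in children}
    for kids in children.values():
        for c in kids:
            indegree[c] += 1
    for u, kids in children.items():
        for v in kids:
            if indegree[v] == 2:          # (u, v) is a reticulation arc
                if t[u] != t[v]:
                    return False
            elif t[u] >= t[v]:            # (u, v) is a tree arc
                return False
    # A temporal labelling takes non-negative integer values.
    return all(isinstance(k, int) and k >= 0 for k in t.values())

# Hypothetical example: root r, tree vertices u and w, reticulation h.
N = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"],
     "a": [], "b": [], "c": []}
t = {"r": 0, "u": 1, "w": 1, "h": 1, "a": 2, "b": 2, "c": 2}
print(is_temporal_labelling(N, t))                # True
print(is_temporal_labelling(N, {**t, "w": 2}))    # False: t(w) != t(h)
```

The example network is also tree-sibling (the reticulation `h` has parent `u`, which is also the parent of the leaf `a`), so it is a tree-sibling time-consistent network of the kind covered by the next lemma.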
Lemma 2.1.
Let $N$ be a tree-sibling time-consistent network. Then $N$ is an orchard network.
Proof.
Clearly, the lemma holds if $N$ has no reticulations. Therefore we may assume that $N$ has at least one reticulation. We first show that $N$ has either a cherry or a reticulated cherry. Let $t$ be a temporal labelling of the vertices of $N$, and let $v$ be a reticulation with the property that $t(v) \ge t(v')$ for all reticulations $v'$ of $N$. Since $N$ is tree-sibling, $v$ has a parent, $u$ say, that is the parent of a vertex $w$ which is either a tree vertex or a leaf. By maximality, no reticulations other than $v$ are reachable from $v$ or $w$. Therefore, if two leaves are reachable from either $v$ or $w$, then $N$ has a cherry. If this does not occur, then $w$ is a leaf and the (unique) child, $x$ say, of $v$ is also a leaf. In particular, $\{w, x\}$ is a reticulated cherry of $N$.
To complete the proof, let $N'$ be obtained from $N$ by a cherry reduction. Clearly, $N'$ is also tree-sibling. Furthermore, it is easily checked that the restriction of a temporal labelling of $N$ to the vertex set of $N'$ is a temporal labelling of $N'$. Thus $N'$ is tree-sibling time-consistent, and the lemma now follows by induction on the number of vertices. ∎
Main result. The following theorem is the main result of the paper.
Theorem 2.2.
Let $N$ be an orchard network on $X$ with vertex set $V$. Then, up to isomorphism, $N$ is the unique phylogenetic network on $X$ realising the ancestral profile of $N$. Furthermore, up to isomorphism, $N$ can be reconstructed from its ancestral profile in time .
It is worth emphasising that the uniqueness of $N$ in the statement of Theorem 2.2 is amongst all phylogenetic networks on $X$, not just within the class of orchard networks on $X$. Furthermore, if $N$ is not an orchard network, then the conclusion of Theorem 2.2 does not necessarily hold. In particular, consider the two phylogenetic networks $N_1$ and $N_2$ in Fig. 4. It is easily checked that, by fixing an ordering of the non-leaf vertices of each of $N_1$ and $N_2$ so that the parent of is in the same position in both orderings, $N_1$ and $N_2$ have the same ancestral profile. But $N_1$ is not isomorphic to $N_2$.
Theorem 2.2 generalises results of Cardona et al. [4] and Cardona et al. [5]. Let $N$ be a phylogenetic network on $X$ with vertex set $V$, and let $(x_1, x_2, \ldots, x_n)$ be a fixed ordering of the leaves in $X$. For all $v \in V$, the path tuple of $v$, denoted $\mu(v)$, is the $n$-tuple whose $j$-th entry is the number of paths in $N$ from $v$ to $x_j$. Let $\mu(N)$ denote the multiset

$$\mu(N) = \{\mu(v) : v \in V\}$$

of path tuples of $N$. If $N'$ is a phylogenetic network on $X$ and, up to an ordering of $X$, we have $\mu(N') = \mu(N)$, we say $N'$ realises $\mu(N)$. The next theorem was established in [4] and [5].
Theorem 2.3.
Let $N$ be a phylogenetic network on $X$.
- (i) If $N$ is tree-sibling time-consistent, then, up to isomorphism, $N$ is the unique tree-sibling time-consistent network on $X$ realising the multiset of path tuples of $N$.
- (ii) If $N$ is tree-child, then, up to isomorphism, $N$ is the unique tree-child network on $X$ realising the multiset of path tuples of $N$.

Furthermore, in both cases, up to isomorphism, $N$ can be constructed from its multiset of path tuples in time polynomial in the size of $X$.
Let $N$ be a phylogenetic network on $X$ with vertex set $V$. The ancestral profile of $N$ and the multiset of path tuples of $N$ are equivalent in the amount of information they provide. To see this, let $(v_1, v_2, \ldots, v_t)$ and $(x_1, x_2, \ldots, x_n)$ be fixed orderings of the vertices in $V - X$ and $X$, respectively. Then, for all $i \in \{1, 2, \ldots, t\}$, the path tuple of $v_i$ is the $n$-tuple whose $j$-th entry is the $i$-th entry of the ancestral tuple of $x_j$, for all $j \in \{1, 2, \ldots, n\}$. Similarly, each ordered pair in the ancestral profile can be obtained from the path tuples. Thus Theorem 2.2 generalises Theorem 2.3 in two ways. First, it shows that the latter holds for the more general class of orchard networks and, second, the uniqueness is not confined to the class of networks being reconstructed.
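The transposition described above can be sketched in Python. The sketch assumes a hypothetical encoding in which a network is a dictionary mapping every vertex (leaves included) to the list of its children. It computes the path tuple of every vertex for a fixed leaf order and then reads off the ancestral tuple of the $j$-th leaf as the $j$-th column over the non-leaf vertices. The names `path_tuples` and `profile_from_path_tuples` and the example network are our own.

```python
from graphlib import TopologicalSorter

def path_tuples(children, leaf_order):
    """Return mu[v]: the tuple of path counts from v to x_1, ..., x_n."""
    mu = {}
    # Child lists are passed as "predecessors", so each vertex follows its children.
    for v in TopologicalSorter(children).static_order():
        kids = children[v]
        if not kids:
            mu[v] = tuple(1 if x == v else 0 for x in leaf_order)
        else:  # entrywise sum of the children's path tuples
            mu[v] = tuple(sum(col) for col in zip(*(mu[c] for c in kids)))
    return mu

def profile_from_path_tuples(mu, internal_order, leaf_order):
    """The i-th entry of the ancestral tuple of x_j is the j-th entry of mu(v_i)."""
    return {x: tuple(mu[v][j] for v in internal_order)
            for j, x in enumerate(leaf_order)}

# Hypothetical example: root r, tree vertices u and w, reticulation h.
N = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"],
     "a": [], "b": [], "c": []}
mu = path_tuples(N, ["a", "b", "c"])
print(mu["r"])   # (1, 2, 1)
print(profile_from_path_tuples(mu, ["r", "u", "w", "h"], ["a", "b", "c"]))
# {'a': (1, 1, 0, 0), 'b': (2, 1, 1, 1), 'c': (1, 0, 1, 0)}
```

The multiset of path tuples carries no vertex names; the sketch keeps them only so that the correspondence with the fixed ordering of the non-leaf vertices is visible.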
We end the section with three remarks. First, Theorem 2.2 is not the first reconstruction result concerning the class of orchard networks. Although this class was not named there, it is shown in [3] that orchard networks are reconstructible from their so-called multiset distance matrices (see [3, Theorem 3.4]). We have no doubt that, over time, the class of orchard networks will be shown to be reconstructible in other ways as well.
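The second remark concerns a related, but weaker, notion than that of ancestral tuples, called ancestral sets. Let $N$ be a phylogenetic network on $X$ with vertex set $V$. For all $x \in X$, the ancestral set of $x$ is

$$\gamma(x) = \{v \in V - X : x \text{ is reachable from } v\}.$$

Thus $\gamma(x)$ is the set of non-leaf vertices $v$ in $N$ for which there is a directed path from $v$ to $x$. Observe that, for all $x \in X$, the root of $N$ is always an element of $\gamma(x)$, and so $\gamma(x)$ is non-empty. Let $\gamma(N)$ denote the set

$$\gamma(N) = \{(x, \gamma(x)) : x \in X\}$$

of ordered pairs. Given the ancestral profile of $N$, it is clear that we can construct $\gamma(N)$ in time .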
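Since the $i$-th non-leaf vertex lies in $\gamma(x)$ exactly when the $i$-th entry of the ancestral tuple of $x$ is positive, the ancestral sets are obtained by a single scan of the ancestral profile. The following Python sketch illustrates this; the function name and the example profile (that of a small hypothetical three-leaf network with one reticulation, written out by hand) are our own assumptions.

```python
def ancestral_sets(profile, internal_order):
    """gamma(x) = {v_i : the i-th entry of the ancestral tuple of x is non-zero}."""
    return {x: frozenset(v for v, k in zip(internal_order, tup) if k > 0)
            for x, tup in profile.items()}

# Ancestral profile of a network with root r, tree vertices u, w and one
# reticulation h (arcs r->u, r->w, u->a, u->h, w->h, w->c, h->b).
alpha = {"a": (1, 1, 0, 0), "b": (2, 1, 1, 1), "c": (1, 0, 1, 0)}
gamma = ancestral_sets(alpha, ["r", "u", "w", "h"])
print(sorted(gamma["b"]))   # ['h', 'r', 'u', 'w']
```

The multiplicity 2 in the first entry for `b` disappears in its ancestral set, which is why ancestral sets are the weaker notion.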
To see that ancestral sets are a weaker notion than ancestral tuples, consider the two orchard networks and shown in Fig. 5, where the non-leaf vertices have been labelled . For each , the ancestral sets of , , and are , , and , respectively. But is not isomorphic to . Note that, for a fixed ordering of , the ancestral tuple of differs in and even though the ancestral tuples of and are the same for and . Nevertheless, the ancestral sets of a phylogenetic network do provide some information regarding the structure of . As this is of possible independent interest, we highlight this in the next section, where the preliminary lemmas are established in terms of ancestral sets.
The third remark concerns the relationship between orchard networks and the increasingly prominent class of tree-based networks [8]. A phylogenetic network $N$ on $X$ with root $\rho$ and vertex set $V$ is tree-based if it has, as a subgraph, a rooted subtree with root $\rho$, vertex set $V$, and leaf set $X$. Note that a vertex in this subtree may have out-degree one. It is shown in [10] that the class of orchard networks is a proper subclass of tree-based networks. To see that it is proper, observe that the non-orchard networks $N_1$ and $N_2$ in Fig. 4 are both tree-based. Thus, the networks in this figure also show that Theorem 2.2 does not extend to tree-based networks.
3. Preliminary Lemmas
In this section, we establish several results that will be used in the proof of Theorem 2.2. These results show that the ancestral sets, and thus the ancestral tuples, of an arbitrary phylogenetic network recognise and distinguish cherries and reticulated cherries.
Lemma 3.1.
Let $N$ be a phylogenetic network on $X$, and let $a$ and $b$ be distinct elements in $X$. Then $\gamma(a) \subseteq \gamma(b)$ if and only if the parent of $b$ is reachable from the parent of $a$.
Proof.
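Let $p_a$ and $p_b$ denote the parents of $a$ and $b$, respectively. If $p_b$ is reachable from $p_a$, then it is clear that $\gamma(a) \subseteq \gamma(b)$. To prove the converse, suppose that $\gamma(a) \subseteq \gamma(b)$. Then $p_a \in \gamma(b)$ and so, by definition, $b$ is reachable from $p_a$. In turn, since $p_b$ is the unique parent of $b$, this implies that $p_b$ is reachable from $p_a$. ∎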
The next corollary immediately follows from Lemma 3.1 and the fact that phylogenetic networks are acyclic.
Corollary 3.2.
Let $N$ be a phylogenetic network on $X$, and let $\{a, b\}$ be a 2-element subset of $X$. Then $\{a, b\}$ is a cherry in $N$ if and only if $\gamma(a) = \gamma(b)$.
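Corollary 3.2 and Lemma 3.1 translate directly into set comparisons. The following Python sketch uses hypothetical function names and ancestral sets written out by hand for two small example networks. It lists the cherries (pairs with $\gamma(a) = \gamma(b)$) and the ordered pairs $(a, b)$ with $\gamma(a) \subsetneq \gamma(b)$; by Lemma 3.1 and acyclicity, these are exactly the pairs for which the parent of $b$ is reachable from, and different from, the parent of $a$.

```python
from itertools import combinations, permutations

def cherries(gamma):
    """Corollary 3.2: {a, b} is a cherry iff gamma(a) == gamma(b)."""
    return [(a, b) for a, b in combinations(sorted(gamma), 2) if gamma[a] == gamma[b]]

def strictly_below(gamma):
    """Pairs (a, b) with gamma(a) a proper subset of gamma(b); by Lemma 3.1 the
    parent of b is then reachable from, and different from, the parent of a."""
    return [(a, b) for a, b in permutations(sorted(gamma), 2) if gamma[a] < gamma[b]]

# Network 1 (a tree): arcs r->a, r->w, w->b, w->c.
g1 = {"a": {"r"}, "b": {"r", "w"}, "c": {"r", "w"}}
# Network 2: arcs r->u, r->w, u->a, u->h, w->h, w->c, h->b (h a reticulation).
g2 = {"a": {"r", "u"}, "b": {"r", "u", "w", "h"}, "c": {"r", "w"}}

print(cherries(g1), strictly_below(g1))  # [('b', 'c')] [('a', 'b'), ('a', 'c')]
print(cherries(g2), strictly_below(g2))  # [] [('a', 'b'), ('c', 'b')]
```

In the second network, both $\{a, b\}$ and $\{c, b\}$ are reticulated cherries with reticulation leaf $b$. The sketch only shows the first condition of Lemma 3.3 and does not attempt the full characterisation.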
Lemma 3.3.
Let $N$ be a phylogenetic network on $X$, and let $\{a, b\}$ be a 2-element subset of $X$. Then $\{a, b\}$ is a reticulated cherry of $N$ in which $b$ is the reticulation leaf if and only if
- (i) $\gamma(a) \subsetneq \gamma(b)$;
- (ii) there is no $c \in X - \{a, b\}$ such that $\gamma(a) \subset \gamma(c)$; and
- (iii) .
Proof.
Let $p_a$ and $p_b$ denote the parents of $a$ and $b$, respectively. It is easily checked that if $\{a, b\}$ is a reticulated cherry in which $b$ is the reticulation leaf, then (i)–(iii) hold. So suppose that (i)–(iii) hold. Since (i) holds, it follows by Lemma 3.1 that there is a directed path $P$ in $N$ from $p_a$ to $p_b$. If $p_b$ is a tree vertex, then $N$ has a leaf, $c$ say, reachable from $p_b$ such that $c \neq b$. This implies that $\gamma(a) \subset \gamma(c)$, contradicting (ii). Therefore $p_b$ is a reticulation. Lastly, assume $(p_a, p_b)$ is not an arc in $N$. Let $w$ denote the vertex on $P$ immediately prior to $p_b$. If $w$ is a tree vertex, then $N$ has a leaf $c' \neq b$ reachable from $w$ with $\gamma(a) \subset \gamma(c')$, contradicting (ii). On the other hand, if $w$ is a reticulation, then
[TABLE]
contradicting (iii). Thus the parent of $a$ is joined by an arc to the parent of $b$, and so $\{a, b\}$ is a reticulated cherry in which $b$ is the reticulation leaf. ∎
4. Order Does Not Matter
Let $N$ be an orchard network. Then, by definition, there exists a complete cherry-reduction sequence for $N$. But how do we find such a sequence, and does the order in which we apply the cherry reductions matter? The next proposition says that if we take $N$ and repeatedly apply cherry reductions until no more are possible, we always construct a complete cherry-reduction sequence. A vertex on a directed path is non-terminal if it is neither the first nor the last vertex on the path.
Proposition 4.1.
Let $N$ be an orchard network, and let

$$N = N_0, N_1, N_2, \ldots, N_k \qquad (2)$$

be a maximal sequence of cherry reductions. Then this sequence is complete.
Proof.
Let denote the leaf set of , and suppose (2) is not complete. Paralleling (2), we begin by constructing a sequence
[TABLE]
of rooted acyclic directed graphs as follows. If is obtained from by reducing a leaf of a cherry, then is obtained from by deleting the same leaf but not suppressing the resulting vertex of in-degree one and out-degree one. Similarly, if is obtained from by cutting a reticulated cherry, then is obtained from by deleting the same reticulation arc but not suppressing the two resulting vertices of in-degree one and out-degree one. More generally, if is obtained from by reducing a leaf of a cherry, that is, deleting a leaf say and suppressing its parent , then is obtained from by deleting as well as deleting every non-terminal vertex on the (unique) path from to in . Note that each of these non-terminal vertices has in-degree one and out-degree one in . On the other hand, if is obtained from by cutting a reticulated cherry, that is, deleting a reticulation arc and suppressing and , then is obtained from by deleting . Observe that, for all , if we suppress every vertex in of in-degree one and out-degree one, we obtain . Thus is a subdivision of for all , that is, can be obtained from by suppressing all vertices of in-degree one and out-degree one for all . Furthermore, as (2) is not complete, the root of is never deleted and so, for all , the root of is also and has out-degree two in .
We now analyse . Since (2) is maximal and not complete, has at least one reticulation. This implies that has at least one vertex of in-degree two and out-degree one. We next show that every non-terminal vertex in on a path from to a vertex of in-degree two and out-degree one has degree three.
4.1.1.
Let be a vertex of in-degree two and out-degree one in . If is a non-terminal vertex of on a path in from to , then has degree three in .
Proof.
Suppose is a vertex of in-degree one and out-degree one on a path from to in . In , the vertex has degree three. Therefore, for some , we have that is obtained from by a cherry reduction in which an arc incident with is deleted. Now, as is a vertex of in-degree two and out-degree one in , it follows that is a reticulation in , and therefore a reticulation in . Thus there is a path in from to . It is now easily checked that no cherry reduction applied to in which an arc incident with and not lying on is deleted is possible. Hence has degree three. ∎
We now complete the proof of the proposition. Since is orchard, there is a sequence
[TABLE]
of cherry reductions such that consists of a single vertex. Let be the smallest index such that is obtained from by cutting a reticulated cherry in which the deleted reticulation arc, say, has the property that is in and it has in-degree two and out-degree one in . Observe that, by the choice of , no vertex of in-degree two and out-degree one is reachable from in except itself. As (2) is maximal, this implies that there is a unique vertex, say, in that is reachable from in .
Now, is a tree vertex in whose other child, in addition to , is a leaf. By (4.1.1), has degree three in . Furthermore, as is a tree vertex in , it follows that has in-degree one and out-degree two in . Let denote the child of in that is not . At least one vertex in is reachable from in and this vertex is not . If, in , there is no vertex reachable from with in-degree two and out-degree one, then (2) is not maximal. Therefore, in there is such a vertex reachable from . In , the vertex is a reticulation, and so there is a such that is obtained from by cutting a reticulated cherry in which a reticulation arc directed into is deleted. Since is the reticulation arc directed into that is deleted, it follows . But, by the choice of , we have ; a contradiction. We conclude that (2) is complete. ∎
The following corollary is an immediate consequence of Proposition 4.1.
Corollary 4.2.
Let $N$ be an orchard network, and let $\{a, b\}$ be a cherry or a reticulated cherry of $N$. If $N'$ is obtained from $N$ by reducing $b$ if $\{a, b\}$ is a cherry, or by cutting $\{a, b\}$ if $\{a, b\}$ is a reticulated cherry, then $N'$ is an orchard network.
Since deciding if a given pair of leaves of a phylogenetic network is either a cherry or a reticulated cherry takes constant time, and a cherry reduction also takes constant time, the last corollary gives a polynomial-time algorithm for deciding if an arbitrary phylogenetic network $N$ is orchard. In particular, repeatedly find a cherry or a reticulated cherry, and apply the appropriate cherry reduction until this process is no longer possible. This takes at most $|V|$ iterations, where $V$ is the vertex set of $N$. If, at the completion of this process, we have a phylogenetic network consisting of a single vertex, then $N$ is orchard; otherwise, $N$ is not orchard. Observe that if $N$ is orchard with $n$ leaves and $r$ reticulations, then this process consists of $n + r - 1$ cherry reductions, since each cherry reduction of a cherry removes one leaf and each cut removes one reticulation.
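The greedy test just described can be sketched in Python as follows. This is an illustrative implementation under our own assumptions, not the authors' code. A network is encoded as a dictionary mapping every vertex (leaves included) to the list of its children, and the function names are hypothetical. Pairs of leaves are scanned naively, so each search here is quadratic in the number of leaves rather than constant time. By Proposition 4.1, the order in which reductions are found does not affect the answer.

```python
def parent_map(g):
    par = {v: [] for v in g}
    for v, kids in g.items():
        for c in kids:
            par[c].append(v)
    return par

def suppress(g, v):
    """Remove v (in-degree one, out-degree one) and join its parent to its child."""
    (q,) = parent_map(g)[v]
    (c,) = g[v]
    g[q] = [c if k == v else k for k in g[q]]
    del g[v]

def find_reduction(g):
    """Return ('cherry', a, b), ('cut', a, b) or None; b is the leaf to reduce,
    or the reticulation leaf of a reticulated cherry."""
    par = parent_map(g)
    leaves = [v for v, kids in g.items() if not kids]
    for a in leaves:
        for b in leaves:
            if a == b:
                continue
            pa, pb = par[a][0], par[b][0]
            if pa == pb:
                return ("cherry", a, b)
            if len(par[pb]) == 2 and pb in g[pa]:
                return ("cut", a, b)
    return None

def is_orchard(children):
    """Apply cherry reductions until none applies; return (orchard?, #reductions)."""
    g = {v: list(kids) for v, kids in children.items()}   # work on a copy
    steps = 0
    while (move := find_reduction(g)) is not None:
        kind, a, b = move
        par = parent_map(g)
        if kind == "cherry":
            (p,) = par[b]
            g[p].remove(b)
            del g[b]
            if par[p]:
                suppress(g, p)
            else:              # p was the root: only the vertex a remains
                del g[p]
        else:                  # delete the reticulation arc (p_a, p_b)
            (pa,), (pb,) = par[a], par[b]
            g[pa].remove(pb)
            suppress(g, pa)
            suppress(g, pb)
        steps += 1
    return len(g) == 1, steps

# Orchard example: 3 leaves and 1 reticulation, so n + r - 1 = 3 reductions.
N1 = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"],
      "a": [], "b": [], "c": []}
# Non-orchard example: both leaf parents are reticulations, so there is
# no cherry and no reticulated cherry.
N2 = {"r": ["s", "t"], "s": ["x", "y"], "t": ["x", "y"], "x": ["a"], "y": ["b"],
      "a": [], "b": []}
print(is_orchard(N1))   # (True, 3)
print(is_orchard(N2))   # (False, 0)
```

Returning the number of reductions alongside the verdict makes the count $n + r - 1$ for orchard networks easy to check on small examples.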
5. Proof of Theorem 2.2
In this section, we prove Theorem 2.2. For a phylogenetic network , Corollary 3.2 and Lemma 3.3 show that it is straightforward to recognise cherries and reticulated cherries of using only the ancestral sets, and thus the ancestral tuples, of . This fact is freely used throughout this section. We next describe two operations on tuples that parallel the operations of reducing a cherry and cutting a reticulated cherry.
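To make the objects of this section concrete, the sketch below computes, for every vertex of a small network, the tuple of numbers of paths to each leaf under a fixed ordering of the leaves, which is how the ancestral profile is described in the abstract. It is an illustration under stated assumptions, not the authors' code: the dict-of-children representation and the name `ancestral_profile` are hypothetical, and we assume that the trivial path from a leaf to itself is counted once. Path counts are obtained by summing the tuples of the children, which is valid because the network is acyclic.

```python
from functools import lru_cache


def ancestral_profile(children, leaf_order):
    """Map each vertex v to (number of paths from v to x for x in leaf_order)."""

    @lru_cache(maxsize=None)
    def paths(v):
        if not children[v]:  # leaf: only the trivial path to itself
            return tuple(1 if x == v else 0 for x in leaf_order)
        totals = [0] * len(leaf_order)
        for c in children[v]:  # every path from v leaves through one child
            for i, k in enumerate(paths(c)):
                totals[i] += k
        return tuple(totals)

    return {v: paths(v) for v in children}


# Hypothetical example: a network on {x, y, z} with one reticulation h.
net = {"r": ["u", "w"], "u": ["x", "h"], "w": ["h", "z"], "h": ["y"],
       "x": [], "y": [], "z": []}
print(ancestral_profile(net, ("x", "y", "z"))["r"])  # (1, 2, 1)
```

The root has two paths to y, one through each parent of the reticulation h, which is exactly the information that distinguishes this network from a tree.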
Let be a non-empty finite set and, for some fixed , let
[TABLE]
be a set of ordered pairs, where, for all , we have that is a -tuple whose entries are either non-negative integers or . Note that the symbol is used as a placeholder. Let be a -element subset of . The first operation will be used only in association with reducing when is a cherry. Let such that , but for all . Let be the set of ordered pairs obtained from as follows. For all , set so that the -th entry is
[TABLE]
Set . We say that has been obtained from by reducing .
The second operation will be used only in association with cutting when is a reticulated cherry in which is the reticulation leaf. Let such that but for all , and let such that but for all . Let be the set of ordered pairs obtained from as follows. For all , set so that the -th entry is
[TABLE]
and set so that the -th entry is
[TABLE]
Set . We say that has been obtained from by cutting .
**Lemma 5.1.**
Let be a phylogenetic network on with vertex set and , and fix an ordering of . Let be a -element subset of .
- (i) *If is a cherry of , then, up to entries with symbol , the set of ordered pairs obtained from by reducing is the ancestral profile of the phylogenetic network obtained from by reducing .*
- (ii) *If is a reticulated cherry of in which is the reticulation leaf, then, up to entries with symbol , the set of ordered pairs obtained from by cutting is the ancestral profile of the phylogenetic network obtained from by cutting .*
Proof.
We prove the lemma for (ii). The proof of the lemma for (i) is similar, but easier, and omitted. Suppose is a reticulated cherry of in which is the reticulation leaf, and is obtained from by cutting . Let be the set of ordered pairs obtained from by cutting . We will show that is the ancestral profile of a phylogenetic network isomorphic to .
Let denote the vertex set of , and fix an ordering of the vertices in . Let and denote the parents of and , respectively, in . Set
[TABLE]
and
[TABLE]
Observe that and are both non-empty as and , but is empty.
Now consider . To obtain from , we choose (i) an entry in , say , such that but for all , and (ii) an entry in , say , such that but for all . In particular, these chosen entries correspond to vertices, say and , in and , respectively.
Let denote the phylogenetic network obtained from by bijectively relabelling the vertices in with the vertices in so that is relabelled , and bijectively relabelling the vertices in with the vertices in so that is relabelled . Clearly, is isomorphic to and is the ancestral profile of . Furthermore, it is easily checked that, up to isomorphism, is the ancestral profile of the phylogenetic network obtained from by cutting . But is isomorphic to , thereby completing the proof of the lemma. ∎
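Part (i) of Lemma 5.1 can be sanity-checked numerically. When a cherry {a, b} is reduced, b is deleted and the common parent p of a and b is suppressed; every path into a passed through p, so every surviving vertex keeps its path counts to all leaves other than b. The Python sketch below checks this on one hypothetical network. It is our own illustration, not the tuple operation defined in this section, and the names `profile` and `reduce_cherry` are assumptions.

```python
from functools import lru_cache


def profile(children, leaf_order):
    """Path-count tuple of each vertex with respect to leaf_order."""

    @lru_cache(maxsize=None)
    def paths(v):
        if not children[v]:  # leaf: only the trivial path to itself
            return tuple(1 if x == v else 0 for x in leaf_order)
        # coordinate-wise sum of the children's tuples
        return tuple(map(sum, zip(*(paths(c) for c in children[v]))))

    return {v: paths(v) for v in children}


def reduce_cherry(children, a, b):
    """Return a copy with leaf b deleted and the common parent p of a, b suppressed."""
    net = {v: list(kids) for v, kids in children.items()}
    p = next(u for u, kids in net.items() if b in kids)
    del net[b], net[p]
    for u in list(net):  # redirect the arc into p so that it points at a
        net[u] = [a if x == p else x for x in net[u]]
    return net


net = {"r": ["u", "w"], "u": ["x", "h"], "w": ["h", "p"], "h": ["y"],
       "p": ["z1", "z2"], "x": [], "y": [], "z1": [], "z2": []}
leaves = ("x", "y", "z1", "z2")
reduced = reduce_cherry(net, "z1", "z2")
before = profile(net, leaves)
after = profile(reduced, leaves[:3])  # z2 is the last coordinate
unchanged = all(before[v][:3] == after[v] for v in after)
print(unchanged, sorted(set(before) - set(after)))  # True ['p', 'z2']
```

Only the pairs for the deleted leaf and the suppressed parent disappear, and the remaining tuples agree once the z2 coordinate is dropped, in line with the "up to entries with symbol" qualification in the lemma.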
With Lemma 5.1 in hand, we next prove the uniqueness part of Theorem 2.2.
Proof of the uniqueness part of Theorem 2.2.
The proof is by induction on the sum of the number of leaves and the number of reticulations in . If , then and , and consists of the single vertex in , and so uniqueness holds. If , then, as is orchard, and , in which case, consists of two leaves attached to the root. Again, uniqueness holds. Now suppose that and the uniqueness holds for all orchard networks for which the sum of the number of leaves and the number of reticulations is at most . Note that, as is orchard, .
Since is orchard, it has either a cherry or a reticulated cherry. Thus, by Corollary 3.2 and Lemma 3.3, it is possible to find a -element subset of using only such that is either a cherry or a reticulated cherry of . If the latter, we can also determine from which of and is the reticulation leaf. Without loss of generality, we may assume is the reticulation leaf. Depending on whether is a cherry or a reticulated cherry, let be obtained from by reducing or cutting , respectively, and let be the set of ordered pairs obtained from by reducing or cutting , respectively. Regardless of the way and are obtained, it follows by Corollary 4.2 and Lemma 5.1 that is an orchard network and, up to isomorphism, is the ancestral profile of . Furthermore, has either leaves and reticulations if is a cherry, or leaves and reticulations if is a reticulated cherry. Therefore, by the induction assumption, up to isomorphism, is the unique phylogenetic network whose ancestral profile is .
Now let be a phylogenetic network on such that is the ancestral profile of . Note that has the same number of non-leaf vertices as , but not necessarily the same number of reticulations. First assume is a cherry of . Then, by Corollary 3.2, is a cherry of . Let denote the phylogenetic network obtained from by reducing . By Lemma 5.1(i), up to isomorphism, is the ancestral profile of . Thus, by the induction assumption, is isomorphic to . Since is a cherry of and , it follows that is isomorphic to .
Lastly, assume is a reticulated cherry of . Then, by Lemma 3.3, is a reticulated cherry of in which is the reticulation leaf. Let be the phylogenetic network obtained from by cutting . By Lemma 5.1(ii), up to isomorphism, is the ancestral profile of . Hence, by the induction assumption, is isomorphic to . As is a reticulated cherry of and in which is the reticulation leaf, we have that is isomorphic to . This completes the proof of the uniqueness part of Theorem 2.2. ∎
5.1. The algorithm
Let be an orchard network on , and let denote the ancestral profile of . We next describe an algorithm, called Orchard Tuple, which takes as its input and , and returns a phylogenetic network on that is isomorphic to . The proof that the algorithm works correctly is essentially the same as that used to prove the uniqueness part of Theorem 2.2, and so it is omitted. The running time of the algorithm is analysed after its description.
1. If , then return the phylogenetic network consisting of the single vertex in .
2. Else, find a -element subset, say, of such that either (I) or (II) , there is no with , and

   [TABLE]

   - (a) If satisfies (I) (in which case is a cherry), then
     - (i) Reduce in to give the set of ordered pairs.
     - (ii) Apply Orchard Tuple to the input and . Construct from the returned phylogenetic network on by subdividing the arc incident to with a new vertex , and adjoining a new leaf via the new arc . If , then set to be the phylogenetic network consisting of the leaves and adjoined to the root. Return .
   - (b) Else, satisfies (II) (in which case is a reticulated cherry and is the reticulation leaf).
     - (i) Cut in to give the set of ordered pairs.
     - (ii) Apply Orchard Tuple to and . Construct from the returned phylogenetic network on by subdividing the arcs incident to and with new vertices and , respectively, and adding the new arc . Return .
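Steps (a)(ii) and (b)(ii) each undo one reduction. The Python sketch below replays them in reverse over a recorded sequence of reductions. It is a hypothetical illustration rather than Orchard Tuple itself: it takes the sequence of pairs as given instead of finding each pair from the tuples via (I) or (II). Each step is assumed to be a triple (kind, a, b), where b is the leaf adjoined in (a)(ii), or a is the reticulation leaf in (b)(ii); the helper names and the generated vertex names `v0`, `v1`, and so on are our own.

```python
import itertools


def subdivide(net, v, new):
    """Subdivide the unique arc (u, v) into (u, new) and (new, v)."""
    u = next(x for x, kids in net.items() if v in kids)
    net[u] = [new if c == v else c for c in net[u]]
    net[new] = [v]
    return new


def rebuild(final_leaf, steps):
    """Undo cherry reductions, last one first, starting from a single vertex."""
    fresh = (f"v{i}" for i in itertools.count())
    net = {final_leaf: []}
    for kind, a, b in reversed(steps):
        if kind == "cherry" and len(net) == 1:
            # (a)(ii), base case: leaves a and b adjoined to a new root
            net = {next(fresh): [a, b], a: [], b: []}
        elif kind == "cherry":
            # (a)(ii): subdivide the arc into a and attach the new leaf b there
            p = subdivide(net, a, next(fresh))
            net[p].append(b)
            net[b] = []
        else:
            # (b)(ii): subdivide the arcs into a and b, then add the arc (p_b, p_a)
            p_a = subdivide(net, a, next(fresh))
            p_b = subdivide(net, b, next(fresh))
            net[p_b].append(p_a)
    return net


steps = [("reticulated", "y", "x"), ("cherry", "y", "z"), ("cherry", "x", "y")]
print(rebuild("x", steps))
# root v0 with children v3 and v1; v2 is a reticulation with parents v1 and v3
```

Recording the reductions in the order they are applied and replaying them backwards mirrors the recursion of the algorithm, where each returned network is augmented by one step.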
We now consider the running time of Orchard Tuple. The input to the algorithm is a set and the ancestral profile of an orchard network on whose entries are either non-negative integers or the symbol . Let denote the vertex set of . As noted earlier, the set can be determined from in time. This is a preprocessing step and it has no effect on the theoretical running time. Except when , in which case Orchard Tuple runs in constant time, each iteration begins by finding a -element subset of satisfying either (I) or (II). This takes time as there are two-element subsets of and, for each such subset, it takes time to decide whether it satisfies (I) or (II). Once such a -element subset is found, we construct . Regardless of the way is constructed, this takes time. When is returned, we augment to in constant time, and so each iteration takes time.
When we recurse, is the ancestral profile of an orchard network with either one fewer leaf or one fewer reticulation than an orchard network for which is the ancestral profile. Thus the total number of iterations is . We conclude that Orchard Tuple completes in time. This completes the proof of Theorem 2.2.
6. Conclusion
The main result of this paper, Theorem 2.2, shows that the ancestral profile of an orchard network on uniquely determines amongst all phylogenetic networks on . This generalises results in both [4] and [5], which considered tree-sibling time-consistent networks and tree-child networks (subclasses of orchard networks whose number of reticulations is at most linear in the number of leaves). Curiously, the results in [4] and [5] have a different motivation from that of Theorem 2.2. There, the motivation is to construct a distance measure (metric) on the classes of tree-sibling time-consistent networks and tree-child networks that is computable in polynomial time. Recall that those papers consider the equivalent notion of path-tuples. For two tree-sibling time-consistent (resp. tree-child) networks, the distance between them is the number of elements, counted with multiplicity, in the symmetric difference of their multisets of path-tuples; here both the symmetric difference and the cardinality operator refer to multisets. It is easily checked that this same measure extends to the class of orchard networks.
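The distance just described, the size of the multiset symmetric difference of path-tuples, is straightforward to compute once the tuples are known. Below is a minimal Python sketch under our own assumptions: networks are dicts of child lists over a shared, fixed leaf ordering, the tuple of every vertex (leaves included) enters the multiset, and the names `path_tuples` and `tuple_distance` are hypothetical. `collections.Counter` serves as the multiset; its subtraction discards non-positive counts, so `(m1 - m2) + (m2 - m1)` is exactly the multiset symmetric difference.

```python
from collections import Counter
from functools import lru_cache


def path_tuples(children, leaf_order):
    """Multiset of path-count tuples, one tuple per vertex."""

    @lru_cache(maxsize=None)
    def paths(v):
        if not children[v]:  # leaf: only the trivial path to itself
            return tuple(1 if x == v else 0 for x in leaf_order)
        return tuple(map(sum, zip(*(paths(c) for c in children[v]))))

    return Counter(paths(v) for v in children)


def tuple_distance(n1, n2, leaf_order):
    """Number of elements, with multiplicity, in the symmetric difference."""
    m1, m2 = path_tuples(n1, leaf_order), path_tuples(n2, leaf_order)
    return sum(((m1 - m2) + (m2 - m1)).values())


leaves = ("x", "y", "z")
network = {"r": ["u", "w"], "u": ["x", "h"], "w": ["h", "z"], "h": ["y"],
           "x": [], "y": [], "z": []}
tree = {"r": ["u", "z"], "u": ["x", "y"], "x": [], "y": [], "z": []}
print(tuple_distance(network, tree, leaves))  # 4
print(tuple_distance(network, network, leaves))  # 0
```

Here the network contributes three tuples absent from the tree, namely (1, 2, 1), (0, 1, 1) and a second copy of (0, 1, 0), while the tree contributes (1, 1, 1), giving a distance of 4.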
As noted in the introduction, our result does not relate to specific biological data that is readily available at present. However, a type of data that might provide ancestral profile information would be genomic fragments that follow lineage splitting and reticulation events, so that when a reticulation occurs, a trace of each fragment in the incoming lineage is preserved in (different regions of) the reticulate genome.
Lastly, we end with a question asked by one of the referees. For a given orchard network , is it possible to count the number of complete cherry-reduction sequences of ?
Acknowledgements
We thank the three anonymous referees for their careful reading of the paper and constructive comments.
- [1] M. Baroni, M. Steel, Accumulation phylogenies, Annals of Combinatorics 10 (2006) 19–30.
- [2] M. Bordewich, K.T. Huber, V. Moulton, C. Semple, Recovering normal networks from shortest inter-taxa distance information, Journal of Mathematical Biology 77 (2018) 571–594.
- [3] M. Bordewich, C. Semple, Determining phylogenetic networks from inter-taxa distances, Journal of Mathematical Biology 73 (2016) 283–303.
- [4] G. Cardona, M. Llabrés, F. Rosselló, G. Valiente, A distance metric for a class of tree-sibling phylogenetic networks, Bioinformatics 24 (2008) 1481–1488.
- [5] G. Cardona, F. Rosselló, G. Valiente, Comparison of tree-child phylogenetic networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics 6 (2009) 552–569.
- [6] W.F. Doolittle, Phylogenetic classification and the universal tree, Science 284 (1999) 2124–2128.
- [7] J. Felsenstein, Inferring Phylogenies, Sinauer Associates, Sunderland, MA, 2004.
- [8] A.R. Francis, M. Steel, Which phylogenetic networks are merely trees with additional arcs?, Systematic Biology 64 (2015) 768–777.
