Displacement-Optimized Tanglegrams for Trees and Networks

Daniel H Huson

PMC · DOI: 10.1093/molbev/msag066 · March 10, 2026


Open Access

TL;DR

This paper introduces a new method for visualizing phylogenetic trees and networks by minimizing taxon and network edge misalignment.

Contribution

DO-tanglegrams is a novel approach that optimizes layouts for both trees and networks by minimizing taxon displacement and reticulate displacement.

Findings

1. DO-tanglegrams outperformed existing methods like phytools::cophylo and NN-tanglegram on synthetic data.

2. The algorithm handles unresolved nodes and missing taxa effectively.

3. The method uses a combination of local search and simulated annealing to achieve optimization.

Abstract

Phylogenetic trees and networks play a central role in biology, bioinformatics, and mathematical biology, and producing clear, informative visualizations of them is an important task. Tanglegrams, which display two phylogenies side by side with lines connecting shared taxa, are widely used for comparing evolutionary histories, host–parasite associations, and horizontal gene transfer. Existing layout algorithms have largely focused on trees and on minimizing the number of intertaxon edge crossings. We introduce displacement-optimized tanglegrams (DO-tanglegrams), a new approach that applies equally to trees and rooted phylogenetic networks. Our method explicitly minimizes taxon displacement—the vertical misalignment of corresponding taxa across the two sides—and reticulate displacement—the vertical distance spanned by reticulation edges within a network. We formalize one-sided and…


Figures

Example of a tanglegram. On the left, we show a published rooted phylogenetic network on scorpions (Blasco-Aróstegui et al. 2025a) computed from gene trees using PhyloFusion (Zhang et al. 2025), and on the right, we show a phylogenetic tree from another publication (Blasco-Aróstegui et al. 2025b).

Comparison on wasp data. a) Tanglegram obtained using the cophylo function in phytools, with 27 crossings and a taxon displacement of 42. This is the same layout as in (López-Vaamonde et al. 2001). b) Tanglegram obtained using the DO-tanglegram algorithm with NN presorting, with only two crossings and a taxon displacement of four.

Comparison of DO-tanglegram and cophylo on pairs of synthetic phylogenetic trees, with 50–450 taxa, under varying proportions of reticulations, missing taxa (m), and contracted edges (c). On the left, we compare taxon displacement and on the right, we compare number of crossings.

Comparison. a) Tanglegram obtained using the NN-tanglegram method implemented in Dendroscope. b) Tanglegram obtained using the new DO-tanglegram method. Note that a) contains two nested trees (highlighted edges), whereas the use of a backbone tree in the DO-tanglegram algorithm prevents this from happening in b).

Comparison of DO-tanglegram (SplitsTree) and NN-tanglegram (Dendroscope) on synthetic pairs of rooted phylogenetic networks, on 50–400 taxa, each with 10 reticulations, under varying proportions of missing taxa (m) and contracted edges (c). Left: taxon displacement. Right: total reticulate displacement.

Tables

Table 1. Feature comparison of tanglegram methods.

Method	One-sided	Two-sided	Min crossings	Min taxon disp.	Missing taxa	Unresolved nodes	Many-to-many	Networks	Min reticulate disp.	Visualization
DO-tanglegram algorithm (this article, SplitsTree)	$∙$	$∙$	×	$∙$	$∙$	$∙$	×	$∙$	$∙$	$∙$
NN-tanglegram (Scornavacca et al. 2011), Dendroscope	×	$∙$	°	×	$∙$	$∙$	×	$∙$	×	$∙$
phytools::cophylo (Revell 2024)	$∙$	$∙$	°	°	$∙$	$∙$	$∙$	×	×	$∙$
Shuffle & Untangle (Nguyen et al. 2022)	$∙$	$∙$	×	$∙$	×	$∙$	$∙$	×	×	×
dendextend::tanglegram (Galili 2015)	$∙$	$∙$	×	$∙$	°	$∙$	×	×	×	$∙$
Generalized binary tanglegrams (GBT) (Bansal et al. 2009)	$∙$	$∙$	$∙$	×	$∙$	×	$∙$	×	×	×
ape::cophyloplot (Paradis et al. 2004)	°	°	°	×	$∙$	$∙$	×	×	×	$∙$

Funding

University of Tübingen (funder ID: 10.13039/501100002345)

Keywords

tanglegram, phylogenetic tree, phylogenetic network, visualization, displacement optimization, taxon displacement, reticulate displacement


Taxonomy

Topics: Genomics and Phylogenetic Studies · Genome Rearrangement Algorithms · Evolution and Paleontology Studies

Full text

Introduction

Tanglegrams, which display two rooted trees or networks side by side with lines connecting shared (or matched) taxa, are a widely used tool in phylogenetics and comparative biology. They were popularized in the context of reconciling host–parasite associations and comparing evolutionary histories (Page 1994; Charleston 1998), where minimizing the number of intertaxon edge crossings is a natural criterion for obtaining clear and interpretable drawings.

Formally, a tanglegram (for phylogenetic trees) consists of two rooted trees defined on overlapping sets of taxa, drawn facing each other, with lines or curves, called intertaxon edges, connecting leaves that represent matching taxa. The internal structure of each tree is drawn planarly, so any crossings in the figure occur only among the intertaxon edges. The tanglegram crossing minimization problem asks for leaf orders on the two sides (respecting the ancestor–descendant relationships of the trees) that minimize the number of intertaxon edge crossings. In the one-sided version, the leaf order of one tree is fixed and only the other tree may be permuted, while in the two-sided version, both trees may be reordered subject to their constraints.

In addition to minimizing crossings, an alternative objective for tree tanglegrams is to minimize the taxon displacement between the two leaf orders. This measure is defined as the sum of the absolute differences between the vertical positions of corresponding leaves, and in the drawing corresponds to minimizing the total vertical displacement of the intertaxon edges. Whereas crossing minimization emphasizes reducing visual clutter, taxon displacement emphasizes vertical alignment of matching taxa, and can yield layouts that better highlight similarities in the hierarchical structure of the two trees. This criterion was considered in Venkatachalam et al. (2010), where it is referred to as the Spearman footrule distance.
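The two objectives can be made concrete on plain leaf orders. The following is a minimal sketch, assuming both sides carry the same taxa and that a leaf's vertical position is simply its index in the order; the function names are illustrative and not taken from any tanglegram software.

```python
from itertools import combinations


def taxon_displacement(left_order, right_order):
    """Sum over shared taxa of |position on the left - position on the right|."""
    left_pos = {t: i for i, t in enumerate(left_order)}
    right_pos = {t: i for i, t in enumerate(right_order)}
    shared = left_pos.keys() & right_pos.keys()
    return sum(abs(left_pos[t] - right_pos[t]) for t in shared)


def crossings(left_order, right_order):
    """Count pairs of intertaxon edges that cross (inversions between the orders)."""
    right_pos = {t: i for i, t in enumerate(right_order)}
    shared = [t for t in left_order if t in right_pos]
    # a precedes b on the left; the two edges cross if a follows b on the right
    return sum(1 for a, b in combinations(shared, 2) if right_pos[a] > right_pos[b])


left = ["a", "b", "c", "d"]
right = ["d", "a", "b", "c"]
print(taxon_displacement(left, right), crossings(left, right))
```

The quadratic pair loop is chosen for readability; an inversion count via merge sort would scale better on large taxon sets.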

For trees, the one-sided crossing minimization problem can be solved using a polynomial $[eqn]$ algorithm (Fernau et al. 2010; Venkatachalam et al. 2010). The more general two-sided version, where both trees can be permuted, is NP-complete. Bansal et al. (2009) introduced generalized binary tanglegrams and provided algorithms and applications motivated by comparative genomics.

Scornavacca et al. (2011) extended the concept of tanglegrams from pairs of rooted phylogenetic trees to pairs that involve rooted phylogenetic networks. Their NN-tanglegram method (implemented in Dendroscope; Huson and Scornavacca 2016) first computes a distance matrix H over the full taxon set that captures the topology of the two networks and then applies the Neighbor-Net algorithm to H to derive a leaf ordering used to draw the networks. They prove that this approach is guaranteed to produce a layout with zero intertaxon edge crossings whenever such an embedding exists. In practice, their implementation incorporates an additional heuristic step, not described in detail, that attempts to further reduce the number of intertaxon edge crossings, without regard for the layout of the reticulate edges. NN-tanglegrams are not restricted to backbone-based layouts (as introduced below).

When moving from trees to rooted phylogenetic networks, the presence of reticulation edges introduces additional layout challenges. In this context, an analog of taxon displacement is to minimize the reticulate displacement, defined as the sum of the vertical distances between the endpoints of reticulate edges in a left-to-right drawing. Whereas taxon displacement measures how well corresponding taxa align across the two sides of a tanglegram, reticulate displacement measures the vertical separation between the endpoints of reticulation edges within a single network. Minimizing this displacement improves the readability of network tanglegrams by reducing the visual distortion caused by reticulation edges with large vertical extent and thus provides a complementary optimization criterion alongside intertaxon edge crossing minimization.

We emphasize that taxon displacement is a between-phylogeny measure, comparing the vertical alignment of corresponding taxa across the two sides of a tanglegram. Reticulate displacement, in contrast, is a within-phylogeny measure that quantifies the vertical offset of reticulation edges inside each network.

Let $N_1$ and $N_2$ be two rooted phylogenetic networks (or trees) on taxon sets $X_1$ and $X_2$. For the purposes of this article, a tanglegram consists of a left-to-right drawing of the first network $N_1$ and a right-to-left drawing of the second network $N_2$, separated by a rectangular region that contains lines or curves connecting any two leaves $v$ in $N_1$ and $w$ in $N_2$ that are labeled by matching taxa, as illustrated in Fig. 1.

Fig. 1. Example of a tanglegram. On the left, we show a published rooted phylogenetic network on scorpions (Blasco-Aróstegui et al. 2025a) computed from gene trees using PhyloFusion (Zhang et al. 2025), and on the right, we show a phylogenetic tree from another publication (Blasco-Aróstegui et al. 2025b).

In this work, we address the problem of computing a tanglegram for any two such networks, with the goal of obtaining a visually effective layout. The key idea is to explicitly minimize both the taxon displacement and the reticulate displacement of the two networks. We consider both the one-sided and two-sided variants of the tanglegram optimization problem, and we call the resulting visualization a displacement-optimized tanglegram (DO-tanglegram). The one-sided optimization problem is computationally hard in the case of a network (Huson 2025), and the two-sided problem is computationally hard even in the special case where both inputs are binary trees (Fernau et al. 2010). These hardness results motivate the development of efficient heuristics. Accordingly, we propose a practical DO-tanglegram algorithm. We show that it performs favorably compared to two state-of-the-art methods, namely the R function phytools::cophylo for rooted phylogenetic trees (Revell 2024) and the NN-tanglegram algorithm for rooted phylogenetic networks (Scornavacca et al. 2011), on both real and synthetic data.

Results

Several methods have been developed for drawing tanglegrams, ranging from early tools for host–parasite cophylogenies to recent heuristics for phylogenetic networks. These approaches differ in the optimization objectives they pursue (minimizing intertaxon edge crossings, minimizing vertical displacement, or heuristically balancing both), in the type of input they accept (binary vs. nonbinary trees, identical vs. differing taxon sets, or general networks), and in whether they provide an integrated visualization environment. Table 1 summarizes the capabilities of the most relevant methods, highlighting the variants of the problem addressed and which criteria are explicitly optimized.

We have implemented our DO-tanglegram heuristic in a new release of SplitsTree (Huson and Bryant 2024), which supports one- and two-sided operation and allows optimization of taxon and reticulate displacement within one or both phylogenies, with or without the use of the NN-presorting heuristic described below.

DO-tanglegrams on trees

To demonstrate performance on trees, we compared our DO-tanglegram implementation in SplitsTree with a recently updated and widely used method for trees, the cophylo function in the R package phytools (Revell 2024).

The documentation of cophylo includes an example dataset of fig wasps and their parasites (López-Vaamonde et al. 2001). Following the procedure described therein, one obtains a tanglegram with 27 crossings and a taxon displacement of 42, matching the layout reported in the original study. Applying our DO-tanglegram algorithm to the same two trees, we obtain a tanglegram with only 2 crossings and a taxon displacement of 4 (see Fig. 2).

Fig. 2. Comparison on wasp data. a) Tanglegram obtained using the cophylo function in phytools, with 27 crossings and a taxon displacement of 42. This is the same layout as in (López-Vaamonde et al. 2001). b) Tanglegram obtained using the DO-tanglegram algorithm with NN presorting, with only two crossings and a taxon displacement of four.

We also performed a systematic comparison on synthetic phylogenetic trees. To generate evaluation datasets, we began with a large background tree on 500 taxa. For a given set of parameters $[eqn]$ , $[eqn]$ , $[eqn]$ , and $[eqn]$ , we first randomly extracted a subtree on n taxa from the background tree. Next, we created two input trees, $[eqn]$ and $[eqn]$ , by independently applying $[eqn]$ rooted subtree prune-and-regraft operations to the extracted tree. If $[eqn]$ , we randomly removed a proportion m of taxa from each tree. Similarly, if $[eqn]$ , we randomly contracted a proportion c of internal edges in each tree.

We constructed one pair of trees for each combination of $[eqn]$ , $[eqn]$ , $[eqn]$ , and $[eqn]$ . We ran both DO-tanglegram and cophylo (using the command cophylo(t1, t2, rotate=TRUE)) on every input pair and measured the total taxon displacement, number of crossings, and wall-clock runtime. On average, both methods required only a few seconds per dataset.

The results in Fig. 3 indicate that our new method produces tanglegrams with lower taxon displacement on $[eqn]$ of the datasets and with fewer intertaxon crossings on $[eqn]$ of the datasets, in comparison to cophylo.

Fig. 3. Comparison of DO-tanglegram and cophylo on pairs of synthetic phylogenetic trees, with 50–450 taxa, under varying proportions of reticulations, missing taxa (m), and contracted edges (c). On the left, we compare taxon displacement and on the right, we compare number of crossings.

DO-tanglegrams on networks

The DO-tanglegram method is explicitly designed to work on rooted phylogenetic networks. Because it allows one to optimize both taxon displacement and reticulate displacement and also supports the computation of one-sided tanglegrams, it represents an improvement over the current state-of-the-art method, NN-tanglegram (Scornavacca et al. 2011) as implemented in Dendroscope, which only allows two-sided computations and focuses only on optimizing the intertaxon edge crossings.

In the paper introducing the NN-tanglegram method, the authors display a tanglegram between a rooted phylogenetic network and a rooted phylogenetic tree, based on data from (Kim and Donoghue 2008). In the NN-tanglegram layout, the taxon displacement is 128, number of crossings 90, and reticulate displacement 238. Using our DO-tanglegram heuristic on the same input, we obtain a tanglegram with taxon displacement 130, number of crossings 99, and reticulate displacement 93 (see Fig. 4).

Fig. 4. Comparison. a) Tanglegram obtained using the NN-tanglegram method implemented in Dendroscope. b) Tanglegram obtained using the new DO-tanglegram method. Note that a) contains two nested trees (highlighted edges), whereas the use of a backbone tree in the DO-tanglegram algorithm prevents this from happening in b).

This example highlights an important conceptual difference between the two algorithms. In DO-tanglegrams, the leaves descending from a given parent or lowest stable ancestor (LSA)-parent (as defined below) always appear as a contiguous block in the layout; they are never interleaved with leaves from other subtrees. Tanglegrams produced using the NN-tanglegram method do not necessarily satisfy this property. In the example, the two leaves of the green subtree are separated by leaves belonging to the purple subtree. While such a placement may reduce displacement measures, it does not necessarily improve the clarity of the visualization.

We use synthetic phylogenetic networks to compare the performance of the new DO-tanglegram algorithm, as implemented in SplitsTree, with the NN-tanglegram algorithm implemented in Dendroscope. To generate these evaluation datasets, we began with a large background tree on 500 taxa. For a range of target sizes (50–400, in steps of 10) and for each choice of $[eqn]$ and $[eqn]$ , we extracted two identical subtrees of the specified size from this background tree. In each subtree, we then randomly and independently removed a proportion m of all taxa and contracted a proportion c of all internal edges, thus also considering datasets with missing taxa and unresolved nodes. Finally, in both trees, we randomly and independently added ten new edges to introduce reticulations. For these network experiments, we use smaller values of m and c, since higher levels of missing taxa or edge contraction, when combined with additional reticulate edges, produce overly complicated tanglegrams that are of limited practical relevance.

Figure 5 compares the resulting taxon displacement and total reticulate displacement for the two algorithms. With respect to taxon displacement, DO-tanglegram often performs better than NN-tanglegram, especially on datasets with missing taxa. DO-tanglegram achieves lower total reticulate displacement in over $[eqn]$ of all cases. We note that in nearly all cases where NN-tanglegram performs better in either plot, the datasets contain no missing taxa, suggesting that the advantage of DO-tanglegram is most pronounced on incomplete datasets. The mean wall-clock time required on these samples was 25 s (35 s standard deviation) for DO-tanglegram and 102 s (262 s standard deviation) for NN-tanglegram, a roughly fourfold reduction in mean runtime, although the large standard deviations indicate considerable variation across datasets for both methods. (The NN-tanglegram implementation crashed on ten input pairs; these were omitted from the analysis.)

Fig. 5. Comparison of DO-tanglegram (SplitsTree) and NN-tanglegram (Dendroscope) on synthetic pairs of rooted phylogenetic networks, on 50–400 taxa, each with 10 reticulations, under varying proportions of missing taxa (m) and contracted edges (c). Left: taxon displacement. Right: total reticulate displacement.

Discussion

Tanglegrams are used in biological research for comparing phylogenetic trees. Moreover, there is much research focused on developing methods for computing phylogenetic networks, whose aim is to explicitly represent reticulate evolutionary processes such as speciation-by-hybridization, horizontal gene transfer, and reassortment. Hence, there is a need for versatile approaches capable of computing tanglegrams between phylogenies that may be trees or rooted networks (with both combining and transfer reticulations) and that may contain unresolved nodes and missing taxa.

In this work, we introduced the DO-tanglegram approach. The key idea is to jointly minimize both taxon displacement and reticulate displacement. On the real and synthetic datasets considered here, the method compares favorably with two state-of-the-art approaches: cophylo for trees and the NN-tanglegram algorithm in Dendroscope for networks. An implementation of the algorithm is provided in SplitsTree.

The NN-tanglegram method focuses exclusively on minimizing intertaxon crossings and can only be applied in a two-sided setting. In contrast, the DO-tanglegram algorithm is more general: it explicitly optimizes both taxon displacement and reticulate displacement, and it supports both one-sided and two-sided tanglegram computations.

As datasets grow larger and more complex, with increased taxon sampling, missing data, and higher levels of reticulation, interpreting relationships among phylogenies becomes more challenging. By explicitly minimizing both taxon and reticulate displacement, DO-tanglegrams aim to produce layouts that preserve structural correspondence and reduce visual confusion. Such improvements can facilitate comparative studies in areas including gene tree–species tree discordance, host–parasite coevolution, and the analysis of hybridization and horizontal gene transfer, where understanding relationships between multiple phylogenetic hypotheses is essential.

Although the DO-tanglegram method performs well on the examples considered here, several limitations should be acknowledged. The quality of a resulting tanglegram may depend on the extent of taxon overlap and the degree of discordance between the phylogenies under comparison. In cases of extreme structural divergence, no tanglegram representation may adequately convey all relevant evolutionary signal, and alternative visualizations—such as agreement subtrees, splits graphs, or reconciliation models—may be more appropriate. Furthermore, while the heuristic is efficient for the dataset sizes tested here, its performance on substantially larger or highly reticulated networks remains to be systematically tested.

Materials and methods

We adopt the same basic definitions as in (Huson et al. 2010; Huson 2025). Let X be a set of taxa. A rooted phylogenetic network on X is a tuple $N = (V, E, \rho, \lambda)$, where V is a finite set of nodes, $E \subseteq V \times V$ is a set of directed edges, $\rho \in V$ is a distinguished root, and $\lambda : X \to V$ assigns taxa to nodes, subject to the following conditions:

the directed graph $(V, E)$ is acyclic,
ρ is the unique node of in-degree 0,
λ is a bijection between X and the set of nodes with out-degree 0,
no node has both in-degree 1 and out-degree 1, and
each leaf has in-degree at most 1.

A node $v \in V$ is called a tree node if its in-degree is at most 1 and a reticulation node otherwise. An edge $(v, w) \in E$ is referred to as a tree edge or a reticulation edge depending on whether w is a tree node or a reticulation node, respectively. To ensure practical relevance, we explicitly allow nodes to be multifurcating, with out-degree greater than two, and multicombining, with in-degree greater than two.

If there are no reticulate edges, then N is a rooted phylogenetic tree. In this case, N is planar and can be drawn in the plane without edge crossings. In contrast, if reticulate edges are present, N may be nonplanar, and any drawing of N may necessarily involve crossings.

We view rooted phylogenetic networks as a natural generalization of rooted phylogenetic trees. When drawing such a network, the goal is that tree edges never cross each other, while reticulate edges may cross tree edges and reticulate edges. To enhance visual clarity, one aims to reduce the number and extent of crossings, and in particular to avoid reticulate edges with large vertical extent in a left-to-right drawing. To this end, Huson (2025) introduced the concept of reticulate displacement and described an algorithm for obtaining network layouts that minimize this quantity.

In the present work, we extend these ideas from single-network layouts to the problem of computing tanglegrams for two rooted phylogenetic networks $N_1$ and $N_2$, allowing nodes of arbitrary in-degree or out-degree, and allowing the two networks to be defined on different but overlapping taxon sets $X_1$ and $X_2$.

The backbone tree and reticulate displacement

Let N be a phylogenetic network on taxon set X, with root ρ.

Recall that there are two distinct ways to interpret reticulate nodes in a rooted phylogenetic network (Huson et al. 2010; Huson 2025). In a combining view, a reticulate node represents a combining event, such as speciation-by-hybridization, where all incoming edges are treated equally and rendered in a similar fashion (see Fig. 2B of Huson 2025). In contrast, in a transfer view, a reticulate node represents a transfer event, such as horizontal gene transfer. In this case, one incoming edge (the transfer–acceptor edge) represents the main lineage and is drawn like a regular tree edge, while all other incoming edges represent transferred material and are drawn as reticulate edges (see Fig. 1D of Huson 2025).

Whether a node v is represented as a combining event or as a transfer event depends on whether one of its incoming edges is designated as a transfer–acceptor edge. This designation may be determined by the software that generates the network and communicated via an appropriate convention in the network representation, or chosen interactively by the user in a network editor such as PhyloSketch (Huson 2025).

Let v be a reticulate node. The following operation converts v into a tree node. If v is associated with a transfer event, then one of its incoming edges e has been declared the transfer–acceptor edge, and we delete all other incoming edges. Otherwise, v is associated with a combining event. In this case, determine the LSA(v), defined as the node closest to v that lies on every directed path from the root ρ to v. Then replace all incoming edges of v by a single incoming edge from LSA(v).

Application of this operation to all reticulate nodes yields a tree with root ρ that has the same nodes as N, including all leaves. We call $B$ the backbone tree of N. Note that the backbone tree is not necessarily a proper phylogenetic tree, because it may contain unlabeled leaves and/or through nodes.
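The conversion of reticulate nodes into tree nodes can be sketched as follows. This is an illustrative sketch, not the SplitsTree implementation: the network is assumed to be given as a list of directed edges, `transfer_acceptor` is a hypothetical mapping from a transfer node to the parent of its transfer–acceptor edge, and the LSA is computed as the closest strict dominator, using the fact that the nodes lying on every root-to-v path form a chain.

```python
from collections import defaultdict, deque


def backbone_tree(edges, root, transfer_acceptor=None):
    """Return {child: parent} describing the backbone tree of a rooted DAG."""
    transfer_acceptor = transfer_acceptor or {}
    parents, children = defaultdict(list), defaultdict(list)
    for p, c in edges:
        parents[c].append(p)
        children[p].append(c)

    # Topological order (Kahn's algorithm): every parent precedes its children.
    indegree = {v: len(ps) for v, ps in parents.items()}
    order, queue = [], deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)

    # dom[v]: nodes that lie on every directed path from the root to v (v included).
    dom = {root: {root}}
    for v in order[1:]:
        dom[v] = set.intersection(*(dom[p] for p in parents[v])) | {v}

    backbone = {}
    for v in order[1:]:
        if len(parents[v]) == 1:            # tree node: keep its only parent
            backbone[v] = parents[v][0]
        elif v in transfer_acceptor:        # transfer view: keep the acceptor edge
            backbone[v] = transfer_acceptor[v]
        else:                               # combining view: attach v below LSA(v)
            backbone[v] = max(dom[v] - {v}, key=lambda u: len(dom[u]))
    return backbone
```

For a hybrid node h with parents a and b that both descend from u, `backbone_tree` attaches h directly below u unless a transfer–acceptor parent is supplied.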

A basic left-to-right layout of the backbone tree B can be obtained by first assigning x-coordinate $x(v) = d$, where d is the number of edges on the path from the root ρ to node v. Then, we set the y-coordinate $y(\ell) = k$ when ℓ is the kth leaf encountered in a postorder traversal of the tree, and for an internal node v we set $y(v) = \frac{1}{|L_v|} \sum_{\ell \in L_v} y(\ell)$, the average y-coordinate of all leaves ℓ below v (with $L_v$ denoting this set of leaves). See (Huson 2025) for further elaboration.

The reticulate displacement of a left-to-right layout is defined as

$$\mathrm{RD}(N) = \sum_{(v,w) \in R} |y(v) - y(w)|,$$

where $R \subseteq E$ is the set of reticulate edges (i.e. combining or transfer edges). This measure depends solely on the y-coordinates assigned to nodes during the postorder traversal of the backbone tree B, which in turn depend on the ordering in which the children of each node v are visited. To reduce visual clutter and avoid reticulation edges that span large vertical distances in a drawing of N, one should use an ordering of the children in B that minimizes the reticulate displacement (Huson 2025).
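To illustrate how the child ordering drives this measure, here is a small sketch that assigns y-coordinates as described (the kth leaf in postorder gets k, an internal node gets the mean over the leaves below it) and then sums the vertical extent of the reticulate edges. The dictionary-based tree representation and the function names are assumptions made for the example.

```python
def layout_y(children, root):
    """y-coordinates for a backbone tree given as {node: ordered list of children}."""
    y, counter = {}, 0

    def visit(v):
        nonlocal counter
        kids = children.get(v, [])
        if not kids:                     # leaf: next free row
            counter += 1
            y[v] = counter
            return [counter]
        leaf_ys = [val for c in kids for val in visit(c)]
        y[v] = sum(leaf_ys) / len(leaf_ys)   # mean over all leaves below v
        return leaf_ys

    visit(root)
    return y


def reticulate_displacement(y, reticulate_edges):
    """Sum of vertical distances between the endpoints of reticulate edges."""
    return sum(abs(y[v] - y[w]) for v, w in reticulate_edges)


reticulate = [("a", "h"), ("b", "h")]
between = layout_y({"r": ["u", "c"], "u": ["a", "h", "b"]}, "r")
outside = layout_y({"r": ["u", "c"], "u": ["h", "a", "b"]}, "r")
print(reticulate_displacement(between, reticulate),
      reticulate_displacement(outside, reticulate))
```

Placing the hybrid node h between its two parents gives a smaller value than placing it to one side, which is the effect the ordering search tries to exploit.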

One-sided tanglegram layout

In the one-sided tanglegram problem, the layout of one rooted phylogeny (tree or network) is fixed, and we seek a suitable layout for the other. Let $N_1$ and $N_2$ be two rooted phylogenies (either can be a tree or network) on taxon sets $X_1$ and $X_2$, respectively, and let $B_1$ and $B_2$ be their corresponding backbone trees.

Assume that the layout (ordering) of $B_1$, and thus also the coordinates x and y of the nodes in $N_1$, are fixed. Our goal is to determine a good ordering for $B_2$, yielding coordinates x and y for $N_2$.

A good ordering will attempt to minimize the reticulate displacement of $N_2$ and also the taxon displacement of the tanglegram, which we define as

$$\mathrm{TD}(N_1, N_2) = \sum_{t \in X_1 \cap X_2} |y(v_t) - y(w_t)|,$$

where $v_t$ and $w_t$ are the leaves of $N_1$ and $N_2$, respectively, that are labeled by taxon t.

Putting both together, we define the one-sided tanglegram score of a left-to-right layout of $N_2$ relative to $N_1$ as

$$\mathrm{OSTS}(N_2 \mid N_1) = \alpha \, \mathrm{RD}(N_2) + \beta \, \mathrm{TD}(N_1, N_2),$$

where $\alpha, \beta \geq 0$ balance the avoidance of reticulation edges with large vertical extent (first term) against the vertical misalignment of corresponding taxa (second term). In our implementation, we restrict α and β to values in $\{0, 1\}$, optimizing one or both terms. A good one-sided layout for $N_2$ is then obtained by any ordering of $B_2$ that minimizes this quantity.

In Huson (2025), we show that optimizing reticulate displacement is computationally hard, and introduce a heuristic based on a preorder traversal of the backbone tree: for each node v that is the LSA of some reticulate node, or is the source of a transfer edge, consider permutations of its children in the backbone tree so as to minimize the total reticulate displacement. If the number of children is at most eight, all permutations are evaluated exhaustively; otherwise, a heuristic search is performed by iteratively swapping pairs of children, using simulated annealing (Kirkpatrick et al. 1983) to escape local minima. In contrast, if $\alpha = 0$, then the objective reduces to minimizing only the taxon displacement for the backbone tree (reticulation edges are ignored), which can be solved in polynomial time (Venkatachalam et al. 2010).
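The per-node search just described (exhaustive for at most eight children, pairwise swaps with simulated annealing otherwise) might look roughly like the sketch below. The threshold of eight comes from the text; the linear cooling schedule, the step count, the starting temperature, and the function name are assumptions chosen only to keep the example short.

```python
import math
import random
from itertools import permutations


def best_child_order(children, cost, exhaustive_limit=8, steps=2000, t0=1.0, seed=0):
    """Search for an ordering of `children` with low `cost(ordering)`."""
    children = list(children)
    if len(children) <= exhaustive_limit:
        # At most 8! = 40320 orderings: evaluate all of them.
        return list(min(permutations(children), key=cost))

    rng = random.Random(seed)
    current, current_cost = children[:], cost(children)
    best, best_cost = current[:], current_cost
    for step in range(steps):
        temperature = t0 * (1 - step / steps) + 1e-9   # assumed linear cooling
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        delta = cost(current) - current_cost
        # Always accept improvements; accept worsening moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost += delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best
```

Here `cost` stands for whichever local objective applies at the node (reticulate displacement, taxon displacement, or their weighted sum); the function only ever returns an ordering at least as good as the one it started from.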

To address the combined optimization of reticulate and taxon displacement, we propose to modify this heuristic search in two ways. First, at the above-mentioned nodes, we use OSTS rather than RD as the objective. Second, we also process all other interior nodes in a similar fashion, using the taxon displacement TD as the local objective function.

We refer to this as the one-sided DO-tanglegram heuristic.

This heuristic has an obvious weakness. For example, if the phylogeny being laid out is a rooted star tree (a tree consisting only of a root and a set of leaves), then the ordering of the leaves is fully flexible and there exists an optimal layout with zero taxon displacement. However, if the out-degree of ρ exceeds the heuristic threshold beyond which only a subset of orderings is considered, then the optimal layout may be missed.

To mitigate this, we define the rank of a taxon t as the y-coordinate of the associated leaf in the fixed phylogeny. Using a postorder traversal, for each node u of the backbone tree of the phylogeny being laid out, we determine the taxon of smallest rank that labels a leaf descendant of u. Now, when considering permutations of the children of some node v in the heuristic, we will always first consider the permutation obtained by ordering the children of v according to the smallest-rank taxa associated with their subtrees, leaving children without associated taxa fixed.
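A compact sketch of this first candidate permutation follows. It assumes that leaves are identified by their taxon names, that `rank` maps taxa of the fixed side to their y-coordinates, and that "leaving children without associated taxa fixed" means such children keep their current slot; all names are illustrative.

```python
def min_rank_below(children, root, rank):
    """Map each node to the smallest rank of a ranked taxon below it (None if none)."""
    best = {}

    def visit(v):
        kids = children.get(v, [])
        ranks = [rank[v]] if (not kids and v in rank) else []
        ranks += [r for r in map(visit, kids) if r is not None]
        best[v] = min(ranks) if ranks else None
        return best[v]

    visit(root)
    return best


def initial_order(kids, best):
    """Sort children by smallest rank; children without ranked taxa keep their slot."""
    ranked = iter(sorted((c for c in kids if best[c] is not None), key=lambda c: best[c]))
    return [c if best[c] is None else next(ranked) for c in kids]


tree = {"r": ["p", "s", "q"], "p": ["c", "d"], "q": ["a"]}
best = min_rank_below(tree, "r", {"a": 1, "c": 3, "d": 4})
print(initial_order(tree["r"], best))
```

For a star tree this candidate alone already reproduces the order of the fixed side, which addresses the weakness noted above.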

Accommodating missing taxa

Let $X = X_1 \cap X_2$ denote the set of taxa shared by the two networks (or trees), and let $[eqn]$ . Since $N_1$ and $N_2$ may each contain additional taxa not present in the other, we compute taxon displacement based only on X, using ranks rather than raw y-coordinates.

In the one-sided problem, the ordering of the backbone tree $B_1$ is fixed. This determines a ranking $r_1$ of the shared taxa, obtained by traversing $B_1$ in postorder and recording the order in which the shared leaves are visited. For any candidate ordering of the backbone tree $B_2$, we obtain a corresponding ranking $r_2$ in the same way.

We then define the taxon displacement between $N_1$ and $N_2$ as

$$\mathrm{TD}(N_1, N_2) = \sum_{t \in X} |r_1(t) - r_2(t)|.$$
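The effect of ranking only the shared taxa can be seen in a few lines. This sketch assumes each side is summarized by its leaf order (as produced by a postorder traversal of its backbone tree); the helper names are illustrative.

```python
def shared_ranks(leaf_order, shared):
    """Rank the shared taxa 1, 2, ... in the order they appear, skipping all others."""
    ranks = {}
    for taxon in leaf_order:
        if taxon in shared:
            ranks[taxon] = len(ranks) + 1
    return ranks


def taxon_displacement_shared(order1, order2):
    """Rank-based taxon displacement, computed on the shared taxa only."""
    shared = set(order1) & set(order2)
    r1 = shared_ranks(order1, shared)
    r2 = shared_ranks(order2, shared)
    return sum(abs(r1[t] - r2[t]) for t in shared)


# "x", "y", and "z" occur on one side only and do not contribute.
print(taxon_displacement_shared(["a", "x", "b", "c"], ["a", "b", "y", "z", "c"]))
```

With raw positions the unshared leaves would shift b and c against each other even though the shared taxa appear in the same relative order on both sides; with ranks the two orders are treated as perfectly aligned.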

Two-sided tanglegram layout

In the two-sided tanglegram layout problem, the layout of neither network is fixed, and both can be modified in order to obtain a good overall drawing. Let $N_1$ and $N_2$ be two rooted networks (or trees) on taxon sets $X_1$ and $X_2$, respectively, and let $B_1$ and $B_2$ be their corresponding backbone trees.

The goal is to determine good orderings for $B_1$ and for $B_2$, yielding coordinates x and y for all nodes in $N_1$ and $N_2$.

We evaluate layouts using the two-sided tanglegram score, defined as

$$\mathrm{TSTS}(N_1, N_2) = \alpha_1 \, \mathrm{RD}(N_1) + \beta \, \mathrm{TD}(N_1, N_2) + \alpha_2 \, \mathrm{RD}(N_2),$$

where the coefficients $\alpha_1, \beta, \alpha_2 \geq 0$ balance the competing goals of reducing reticulate displacement in each network (first and last terms) and reducing taxon displacement between the two sides (middle term). In our implementation, we restrict $\alpha_1, \beta, \alpha_2$ to values in $\{0, 1\}$, thereby optimizing one or more of these terms.

Minimizing $\mathrm{RD}$ on either network is computationally hard (Huson 2025). While minimizing $\mathrm{TD}$ alone can be done in polynomial time, minimizing the number of intertaxon edge crossings in the two-sided case is NP-complete, even when both $N_1$ and $N_2$ are binary trees (Fernau et al. 2010).

To address this problem heuristically, we apply the one-sided DO-tanglegram heuristic alternately to $[eqn]$ and $[eqn]$: in each step we optimize the layout of one network while keeping the other fixed, then switch roles. Iterating this process yields a pair of orderings $[eqn]$ that together define a joint layout (Bansal et al. 2009). We refer to this as our two-sided DO-tanglegram heuristic.
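The alternating scheme can be sketched on toy trees as follows. This is only an illustration under strong simplifications: both inputs are trees on the same taxa, the score is taxon displacement alone (no reticulate displacement terms), and the one-sided step is replaced by an exhaustive search over child orderings, which is feasible only for very small trees. All function names are hypothetical.

```python
from itertools import permutations, product

def leaves(tree):
    """Left-to-right leaf order of a nested-tuple tree (leaves are strings)."""
    return [tree] if isinstance(tree, str) else [x for c in tree for x in leaves(c)]

def orderings(tree):
    """Yield every child ordering of the tree (exponential; toy sizes only)."""
    if isinstance(tree, str):
        yield tree
        return
    for perm in permutations(tree):
        for combo in product(*(list(orderings(c)) for c in perm)):
            yield tuple(combo)

def displacement(t1, t2):
    """Sum of absolute differences in leaf position (ASSUMED score)."""
    pos2 = {x: i for i, x in enumerate(leaves(t2))}
    return sum(abs(i - pos2[x]) for i, x in enumerate(leaves(t1)))

def one_sided(movable, fixed):
    """Stand-in for the one-sided heuristic: best ordering of `movable`."""
    return min(orderings(movable), key=lambda t: displacement(t, fixed))

def two_sided(t1, t2, max_rounds=10):
    """Alternate one-sided steps until the score stops improving."""
    best = displacement(t1, t2)
    for _ in range(max_rounds):
        t1 = one_sided(t1, t2)  # optimize left, right fixed
        t2 = one_sided(t2, t1)  # switch roles
        score = displacement(t1, t2)
        if score >= best:       # no further improvement
            break
        best = score
    return t1, t2, best

left = (("a", "b"), ("c", "d"))
right = (("d", "c"), ("b", "a"))
o1, o2, score = two_sided(left, right)
```

Since the current ordering is always among the candidates, each one-sided step can only keep or lower the score, so the loop terminates once a round brings no improvement.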

Presorting using neighbor-net

In a presorting step, the NN-tanglegram method implemented in Dendroscope uses the neighbor-net algorithm (Bryant and Moulton 2004) to compute an initial circular ordering of the taxa present in both networks. It then applies a greedy heuristic that repeatedly swaps subnetworks in order to reduce the number of intertaxon crossings. We incorporate the same presorting strategy in our approach and refer to it as the NN-presorting heuristic.

Recall that every tree node v of a rooted phylogenetic network induces a nonempty cluster, namely the set of taxa labeling the leaf descendants of v, the so-called hardwired clusters (Huson et al. 2010). Let $[eqn]$ be the set of all hardwired clusters extracted from $[eqn]$ and $[eqn]$, restricted to the set X of taxa common to both networks, keeping only clusters of size at least two. For every pair of taxa $[eqn]$, we define their distance as

[eqn]

that is, the number of clusters in $[eqn]$ that contain exactly one of i and j.
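The verbal definition above translates directly into code. The sketch below counts, for every pair of taxa, the clusters that contain exactly one of the two; the function name and the dictionary representation of the distance matrix are illustrative choices, not taken from the paper.

```python
from itertools import combinations

def cluster_distance(clusters, taxa):
    """D[i, j] = number of clusters containing exactly one of i and j."""
    D = {}
    for i, j in combinations(sorted(taxa), 2):
        d = sum(1 for c in clusters if (i in c) != (j in c))
        D[i, j] = D[j, i] = d  # symmetric by construction
    return D

# Toy cluster set, already restricted to shared taxa and to size >= 2.
clusters = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}]
D = cluster_distance(clusters, {"a", "b", "c", "d"})
```

In this example `a` and `b` are never separated by a cluster, so their distance is 0, whereas every cluster separates `a` from `d`, giving distance 3.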

We run the agglomerative phase of the neighbor-net algorithm on the distance matrix D to obtain a circular ordering $[eqn]$ of X, and then use this ordering to improve the initial layout of the backbone trees $[eqn]$ and $[eqn]$. Let the rank of a taxon $[eqn]$ be its position in $[eqn]$. For each of $[eqn]$ and $[eqn]$, we perform a postorder traversal and, at every internal node v, sort its children by increasing average rank of the taxa in their subtrees.
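The child-sorting step can be sketched as below, taking the circular ordering as given (the neighbor-net computation itself is not reproduced here). The nested-tuple encoding and the name `presort` are hypothetical, and placing subtrees without any ranked taxon last is an assumption of this sketch.

```python
def presort(tree, rank):
    """Postorder pass over a nested-tuple tree (leaves are strings).
    Returns the reordered tree and the ranks of the taxa in its subtree."""
    if isinstance(tree, str):
        return tree, ([rank[tree]] if tree in rank else [])
    results = [presort(child, rank) for child in tree]

    def average_rank(item):
        _, ranks = item
        # ASSUMPTION: subtrees without ranked taxa are placed last.
        return sum(ranks) / len(ranks) if ranks else float("inf")

    results.sort(key=average_rank)  # stable sort: ties keep their order
    reordered = tuple(subtree for subtree, _ in results)
    return reordered, [r for _, ranks in results for r in ranks]

circular_order = ["a", "b", "c", "d"]  # assumed output of neighbor-net
rank = {taxon: i for i, taxon in enumerate(circular_order)}
sorted_tree, _ = presort((("d", "c"), ("b", "a")), rank)
```

Applying the same function with the same `rank` to both backbone trees gives the two sides a common reference order before the displacement heuristic starts.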

This presorting step aims to exploit the cluster structure of the two phylogenies in order to derive a taxon order that is as consistent as possible between the networks. If both inputs are trees and their cluster sets are fully compatible, this preprocessing guarantees a solution with zero intertaxon edge crossings (Scornavacca et al. 2011). In general, however, NN-presorting simply provides an informed initialization for the DO-tanglegram heuristic: in many cases it substantially reduces crossings and displacement, while in others it may lead to less favorable layouts due to the heuristic nature of the subsequent optimization.

Advanced settings

The heuristic search is executed in parallel using 32 jobs (the default), each initialized with a different random ordering of the children below every LSA node (for a network) or below every internal node (for a tree). This degree of parallelization adds little to the overall wall-clock time and gives rise to improved layouts.

In our heuristic search, the default parameters for simulated annealing are a start temperature of 1,000, an end temperature of 0.01, 1,000 iterations per temperature step, and a cooling rate of 0.95. Although these settings are not exposed directly in the SplitsTree user interface, they can be adjusted through the application preferences, together with the default number of parallel jobs for network computations, as described in the online SplitsTree user manual.
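To give a sense of what these defaults imply, the following sketch builds a geometric cooling schedule and a generic annealing loop around it. The schedule parameters are those stated above; everything else is an assumption made for illustration, in particular the Metropolis acceptance rule, the stopping test `t > end`, and the names `temperatures`, `anneal`, and `swap_neighbor`.

```python
import math
import random

def temperatures(start=1000.0, end=0.01, cooling=0.95):
    """Geometric cooling: yield start, start*cooling, ... while above end."""
    t = start
    while t > end:
        yield t
        t *= cooling

def anneal(state, cost, neighbor, rng, start=1000.0, end=0.01,
           iters=1000, cooling=0.95):
    """Generic simulated annealing (ASSUMED Metropolis acceptance rule)."""
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    for t in temperatures(start, end, cooling):
        for _ in range(iters):
            cand = neighbor(current, rng)
            delta = cost(cand) - current_cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
    return best, best_cost

def swap_neighbor(order, rng):
    """Toy move: swap two positions of a tuple-encoded ordering."""
    i, j = rng.sample(range(len(order)), 2)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return tuple(new)

n_steps = len(list(temperatures()))  # number of temperature steps
```

Under these assumptions the default schedule has 225 temperature steps, so a single job evaluates 225,000 candidate moves.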

Bibliography

1. Bansal MS, Chang W-C, Eulenstein O, Fernández-Baca D. Generalized binary tanglegrams: algorithms and applications. In: Sahni S, Rajasekaran S, editors. Bioinformatics and computational biology (BICoB 2009), Lecture Notes in Computer Science. Vol. 5462. Springer; 2009. p. 114–125. doi:10.1007/978-3-642-00727-9_13.
2. Blasco-Aróstegui J, Simone Y, Paulo OS, Prendini L. Mito-nuclear discordance reveals introgressive hybridization following vicariance and secondary contact in Iberian scorpions (Buthidae: Buthus). BMC Ecol Evol. 2025a:25:112. doi:10.1186/s12862-025-02445-0. PMID: 41126048. PMCID: PMC12548245.
3. Blasco-Aróstegui J, Simone Y, Prendini L. Systematic revision of the European species of Buthus Leach, 1815 (Scorpiones: Buthidae). Bull Am Mus Nat Hist. 2025b:476:1–132. doi:10.1206/0003-0090.476.1.1.
4. Bryant D, Moulton V. Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol. 2004:21:255–265. doi:10.1093/molbev/msh018. PMID: 14660700.
5. Charleston MA. Jungles: a new solution to the host/parasite phylogeny reconciliation problem. Math Biosci. 1998:149:191–223. doi:10.1016/s0025-5564(97)10012-8. PMID: 9621683.
6. Fernau H, Kaufmann M, Poths M. Comparing trees via crossing minimization. J Comput Syst Sci. 2010:76:593–608. doi:10.1016/j.jcss.2009.10.014.
7. Galili T. dendextend: an R package for visualizing, adjusting, and comparing trees of hierarchical clustering. Bioinformatics. 2015:31:3718–3720. doi:10.1093/bioinformatics/btv428. PMID: 26209431. PMCID: PMC4817050.
8. Huson DH. Sketch, capture and layout phylogenies. PLoS Comput Biol. 2025:21:e1013805. doi:10.1371/journal.pcbi.1013805. PMID: 41396979. PMCID: PMC12714275.