# Species Tree Branch Length Estimation despite Incomplete Lineage Sorting, Duplication, and Loss

**Authors:** Yasamin Tabatabaee, Chao Zhang, Shayesteh Arasti, Siavash Mirarab

PMC · DOI: 10.1093/gbe/evaf200 · 2025-11-26

## TL;DR

This paper introduces CASTLES-Pro, an algorithm that estimates species tree branch lengths while accounting for gene duplication and loss and for incomplete lineage sorting.

## Contribution

According to the authors, no branch length estimation method previously existed for multi-copy gene family trees that have evolved with gene duplication and loss. CASTLES-Pro fills this gap while also accounting for incomplete lineage sorting.

## Key findings

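- CASTLES-Pro is more accurate than its predecessor, CASTLES, on single-copy gene trees and extends it to multi-copy gene family trees.
- In simulations, CASTLES-Pro is generally more accurate than alternatives and eliminates the systematic overestimation of terminal branch lengths often observed with concatenation.
- Although not designed for horizontal gene transfer, CASTLES-Pro is relatively robust to random transfer; its accuracy can degrade at the highest transfer levels.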

## Abstract

Phylogenetic branch lengths are essential for many analyses, such as estimating divergence times, analyzing rate changes, and studying adaptation. However, true gene tree heterogeneity due to incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer can complicate the estimation of species tree branch lengths. While several tools exist for estimating the topology of a species tree addressing various causes of gene tree discordance, much less attention has been paid to branch length estimation on multi-locus datasets. For single-copy gene trees, some methods are available that summarize gene tree branch lengths onto a species tree, including coalescent-based methods that account for heterogeneity due to incomplete lineage sorting. However, no such branch length estimation method exists for multi-copy gene family trees that have evolved with gene duplication and loss. To address this gap, we introduce the CASTLES-Pro algorithm for estimating species tree branch lengths while accounting for both gene duplication and loss and incomplete lineage sorting. CASTLES-Pro improves on the existing coalescent-based branch length estimation method CASTLES by increasing its accuracy for single-copy gene trees and extending it to handle multi-copy ones. Our simulation studies show that CASTLES-Pro is generally more accurate than alternatives, eliminating the systematic bias toward overestimating terminal branch lengths often observed when using concatenation. Moreover, while not theoretically designed for horizontal gene transfer, we show that CASTLES-Pro is relatively robust to random horizontal gene transfer, though its accuracy can degrade at the highest levels of horizontal gene transfer.
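One commonly cited reason terminal branches are overestimated when gene tree branch lengths are used directly is that, under incomplete lineage sorting, gene lineages coalesce earlier than the species split. The sketch below illustrates this for the simplest case only: two sister species, one sampled copy each, and a constant-size ancestral population. Under the standard coalescent, the waiting time to coalescence in that population is exponential with mean 1 coalescent unit. This is not the CASTLES-Pro algorithm, and the function names are hypothetical and used only for illustration.

```python
import random
import statistics


def simulate_gene_terminal_lengths(species_branch, n_genes, rng):
    """Simulate gene-tree terminal branch lengths (coalescent units).

    Assumption: two sister species with one sampled copy each. The two
    lineages cannot coalesce before the species split, so each gene-tree
    terminal branch equals the species-tree terminal branch plus an
    Exponential(rate=1) waiting time in the ancestral population.
    """
    return [species_branch + rng.expovariate(1.0) for _ in range(n_genes)]


def mean_overestimate(species_branch, n_genes=20000, seed=1):
    """Average excess of gene-tree terminal branches over the species branch."""
    rng = random.Random(seed)
    lengths = simulate_gene_terminal_lengths(species_branch, n_genes, rng)
    return statistics.fmean(lengths) - species_branch


if __name__ == "__main__":
    for t in (0.1, 0.5, 2.0):
        print(f"species branch {t}: mean excess {mean_overestimate(t):.3f}")
```

In this toy setting the average excess stays near 1 coalescent unit regardless of the true branch length, so the relative overestimation is largest for short terminal branches. A coalescent-aware estimator would need to correct for this excess rather than average it in.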

## Full-text entities

- **Abbreviations:** GDL (gene duplication and loss), ILS (incomplete lineage sorting)
- **Methods:** CASTLES, CASTLES-Pro
- **Species:** Apis mellifera (bee, species) [taxon 7460], Lasioglossum albipes (species) [taxon 88501]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12648238/full.md

---
Source: https://tomesphere.com/paper/PMC12648238