# Algorithms to reconstruct past indels: The deletion-only parsimony problem

**Authors:** Jordan Moutet, Eric Rivals, Fabio Pardi

PMC · DOI: 10.1371/journal.pcbi.1012585 · PLOS Computational Biology · 2025-07-28

## TL;DR

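This paper presents exact algorithms for parsimony-based ancestral sequence reconstruction when deletions are the only indel events allowed. A single optimal reconstruction can be found in polynomial time, and all optimal reconstructions for a fixed node can be represented by a graph that is computable in polynomial time.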

## Contribution

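The paper describes an exact algorithm for deletion-only ancestral sequence reconstruction that runs in polynomial time when a single optimal solution is sought. It also gives what the authors describe as the first solid mathematical justification for graph-based representations of ancestral reconstructions.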

## Key findings

- An exact algorithm obtains all optimal deletion-only reconstructions; it runs in polynomial time when only one solution is sought.
- A graph-based representation of all optimal reconstructions for a fixed node is computable in polynomial time.
- The authors argue that the deletion-only case is relevant to the general indel reconstruction problem.

## Abstract

Ancestral sequence reconstruction is an important task in bioinformatics, with applications ranging from protein engineering to the study of genome evolution. When sequences can only undergo substitutions, optimal reconstructions can be efficiently computed using well-known algorithms. However, accounting for indels in ancestral reconstructions is much harder. First, for biologically relevant problem formulations, no polynomial-time exact algorithms are available. Second, multiple reconstructions are often equally parsimonious or likely, making it crucial to correctly display uncertainty in the results. Here, we consider a parsimony approach where only deletions are allowed, while addressing the aforementioned limitations. First, we describe an exact algorithm to obtain all the optimal solutions. The algorithm runs in polynomial time if only one solution is sought. Second, we show that all possible optimal reconstructions for a fixed node can be represented using a graph computable in polynomial time. While previous studies have proposed graph-based representations of ancestral reconstructions, this result is the first to offer a solid mathematical justification for this approach. Finally, we provide arguments for the relevance of the deletion-only case for the general case.
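
For the substitution-only case, the abstract refers to "well-known algorithms" without naming them. One classical example is Fitch's small-parsimony algorithm; choosing it here is an assumption of this summary, not something the abstract states. The sketch below runs the bottom-up pass of Fitch's algorithm for a single site on a rooted binary tree. The function name `fitch_bottom_up`, the dictionary-based tree encoding, and the toy data are all hypothetical.

```python
def fitch_bottom_up(tree, leaf_states, root):
    """Bottom-up pass of Fitch's algorithm for one alignment site.

    tree        : dict mapping each internal node to its two children
    leaf_states : dict mapping each leaf to its observed character
    root        : name of the root node
    Returns (state_sets, cost), where state_sets[node] is the Fitch set of
    candidate states and cost is the minimum number of substitutions.
    """
    state_sets = {}
    cost = 0

    def visit(node):
        nonlocal cost
        children = tree.get(node)
        if not children:
            # Leaf: the only candidate is the observed state.
            state_sets[node] = {leaf_states[node]}
            return
        left, right = children
        visit(left)
        visit(right)
        common = state_sets[left] & state_sets[right]
        if common:
            # The children agree on at least one state: no substitution needed.
            state_sets[node] = common
        else:
            # The children disagree: take the union and pay one substitution.
            state_sets[node] = state_sets[left] | state_sets[right]
            cost += 1

    visit(root)
    return state_sets, cost


# Toy example: ((A, B), C) with observed states T, T, G at one site.
toy_tree = {"root": ["u", "C"], "u": ["A", "B"]}
toy_leaves = {"A": "T", "B": "T", "C": "G"}
toy_sets, toy_cost = fitch_bottom_up(toy_tree, toy_leaves, "root")
```

In this toy example the root's Fitch set contains both `T` and `G`, a small instance of the situation the abstract raises: several reconstructions can be equally parsimonious, so the output has to convey that uncertainty.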

An exciting frontier in evolutionary biology is the ability to reconstruct DNA or protein sequences from species that lived in the distant past. By analyzing sequences from present-day species, we aim to infer the sequences of their common ancestors—a process known as ancestral sequence reconstruction. This task has far-reaching applications, such as resurrecting ancient proteins and studying the biology of extinct organisms. However, a significant challenge remains: the lack of well-established methods for inferring past deletions and insertions—mutations that remove or add segments of genetic code. In this paper, we present algorithms that lay the groundwork for addressing this gap. We show that finding the reconstructions involving only deletion events, while minimizing their number, can be done efficiently. Additionally, we show that all optimal solutions can be represented using specialized graphs. While previous studies have proposed graph-based representations of ancestral reconstructions, we are the first to provide a rigorous mathematical foundation for the use of these graphs.
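
To make "minimizing their number" concrete, here is a minimal sketch of how deletion events might be counted along a single branch. This summary does not give the paper's exact cost model, so the sketch assumes that each sequence is encoded as a string of `1` (site present) and `0` (site absent) over the alignment columns, and that one deletion event removes a block of sites that are consecutive among the sites present in the parent. The function name `count_deletions` and the encoding are hypothetical, and this is a scoring helper only, not the reconstruction algorithm described in the paper.

```python
def count_deletions(parent, child):
    """Count deletion events needed to turn `parent` into `child`.

    Both arguments are equal-length strings over {'0', '1'}: '1' means the
    alignment column is present in that sequence, '0' means it is absent.
    Assumption: one event deletes a run of sites that are consecutive among
    the sites present in the parent.
    """
    if len(parent) != len(child):
        raise ValueError("presence strings must have the same length")
    events = 0
    in_run = False  # True while inside a run of sites lost on this branch
    for p, c in zip(parent, child):
        if p == "0":
            if c == "1":
                # The child has a site the parent lacks: this would require
                # an insertion, which the deletion-only setting forbids.
                raise ValueError("child is not derivable by deletions only")
            # A column absent in the parent does not separate its neighbours.
            continue
        if c == "0":
            if not in_run:
                events += 1  # a new deletion event starts here
                in_run = True
        else:
            in_run = False  # a retained site ends the current run
    return events


two_events = count_deletions("11111", "10101")  # sites 2 and 4 lost separately
one_event = count_deletions("11011", "10001")   # sites 2 and 4 are adjacent in the parent
```

Under these assumptions, the cost of a full reconstruction would be the sum of `count_deletions` over all parent-child branches of the tree. The `ValueError` on an inserted site reflects the deletion-only constraint: every site present in a descendant must also be present in its ancestor.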

## Full-text entities

- **Genes:** DSPP (dentin sialophosphoprotein) [NCBI Gene 1834] {aka DFNA39, DGI1, DMP3, DPP, DSP}
- **Chemicals:** Au (MESH:D006046), amino acid (MESH:D000596), TopDown (-), nucleotide (MESH:D009711)
- **Mutations:** phenylalanine/tyrosine, deletion at site 6, A for X, A for T

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12331173/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12331173/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12331173/full.md

---
Source: https://tomesphere.com/paper/PMC12331173