# A model-free method for genealogical inference without phasing and its application for topology weighting

**Authors:** Simon H Martin

PMC · DOI: 10.1093/genetics/iyaf181 · Genetics · 2025-09-08

## TL;DR

sticcs is a model-free method that infers local genealogies along the genome directly from unphased genotype data. This allows genealogical and relatedness analyses without first phasing the data.

## Contribution

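sticcs enables model-free genealogical inference from unphased data. Combined with a stacking procedure (implemented in twisst2), it yields topology weights that tend to be more accurate than those computed from ARGs that several popular tools inferred from perfectly phased data.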

## Key findings

- sticcs accurately infers ancestral recombination graphs (ARGs) for small sample sizes using unphased genotype data.
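- Topology weights derived from sticcs tend to be more accurate than those computed from full ARGs that several popular tools inferred from perfectly phased data.
- The stacking procedure extends topology weighting to larger datasets. It combines tree sequences inferred for multiple subsets of the samples, which removes the sample-size restriction.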
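
Topology weighting, in the sense used by twisst, scores a tree by the fraction of all combinations of one sample per group whose induced subtree matches each possible topology. The Python sketch below illustrates this for four groups on a single tree written as nested tuples. It does not use tskit or the twisst2 API, and the functions `topology_counts`, `topology_weights` and `stacked_weights` are hypothetical. This summary does not say exactly how twisst2 combines subsets, so `stacked_weights` simply *assumes* that combination counts from each subset's tree are pooled before normalizing.

```python
from collections import Counter
from itertools import product


def clades(tree):
    """Leaf sets below every node of a tree written as nested tuples of leaf names."""
    out = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out.append(leaves)  # post-order: children are recorded before parents
        return leaves

    walk(tree)
    return out


def quartet_split(tree_clades, quartet):
    """Unrooted split of four leaves, e.g. {{a,b},{c,d}}, or None if unresolved."""
    q = frozenset(quartet)
    for clade in tree_clades:
        pair = clade & q
        if len(pair) == 2:  # a clade holding exactly two of the four leaves defines the split
            return frozenset([pair, q - pair])
    return None


def topology_counts(tree, groups):
    """Count how often each group-level topology occurs over all one-leaf-per-group combinations."""
    names = list(groups)
    tree_clades = clades(tree)
    counts, total = Counter(), 0
    for combo in product(*(groups[g] for g in names)):
        total += 1
        split = quartet_split(tree_clades, combo)
        if split is None:
            continue  # unresolved combinations (polytomies) add to the total only
        group_of = dict(zip(combo, names))
        counts[frozenset(frozenset(group_of[leaf] for leaf in side) for side in split)] += 1
    return counts, total


def topology_weights(tree, groups):
    counts, total = topology_counts(tree, groups)
    return {topo: n / total for topo, n in counts.items()}


def stacked_weights(subsets):
    """Hypothetical stacking: pool counts from (tree, groups) pairs for several sample subsets."""
    pooled, grand_total = Counter(), 0
    for tree, groups in subsets:
        counts, total = topology_counts(tree, groups)
        pooled.update(counts)
        grand_total += total
    return {topo: n / grand_total for topo, n in pooled.items()}


groups = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"], "D": ["d1"]}
tree = ((("a1", "b1"), ("a2", "c1")), "d1")
AB_CD = frozenset([frozenset("AB"), frozenset("CD")])
AC_BD = frozenset([frozenset("AC"), frozenset("BD")])
print(topology_weights(tree, groups)[AB_CD])  # 0.5

subset2 = ((("a3", "b2"), ("c2", "d2")), {"A": ["a3"], "B": ["b2"], "C": ["c2"], "D": ["d2"]})
stacked = stacked_weights([(tree, groups), subset2])
print(round(stacked[AB_CD], 3))  # 0.667
```

Nested tuples keep the example dependency-free. A real analysis would instead iterate over the local trees of a tskit tree sequence and weight each tree by the genomic interval it spans.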

## Abstract

Recent advances in methods to infer and analyze ancestral recombination graphs (ARGs) are providing powerful new insights in evolutionary biology and beyond. Existing inference approaches tend to be designed for use with fully phased datasets, and some rely on model assumptions about demography and recombination rate. Here I describe a simple model-free approach for genealogical inference along the genome from unphased genotype data called Sequential Tree Inference by Collecting Compatible Sites (sticcs). sticcs applies a heuristic algorithm based on the perfect phylogeny principle to reconstruct a local genealogy at each variant site in the genome, using a “collecting” procedure to import information from other nearby sites. Using simulations, I show that sticcs is accurate for ARG inference, but only when the sample size is small. However, I also describe how it can be applied for the purpose of topology weighting by “stacking” tree sequences inferred for multiple subsets of the data, removing the sample size restriction. Topology weights estimated in this way from unphased data tend to be more accurate than those computed with full ARGs inferred from perfectly phased data using several popular tools. The methods presented therefore have promise for analysis of relatedness and introgression in nonmodel species, including polyploids. The new methods are implemented in 2 Python packages, sticcs (for ARG inference) and twisst2 (for topology weighting using the stacking procedure), both of which interface with the tskit library for analysis of tree sequence objects.
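
The perfect phylogeny principle can be illustrated with the classic compatibility test. Describe each biallelic site by the set of samples carrying the derived allele. Two sites fit on one tree without recurrent mutation only if those sets are nested or disjoint, and any collection of mutually compatible sets nests into a single tree. The sketch below *assumes* phased haploid samples with known ancestral alleles, unlike sticcs, which works from unphased genotypes. The greedy outward scan in `collect_compatible` is a hypothetical simplification of the "collecting" idea, not the published algorithm, and all function names are illustrative.

```python
def compatible(s1, s2):
    """Perfect-phylogeny test for two biallelic sites, each given as the set of
    samples carrying the derived allele: compatible iff nested or disjoint."""
    return s1 <= s2 or s2 <= s1 or not (s1 & s2)


def collect_compatible(sites, focal):
    """Hypothetical greedy 'collecting' step (not the sticcs algorithm): walk
    outward from the focal site on each side, keeping sites compatible with all
    sites kept so far, and stop a side at its first incompatible site."""
    kept = [sites[focal]]
    for step in (-1, 1):
        i = focal + step
        while 0 <= i < len(sites) and all(compatible(sites[i], k) for k in kept):
            kept.append(sites[i])
            i += step
    return kept


def build_tree(members, clusters):
    """Nest a set of mutually compatible clusters into a tree of nested tuples;
    nodes without an informative cluster are left as polytomies."""
    inside = [c for c in clusters if c < members]
    maximal = sorted((c for c in inside if not any(c < d for d in inside)), key=sorted)
    covered = set().union(*maximal)
    children = [build_tree(c, clusters) for c in maximal]
    return tuple(children + sorted(members - covered))


samples = frozenset("abcde")
sites = [frozenset("ab"), frozenset("abc"), frozenset("de"), frozenset("bc")]
kept = collect_compatible(sites, focal=1)  # the last site conflicts with {a, b}, so it is not collected
informative = {c for c in kept if 1 < len(c) < len(samples)}
print(build_tree(samples, informative))  # ((('a', 'b'), 'c'), ('d', 'e'))
```

Stopping at the first incompatible site treats it as evidence of a nearby recombination breakpoint. This is one simple way to limit how far information is imported from neighbouring sites.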

## Full-text entities

- **Species:** Heliconius cydno chioneus (subspecies) [taxon 171915], Heliconius timareta thelxinoe (subspecies) [taxon 1410579], Danaus chrysippus (African queen, species) [taxon 151541], Heliconius numata (species) [taxon 33419], Heliconius melpomene amaryllis (subspecies) [taxon 248312], Heliconius timareta (species) [taxon 101932], Heliconius melpomene (common postman, species) [taxon 34740], Heliconius melpomene melpomene (subspecies) [taxon 171917], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12774849/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12774849/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12774849/full.md

---
Source: https://tomesphere.com/paper/PMC12774849