# TreeProfiler: large-scale metadata profiling along gene and species trees

**Authors:** Ziqi Deng, Claudia Sanchis-López, Ana Hernández-Plaza, Adrián A Davín, Jaime Huerta-Cepas

PMC · DOI: 10.1093/molbev/msag028 · Molecular Biology and Evolution · 2026-02-02

## TL;DR

TreeProfiler is an open-source tool for automatically annotating and interactively exploring hundreds of biological traits along large gene and species trees, with mapped traits summarized at internal nodes.

## Contribution

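Existing profiling tools are typically tailored to specific use cases, scale poorly to large datasets, and lack robust ways to aggregate traits at internal tree nodes. TreeProfiler addresses these gaps with a scalable, automated system for profiling traits along gene and species trees and summarizing them at internal nodes.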
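
To make the idea of summarizing traits at internal nodes concrete, here is a minimal sketch in plain Python. It is not TreeProfiler's implementation: the `Node` class, the `summarize` function, and the `_avg`, `_min`, `_max`, and `_counter` property suffixes are all hypothetical names chosen for illustration. It assumes a continuous trait is summarized by the mean, minimum, and maximum over descendant leaves, and a discrete trait by value counts.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Node:
    """Hypothetical minimal tree node: leaves carry observed traits in `props`."""
    name: str
    children: list = field(default_factory=list)
    props: dict = field(default_factory=dict)


def summarize(node, numeric, categorical):
    """Post-order pass that returns the leaf values found under `node`
    and stores summary properties on every internal node."""
    if not node.children:
        # Leaf: report whatever traits were annotated on it.
        values = [node.props[numeric]] if numeric in node.props else []
        counts = Counter([node.props[categorical]]) if categorical in node.props else Counter()
        return values, counts

    values, counts = [], Counter()
    for child in node.children:
        child_values, child_counts = summarize(child, numeric, categorical)
        values.extend(child_values)
        counts.update(child_counts)

    # Internal node: summarize the continuous trait over all descendant leaves.
    if values:
        node.props[f"{numeric}_avg"] = mean(values)
        node.props[f"{numeric}_min"] = min(values)
        node.props[f"{numeric}_max"] = max(values)
    # Internal node: summarize the discrete trait as value counts.
    node.props[f"{categorical}_counter"] = dict(counts)
    return values, counts


# Toy tree ((sp1, sp2)cladeA, sp3)root with made-up trait values.
tree = Node("root", [
    Node("cladeA", [
        Node("sp1", props={"abundance": 0.25, "biome": "soil"}),
        Node("sp2", props={"abundance": 0.5, "biome": "marine"}),
    ]),
    Node("sp3", props={"abundance": 0.75, "biome": "soil"}),
])
summarize(tree, "abundance", "biome")
print(tree.props)
```

A single post-order traversal keeps the cost linear in the number of nodes, at the price of carrying leaf values up the tree. Other summary statistics could be substituted without changing the traversal.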

## Key findings

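- TreeProfiler handles massive datasets efficiently, demonstrated by analyzing the functional diversification of the methyl-accepting chemotaxis protein family, which comprises over 400,000 genomic and metagenomic sequences.
- The tool profiles the relative abundance of 124,295 bacterial and archaeal species across 51 biomes.
- TreeProfiler supports custom continuous and discrete traits, ancestral character reconstruction, and phylogenetic signal tests, and integrates common genomic features such as multiple sequence alignments, protein domain architectures, and functional annotations.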

## Abstract

Profiling biological traits along gene or species tree topologies is a well-established approach in comparative genomics, widely employed to infer gene function from co-evolutionary patterns (phylogenetic profiling), reconstruct ancestral states, and uncover ecological associations. However, existing profiling tools are typically tailored to specific use cases, have limited scalability for large datasets, and lack robust methods to aggregate or summarize traits at internal tree nodes. Here, we present TreeProfiler, a tool for automated annotation and interactive exploration of hundreds of features along large gene and species trees, with seamless summarization of mapped traits at internal nodes. TreeProfiler supports the profiling of custom continuous and discrete traits, as well as ancestral character reconstruction and phylogenetic signal tests. It also integrates commonly used genomic features, including multiple sequence alignments, protein domain architectures, and functional annotations. We demonstrate TreeProfiler's utility beyond traditional phylogenetic profiling, as well as its ability to efficiently handle massive datasets, by analyzing the functional diversification of the methyl-accepting chemotaxis protein family comprising over 400,000 genomic and metagenomic sequences and by profiling the relative abundance of 124,295 bacterial and archaeal species across 51 biomes. TreeProfiler is open-source and freely available at https://github.com/compgenomicslab/TreeProfiler.
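
As an illustration of the classic phylogenetic profiling idea mentioned in the abstract, the sketch below compares the presence/absence profiles of genes across species and ranks candidate partners by profile similarity. It is a generic example, not a TreeProfiler feature or API: the `jaccard` and `rank_partners` helpers, the gene and species names, and the choice of the Jaccard index as the similarity measure are all assumptions made here.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two sets of species in which a gene is present."""
    union = profile_a | profile_b
    if not union:
        return 0.0  # two empty profiles share no evidence
    return len(profile_a & profile_b) / len(union)


def rank_partners(query, profiles):
    """Rank all other genes by profile similarity to `query`, most similar first."""
    scores = [
        (gene, jaccard(profiles[query], species))
        for gene, species in profiles.items()
        if gene != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up presence profiles: gene -> set of species that encode it.
profiles = {
    "geneA": {"sp1", "sp2", "sp4"},
    "geneB": {"sp1", "sp2", "sp4"},
    "geneC": {"sp2", "sp3"},
    "geneD": {"sp3"},
}

ranking = rank_partners("geneA", profiles)
print(ranking)
```

Genes with near-identical profiles, such as `geneA` and `geneB` in this toy data, are the usual candidates for a shared function. Treating species as independent columns ignores the tree itself, which is one reason tree-aware profiling is useful.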

## Full-text entities

- **Genes:** CD46 (CD46 molecule) [NCBI Gene 4179] {aka AHUS2, MCP, MIC10, TLX, TRA2.10}

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12926219/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12926219/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12926219/full.md

---
Source: https://tomesphere.com/paper/PMC12926219