# Assessment of phylogenetic informativeness in mitochondrial and nuclear genes for mammalian systematics using sparse learning

**Authors:** Carlos G. Schrago, Beatriz Mello

PMC · DOI: 10.3389/fbinf.2025.1704212 · 2026-01-08

## TL;DR

This study uses sparse learning based on Lasso regression to compare how much phylogenetic information mitochondrial and nuclear genes carry for resolving mammalian evolutionary relationships.

## Contribution

The paper presents the first large-scale, quantitative comparison of phylogenetic information content across mammalian mitochondrial and nuclear genes, estimated with Lasso-based sparse learning.

## Key findings

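- Among mitochondrial genes, the longer ones (particularly ND5, COX1, and CYTB) harbor the largest numbers of phylogenetically informative sites.
- On average, ∼38% of nuclear sites and ∼32% of mitochondrial sites are classified as phylogenetically informative; nuclear coding sequences contain more informative sites on average, but mitochondrial genes still yield consistent resolution of among-species relationships.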
- Phylogenetic informativeness is positively correlated with gene length in both mitochondrial and nuclear markers.

## Abstract

Despite the growing availability of nuclear genomic data, mitochondrial genes remain the most widely used molecular markers in mammalian systematics. However, a quantitative assessment of the phylogenetic information content of mitochondrial loci compared to nuclear loci has never been carried out. Here, we apply a sparse learning approach based on Lasso regression to evaluate the contribution of alignment sites to phylogenetic likelihoods, providing the first estimates of phylogenetically effective lengths for markers commonly used in mammalian systematics. Analyzing more than 30,000 complete mammalian mitochondrial genomes and nuclear panels composed of either 100 randomly selected complete coding sequences or of partial gene segments from conventional markers, we examined phylogenetic informativeness at two taxonomic levels: within-species and among-species. On average, ∼32% of mitochondrial sites and ∼38% of nuclear sites were classified as phylogenetically informative. We found that the number of phylogenetically informative sites was positively correlated with total gene length. Therefore, longer mitochondrial genes, particularly ND5, COX1, and CYTB, harbored the largest numbers of informative sites. Although nuclear coding sequences contained, on average, more informative sites, mitochondrial genes also yielded consistent resolution of among-species relationships. Overall, our results provide the first large-scale, quantitative comparison of phylogenetic information content across mammalian mitochondrial and nuclear genes, offering a principled framework for marker selection in future systematics studies that can be broadly applied to any lineage.
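
The abstract describes using Lasso regression to decide which alignment sites contribute to phylogenetic likelihoods. The sketch below is not the authors' pipeline. It only illustrates, on synthetic data, how an L1 penalty drives most coefficients to exactly zero so that the surviving columns can be counted. Everything named here is an assumption made for illustration: the function `lasso_coordinate_descent`, the penalty `alpha`, the toy matrix `X` (one column per hypothetical site), and the response `y`, which is built so that only the first five columns matter.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic toy data (hypothetical, not from the paper):
# 60 observations, 20 "sites", of which only sites 0-4 influence the response.
rng = random.Random(0)
n_obs, n_sites, n_true = 60, 20, 5
X = [[rng.gauss(0.0, 1.0) for _ in range(n_sites)] for _ in range(n_obs)]
y = [sum(row[:n_true]) + rng.gauss(0.0, 0.05) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)

# Sites with a non-zero coefficient are the ones the penalty did not discard.
informative_sites = [j for j, wj in enumerate(weights) if wj != 0.0]
fraction_informative = len(informative_sites) / n_sites

print("selected sites:", informative_sites)
print("fraction selected:", fraction_informative)
```

In this toy setting the count of non-zero coefficients stands in for the number of informative sites. The value of `alpha` controls how aggressively columns are discarded, so any real analysis would need to choose it carefully, for example by cross-validation.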

## Linked entities

- **Genes:** ND5 (NADH dehydrogenase subunit 5) [NCBI Gene 4540], COX1 (cytochrome c oxidase subunit I) [NCBI Gene 4512], CYTB (cytochrome b) [NCBI Gene 4519]

## Full-text entities

- **Genes:** CYTB (cytochrome b) [NCBI Gene 4519] {aka MTCYB}, COX1 (cytochrome c oxidase subunit I) [NCBI Gene 4512] {aka COI, MTCO1}, ND5 (NADH dehydrogenase subunit 5) [NCBI Gene 4540] {aka MTND5}
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12824000/full.md

---
Source: https://tomesphere.com/paper/PMC12824000