# Construction of Phylogenetic Relationships Based on 8-mer Spectra Distribution Characteristics of Vertebrate Whole Genome Sequences

**Authors:** Zhenhua Yang, Li Wang, Guojun Liu, Dongsheng Yu, Xiangjun Cui

PMC · DOI: 10.3390/genes17010039 · 2025-12-31

## TL;DR

This paper proposes a k-mer-based method for reconstructing vertebrate phylogeny from the distribution characteristics of 8-mer spectra in whole genome sequences.

## Contribution

The paper proposes a dual-feature strategy that combines class-level and order-level phylogenetic features, both derived from 8-mer spectra.

## Key findings

- Class-level features capture macroevolutionary patterns and establish the phylogenetic backbone.
- Order-level features enable finer-resolution discrimination at the ordinal level.
- Validation across vertebrate genomes confirmed the effectiveness of the dual-feature strategy.

## Abstract

Background/Objectives: With advances in sequencing technology, whole genome sequences have become a valuable resource for deciphering species evolution. However, efficiently extracting phylogenetic information from such data remains a major challenge. Traditional multiple sequence alignment methods are computationally intensive and perform poorly for distantly related species, while k-mer analysis offers a new direction for efficiently capturing genomic composition and evolutionary signatures.

Methods: Features were extracted from the 8-mer spectra of 16 XYi subsets.

Results: This study found that the distribution characteristics of 8-mer spectra in whole genome sequences are closely related to species evolution. Building on this, we developed a dual-feature strategy for genome-scale phylogenetics. The strategy incorporates two distinct feature types: (a) 186 class-level phylogenetic features (93 for separability and 93 for conservatism), identified from the 8-mer spectrum distributions of the 16 XYi subsets, which capture macroevolutionary patterns; and (b) order-level phylogenetic features, designated as rank information, which are generated by ranking all 65,536 8-mers by frequency based on the CGi subset’s long-tail distribution and thereby capture microevolutionary patterns. Validation across vertebrate genomes confirmed that the class-level features establish the phylogenetic backbone, whereas the order-level features enable finer-resolution discrimination at the ordinal level.

Conclusions: This study proposes a new method for constructing phylogenetic relationships at the genomic level.
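To make the idea of rank information more concrete, the following is a minimal Python sketch of counting 8-mers in a sequence and ranking all 4^8 = 65,536 of them by frequency. It is an illustration under stated assumptions, not the authors' pipeline: the function names `count_kmers` and `rank_kmers` are hypothetical, only the given strand is counted, windows containing non-ACGT characters are skipped, and ties are broken alphabetically. The abstract does not define how the XYi or CGi subsets are constructed, so the sketch does not attempt to reproduce them.

```python
from collections import Counter
from itertools import product

K = 8
ALPHABET = "ACGT"


def count_kmers(seq, k=K):
    """Count overlapping k-mers on the given strand.

    Assumption: windows containing any character outside ACGT
    (for example N) are skipped rather than counted.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def rank_kmers(counts, k=K):
    """Map every possible k-mer (4**k of them) to a rank.

    Rank 1 is the most frequent k-mer. Assumption: ties, including
    k-mers that never occur, are broken alphabetically.
    """
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    ordered = sorted(all_kmers, key=lambda m: (-counts.get(m, 0), m))
    return {kmer: rank for rank, kmer in enumerate(ordered, start=1)}


if __name__ == "__main__":
    toy_sequence = "ACGTACGTACGT"  # toy input, not real genome data
    ranks = rank_kmers(count_kmers(toy_sequence))
    print(len(ranks), ranks["ACGTACGT"])
```

For a real genome the same functions could be applied per chromosome and the counts summed before ranking. A rank vector of this kind is one plausible input for comparing species, although how the paper compares rank information between genomes is not stated in the abstract.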

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12841270/full.md

---
Source: https://tomesphere.com/paper/PMC12841270