# HFTC: a hierarchical fungal taxonomic classification model for ITS sequences using low-dimensional embedding features

**Authors:** Jiawei Wang, Shaojie Qiao, Dongsheng Xiang, Yangcheng Liao, Chao Wang

PMC · DOI: 10.3389/fgene.2025.1650244 · Frontiers in Genetics · 2025-10-03

## TL;DR

HFTC is a hierarchical random forest classifier for fungal ITS sequences. It pairs bidirectional k-mers with 200-dimensional Word2Vec embeddings and reports 95.25% overall accuracy, ahead of Mothur, RDP, Sintax, QIIME2, and CNN-Duong.

## Contribution

HFTC introduces a hierarchical random forest model with bidirectional k-mers and low-dimensional embeddings for improved fungal classification.
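
One way to picture a hierarchical classifier of this kind is top-down routing: a separate model at each parent taxon chooses the child, and the prediction is the full path from the top rank down. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not say whether HFTC trains one model per rank or one per parent taxon, so the per-parent layout is an assumption. `CentroidClassifier` and `TopDownClassifier` are hypothetical names, and a nearest-centroid stand-in replaces the random forest so that the example needs only the standard library.

```python
from collections import defaultdict


class CentroidClassifier:
    """Stand-in for a random forest: predicts the label of the nearest class centroid."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        # Mean vector of every class seen at this node.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


class TopDownClassifier:
    """Assumed layout: one local model per parent taxon, applied from the top rank down."""

    def __init__(self, make_model=CentroidClassifier):
        self.make_model = make_model
        self.models = {}
        self.depth = 0

    def fit(self, X, lineages):
        self.depth = len(lineages[0])
        buckets = defaultdict(lambda: ([], []))
        for x, lineage in zip(X, lineages):
            for level in range(self.depth):
                # The parent path (empty tuple at the root) selects the local model.
                xs, ys = buckets[lineage[:level]]
                xs.append(x)
                ys.append(lineage[level])
        for parent, (xs, ys) in buckets.items():
            self.models[parent] = self.make_model().fit(xs, ys)
        return self

    def predict_one(self, x):
        path = ()
        for _ in range(self.depth):
            # Each step only considers children of the taxon chosen so far,
            # so the predicted lineage is always a path seen in training.
            path += (self.models[path].predict_one(x),)
        return path


# Toy data: two-dimensional features and (family, genus) lineages, all invented.
X = [[0, 0], [0, 1], [5, 5], [5, 6]]
lineages = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b2")]
clf = TopDownClassifier().fit(X, lineages)
pred_a = clf.predict_one([0, 0.9])
pred_b = clf.predict_one([5, 5.1])
```

Because each step restricts the choice to children of the taxon already chosen, a design like this cannot emit a genus that belongs to a different family, which is one plausible source of the hierarchical consistency the paper reports.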

## Key findings

- HFTC achieves a Matthews correlation coefficient (MCC) of 95.31% and an overall accuracy (ACC) of 95.25%.
- At the species level, its hierarchical accuracy (HA) of 95.10% exceeds that of CNN-Duong, the best-performing deep learning baseline, by 3.2%.
- HFTC shows the smallest gap between ACC and HA (1.60%, versus 35.00% for CNN-Duong), indicating strong hierarchical consistency.

## Abstract

Fungal identification through ITS sequencing is pivotal for biodiversity and ecological studies, yet existing methods often face challenges with high-dimensional features and inconsistent taxonomy predictions.

We proposed HFTC, a hierarchical fungal taxonomic classifier built upon a multi-level random forest (RF) architecture. Notably, HFTC incorporates a bidirectional k-mer strategy to capture contextual information from both sequence orientations. By leveraging Word2Vec embedding, it reduces feature dimensionality from $4^k$ to only 200, significantly improving computational efficiency while preserving rich sequence context.
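
The dimensionality argument can be made concrete with a small sketch. A count vector over all k-mers has $4^k$ entries, whereas averaging per-k-mer embedding vectors gives a fixed-length vector regardless of k. The code below is illustrative only and rests on several assumptions: "bidirectional" is read as tokenizing the sequence and its plain reversal (the reverse complement would be another reading), `K = 6` is an arbitrary choice, mean pooling is assumed, and a randomly initialized lookup table stands in for a trained Word2Vec model. All function names are hypothetical.

```python
import random
from itertools import product

K = 6      # illustrative k-mer length; the paper's value is not given in this summary
DIM = 200  # embedding size stated in the abstract


def kmers(seq, k):
    """Overlapping k-mers of seq, read left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bidirectional_kmers(seq, k):
    """Tokens from the sequence and from its reversal (assumed meaning of 'bidirectional')."""
    seq = seq.upper()
    return kmers(seq, k) + kmers(seq[::-1], k)


def toy_embedding_table(k, dim, seed=0):
    """Random vectors per k-mer: a stand-in for embeddings learned by Word2Vec."""
    rng = random.Random(seed)
    return {
        "".join(p): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for p in product("ACGT", repeat=k)
    }


def embed(seq, table, k, dim):
    """Mean-pool the vectors of all known tokens into one dim-length feature vector."""
    vecs = [table[t] for t in bidirectional_kmers(seq, k) if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


table = toy_embedding_table(K, DIM)
vec = embed("ACGTTGCAACGTAGCTAGCTAGGCTA", table, K, DIM)

count_vector_size = 4 ** K   # one entry per possible k-mer
embedded_size = len(vec)     # fixed at DIM, independent of K
```

With `K = 6` the count representation already needs 4096 columns, against 200 for the pooled embedding, and the gap widens by a factor of four for each increment of k.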

Experimental results demonstrate that HFTC outperforms Mothur, RDP, Sintax, QIIME2, and CNN-Duong, achieving a Matthews correlation coefficient (MCC) of 95.31% despite uneven class distributions. Its overall accuracy (ACC) reaches 95.25%. At the species level, it attains a hierarchical accuracy (HA) of 95.10%, surpassing the best-performing deep learning baseline, CNN-Duong, by 3.2%. Moreover, HFTC exhibits the smallest discrepancy between ACC and HA (1.60%), in contrast to CNN-Duong, which shows the largest gap (35.00%), highlighting HFTC’s superior hierarchical consistency.
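
For readers who want to reproduce metrics of this kind, the sketch below computes the multiclass MCC from label counts and contrasts per-rank accuracy with a hierarchical variant. The MCC follows the standard multiclass generalization. The hierarchical accuracy is an assumption: this summary does not define HA, so the code uses one common reading, in which a prediction counts at a rank only if every rank above it is also correct. Function names and the toy lineages are hypothetical.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient computed from label counts."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)   # how often each class truly occurs
    p_counts = Counter(y_pred)   # how often each class is predicted
    classes = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    denom = sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    return numerator / denom if denom else 0.0


def accuracy_at(true_lineages, pred_lineages, level):
    """Fraction of samples whose label at this rank alone is correct."""
    hits = sum(t[level] == p[level] for t, p in zip(true_lineages, pred_lineages))
    return hits / len(true_lineages)


def hierarchical_accuracy_at(true_lineages, pred_lineages, level):
    """Assumed HA: correct only if the whole path down to this rank matches."""
    hits = sum(
        t[:level + 1] == p[:level + 1]
        for t, p in zip(true_lineages, pred_lineages)
    )
    return hits / len(true_lineages)


# Toy (order, family, genus) lineages, invented for illustration.
truth = [("O1", "F1", "G1"), ("O1", "F2", "G2")]
preds = [("O1", "F1", "G1"), ("O2", "F2", "G2")]  # second sample has the wrong order

acc = accuracy_at(truth, preds, level=2)
ha = hierarchical_accuracy_at(truth, preds, level=2)
gap = acc - ha
mcc_perfect = multiclass_mcc(["a", "a", "b", "b"], ["a", "a", "b", "b"])
```

In the toy data the genus labels are all right but one lineage passes through the wrong order, so accuracy at the genus rank is 1.0 while the hierarchical variant drops to 0.5. A gap of this kind is what the ACC versus HA comparison is meant to expose.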

HFTC offers a scalable and accurate approach for fungal taxonomic classification. Its compact feature representation and hierarchical architecture make it particularly suitable for microbial diversity research. The source code and datasets are publicly accessible at https://github.com/wjjw0731/HFTC/tree/master.

## Full-text entities

- **Genes:** PQBP1 (polyglutamine binding protein 1) [NCBI Gene 10084] {aka MRX2, MRX55, MRXS3, MRXS8, NPW38, RENS1}, POLR2B (RNA polymerase II subunit B) [NCBI Gene 5431] {aka POL2RB, RPB2, hRPB140}
- **Diseases:** HFTC (MESH:D009181), ITS (MESH:D000082122)
- **Chemicals:** HFTC (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Cortinarius (genus) [taxon 34451], Cortinariaceae (family) [taxon 34450], Agaricales (common gilled mushrooms & allies, order) [taxon 5338]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12531816/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12531816/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC12531816/full.md

---
Source: https://tomesphere.com/paper/PMC12531816