# Multi-feature fusion for gene prediction and functional peptide identification

**Authors:** Chenjing Ma, Qianran Wei, Guohua Wang, Yan Miao, Lei Yuan

PMC · DOI: 10.3389/fmicb.2026.1736391 · Frontiers in Microbiology · 2026-02-06

## TL;DR

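This paper introduces GP2FI, a two-stage computational tool for predicting functional peptides such as anticancer peptides (ACPs) and antimicrobial peptides (AMPs). A gene prediction model (MHA-preconv) first identifies coding regions, and a functional peptide identification model (FuncPred-CB) then classifies the corresponding amino acid sequences.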

## Contribution

GP2FI combines multi-feature fusion with a two-stage design: MHA-preconv for gene prediction, followed by FuncPred-CB for functional peptide identification.

## Key findings

- MHA-preconv and FuncPred-CB outperform existing methods in accuracy and other performance metrics on multiple benchmark datasets.
- MHA-preconv captures both local sequence patterns and long-range dependencies by combining CNNs with Transformer encoder layers.
- The integration of pre-trained BERT in FuncPred-CB enhances contextual feature extraction from amino acid sequences.
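
To make the "local patterns plus long-range dependencies" idea concrete, here is a minimal pure-Python sketch. It is not the MHA-preconv implementation: the function names, the hand-written `ATG` motif kernel, and the use of identity projections in the attention step are all assumptions made for illustration. In a trained model the convolution kernels and the attention projections would be learned.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-dimensional one-hot vectors."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d(x, kernel):
    """Valid 1D convolution: each output looks only at a local window of len(kernel)."""
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        s = 0.0
        for j in range(k):
            for c in range(len(kernel[j])):
                s += x[i + j][c] * kernel[j][c]
        out.append(s)
    return out

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    t = sum(e)
    return [a / t for a in e]

def self_attention(h):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections (Q = K = V = h), so every position
    attends to every other position regardless of distance.
    """
    d = len(h[0])
    out = []
    for q in h:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h]
        w = softmax(scores)
        out.append([sum(w[j] * h[j][c] for j in range(len(h))) for c in range(d)])
    return out

# Hypothetical motif kernel that responds to the start codon ATG.
motif_kernel = one_hot("ATG")
local = conv1d(one_hot("CCATGCC"), motif_kernel)   # the ATG window scores 3.0, all others 0.0
context = self_attention([[v] for v in local])     # each position now mixes in all others
```

The convolution output depends only on three adjacent bases, while each attention output is a weighted average over the whole sequence, which is the division of labour the finding above describes.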

## Abstract

Anticancer peptides (ACPs) have demonstrated potent antitumor activity and low toxicity, offering considerable potential in cancer therapeutics. Meanwhile, antimicrobial peptides (AMPs) serve as key components of the innate immune defense system. Owing to their broad-spectrum antimicrobial activity and low propensity for inducing resistance, AMPs have attracted considerable attention in the fields of infection control and immunotherapy. Accurate identification of ACPs and AMPs is critical for the discovery of novel therapeutic agents. However, wet-lab identification is often time-consuming, costly, and inefficient, falling short of the demands of high-throughput drug screening. Furthermore, existing computational methods exhibit limitations in feature representation and cross-task prediction capability. To address these challenges, GP2FI, a tool for functional peptide prediction, is proposed. It consists of two sequential stages: a gene prediction model (MHA-preconv) and a functional peptide identification model (FuncPred-CB). MHA-preconv integrates CNNs with Transformer encoder layers to form a two-stage deep architecture, effectively capturing both local sequence patterns and long-range dependencies. Based on the coding regions identified by MHA-preconv, FuncPred-CB incorporates a pre-trained BERT language model to automatically extract contextual semantic features from amino acid sequences. Experimental results on multiple benchmark datasets demonstrate that MHA-preconv and GP2FI consistently outperform state-of-the-art methods in terms of accuracy and other performance metrics. The code for GP2FI is available at https://github.com/ma999-mxl/maLBX.git.
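
The two sequential stages described in the abstract can be pictured as a simple data flow: DNA in, coding regions out of stage 1, peptides scored in stage 2. The sketch below only illustrates that flow. Both stages are deliberately naive stand-ins and are assumptions of this example: a forward-strand ORF scan takes the place of MHA-preconv, and a cationic-residue fraction takes the place of the BERT-based FuncPred-CB. The function names and the `threshold` parameter are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def find_orfs(dna, min_codons=3):
    """Stage-1 stand-in: forward-strand ORFs (ATG up to, not including, a stop codon)."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append(dna[start:i])
                start = None
    return orfs

def translate(orf):
    """Translate a coding sequence to amino acids; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))

def toy_peptide_score(peptide):
    """Stage-2 stand-in: fraction of cationic residues (K, R). Not the paper's classifier."""
    return sum(r in "KR" for r in peptide) / len(peptide) if peptide else 0.0

def pipeline(dna, threshold=0.3):
    """Chain the two stages: coding regions -> peptides -> (peptide, score, flagged)."""
    results = []
    for orf in find_orfs(dna.upper()):
        peptide = translate(orf)
        score = toy_peptide_score(peptide)
        results.append((peptide, score, score >= threshold))
    return results

hits = pipeline("CCATGAAACGTAAATGGTAACC")
```

The point of the structure is that stage 2 only ever sees sequences that stage 1 marked as coding, so errors in gene prediction propagate directly into peptide identification.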

## Linked entities

- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Diseases:** Cancer (MESH:D009369), ACPs (MESH:C565529), rare genetic diseases (MESH:D035583), infection (MESH:D007239), ACP (MESH:C562856), toxicity (MESH:D064420)
- **Chemicals:** 5-methylcytosine (MESH:D044503), dipeptide (MESH:D004151), AMP (MESH:D000089882), MHA (MESH:C069357), tryptophan (MESH:D014364), ACP (-), CB (MESH:C063451), amino acid (MESH:D000596)
- **Species:** Natronomonas pharaonis (species) [taxon 2257], Staphylococcus (genus) [taxon 1279]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12920568/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12920568/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12920568/full.md

---
Source: https://tomesphere.com/paper/PMC12920568