pLMFPPred: a novel approach for accurate prediction of functional peptides integrating embedding from pre-trained protein language model and imbalanced learning
Zebin Ma, Yonglin Zou, Xiaobin Huang, Wenjin Yan, Hao Xu, Jiexin Yang, Ying Zhang, Jinqi Huang

TL;DR
pLMFPPred is a tool that combines protein language model (ESM-2) embeddings with SMOTE-TOMEK resampling to predict functional peptides and identify toxic ones, outperforming existing methods and aiding drug discovery.
Contribution
The paper introduces pLMFPPred, a novel predictive model combining protein language embeddings with imbalance handling techniques for functional peptide prediction.
Findings
Achieved an accuracy of 0.974 and an AUC of 0.99 on an independent test set.
Outperforms existing methods in predicting functional peptides.
Identifies toxic peptides, and uses Shapley value-based feature selection to reduce computational cost.
Abstract
Functional peptides have the potential to treat a variety of diseases. Their good therapeutic efficacy and low toxicity make them ideal therapeutic agents. Artificial intelligence-based computational strategies can help quickly identify new functional peptides from collections of protein sequences and discover their different functions. Using protein language model-based embeddings (ESM-2), we developed a tool called pLMFPPred (Protein Language Model-based Functional Peptide Predictor) for predicting functional peptides and identifying toxic peptides. We also introduced SMOTE-TOMEK data synthesis sampling and Shapley value-based feature selection techniques to relieve data imbalance issues and reduce computational costs. On a validated independent test set, pLMFPPred achieved accuracy, area under the receiver operating characteristic curve (AUC-ROC), and F1-Score values of 0.974, 0.99, and…
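To make the SMOTE-TOMEK step named in the abstract concrete, here is a minimal sketch of the general technique in plain Python. It is not the authors' implementation: the function names (`nearest`, `smote`, `remove_tomek_links`), the toy 2-D points standing in for embedding vectors, the binary-label setting, and the choice of `k=3` are all assumptions made for illustration. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its minority-class neighbours; Tomek-link cleaning then drops pairs of mutual nearest neighbours that carry different labels.

```python
import math
import random

def nearest(idx, points, candidates, k=1):
    """Indices of the k candidates closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]

def smote(X, y, minority, k=3, seed=0):
    """Binary-case SMOTE: add interpolated minority samples until classes balance.

    Assumes at least two minority samples exist.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)  # majority count - minority count
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)                    # a random minority sample
        j = rng.choice(nearest(i, X, min_idx, k))  # one of its minority neighbours
        gap = rng.random()                         # interpolation factor in [0, 1)
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out

def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (mutual nearest neighbours, different labels)."""
    all_idx = range(len(X))
    nn = [nearest(i, X, all_idx, 1)[0] for i in all_idx]
    drop = {i for i in all_idx if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in all_idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data (hypothetical): 8 majority points (label 0), 3 minority points (label 1).
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4],
     [0.4, 0.0], [0.2, 0.4], [0.4, 0.3], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]]
y = [0] * 8 + [1] * 3

X_bal, y_bal = smote(X, y, minority=1)            # oversample the minority class
X_res, y_res = remove_tomek_links(X_bal, y_bal)   # clean borderline pairs
print(y_bal.count(0), y_bal.count(1), len(y_res))
```

In practice the same combination is available as `SMOTETomek` in the `imblearn.combine` module of imbalanced-learn, which would normally be preferred over a hand-written version; whether the paper used that library is not stated in this summary.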
Taxonomy
Topics: Machine Learning in Bioinformatics · Vaccines and Immunoinformatics Approaches · Biochemical and Structural Characterization
Methods: Feature Selection
