# PepXML: ESM2-based extreme multilabel classification of pathogen-targeted antimicrobial peptides

**Authors:** Yannan Bin, Daijun Zhang, Zhiyang Hu, Chungui Xu, Yansen Su

PMC · DOI: 10.1093/bib/bbaf548 · Briefings in Bioinformatics · 2025-10-17

## TL;DR

PepXML is an ESM2-based extreme multilabel classifier that predicts which specific pathogens an antimicrobial peptide is effective against, with the aim of supporting the development of peptide-based antibiotics.

## Contribution

PepXML introduces a novel ESM2-based model for extreme multilabel classification of pathogen-specific antimicrobial peptides.

## Key findings

- PepXML's predictions were checked with molecular docking of peptide-bilayer membrane interactions and with molecular dynamics simulations of peptide-pathogen interaction mechanisms.
- The model addresses data sparsity and label imbalance through clustering on a label co-occurrence graph and hard negative sampling.
- A benchmark dataset of AMPs and pathogens was constructed for training and evaluation.
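
To make the idea of clustering on a label co-occurrence graph concrete, here is a minimal sketch. It is not the PepXML implementation: the function names, the `min_weight` threshold, and the use of connected components as the clustering step are all assumptions chosen for illustration, and the pathogen names are toy data. Two pathogen labels are linked when they annotate the same peptide at least `min_weight` times, and linked labels fall into the same cluster.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(label_sets):
    """Count how often each pair of pathogen labels annotates the same peptide."""
    weights = defaultdict(int)
    for labels in label_sets:
        # Sort so each undirected edge is always stored as (smaller, larger).
        for a, b in combinations(sorted(set(labels)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def cluster_labels(label_sets, min_weight=2):
    """Group labels into connected components over edges with weight >= min_weight.

    Hypothetical stand-in for the clustering step; the paper's algorithm may differ.
    """
    parent = {}
    for labels in label_sets:
        for label in labels:
            parent.setdefault(label, label)

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), weight in cooccurrence_graph(label_sets).items():
        if weight >= min_weight:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for label in parent:
        clusters[find(label)].add(label)
    return sorted(sorted(members) for members in clusters.values())


# Toy annotations: one list of target pathogens per peptide (illustrative only).
toy_label_sets = [
    ["E. coli", "S. aureus"],
    ["E. coli", "S. aureus", "P. aeruginosa"],
    ["C. albicans"],
    ["P. aeruginosa", "E. coli"],
]
print(cluster_labels(toy_label_sets, min_weight=2))
```

Grouping rare labels with labels they frequently co-occur with is one common way to let sparse labels share training signal; raising `min_weight` produces smaller, tighter clusters.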

## Abstract

In recent years, antimicrobial peptides (AMPs) have attracted interest as potential peptide antibiotics due to their broad-spectrum antibacterial activity and high target specificity. However, existing research on AMP prediction mainly focuses on their functional properties, such as antibacterial, antiviral, and anticancer activity. This emphasis has created a significant gap in identifying AMPs that specifically target pathogens. Given the large variety of pathogens and the sparsity and imbalance of labels, it is challenging to determine which specific pathogens AMPs can be effective against. To address this issue, we present PepXML, a large language model-based tool for extreme multilabel classification of pathogen-targeted AMPs. Our first step involved constructing a benchmark dataset of AMPs and their corresponding targeted pathogens, sourced from public databases. In PepXML, the peptides are embedded using ESM2. Further, clustering on a specifically designed label co-occurrence graph and hard negative sampling were employed to address the challenges of data sparsity and label imbalance. To validate the reliability of our predictive results, we conducted molecular docking studies focused on peptide-bilayer membrane interactions and performed molecular dynamics simulations to elucidate the mechanisms of peptide-pathogen interactions. We anticipate that PepXML will be a valuable resource for advancing peptide-based therapeutics. The data and Python code of the PepXML model are available at https://github.com/YannanBin/PepXML.git.
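
The abstract does not say how hard negatives are chosen, so the following is only a sketch of one common formulation: for a given peptide, take the pathogen labels that the current model scores highest among those that are not true targets, and use them as negatives during training. The function name `sample_hard_negatives`, the score dictionary, and the pathogen names are hypothetical.

```python
def sample_hard_negatives(scores, positives, k):
    """Return the k highest-scoring labels that are NOT true targets of the peptide.

    scores:    mapping from pathogen label to the model's current score (assumed input)
    positives: set of labels known to be true targets of the peptide
    k:         number of hard negatives to return
    """
    negatives = [(score, label) for label, score in scores.items()
                 if label not in positives]
    # Highest score first; ties broken alphabetically for reproducibility.
    negatives.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in negatives[:k]]


# Toy scores for a single peptide (illustrative values, not model output).
toy_scores = {
    "E. coli": 0.91,
    "S. aureus": 0.80,
    "P. aeruginosa": 0.75,
    "K. pneumoniae": 0.60,
    "C. albicans": 0.10,
}
print(sample_hard_negatives(toy_scores, positives={"E. coli"}, k=2))
```

Compared with drawing negatives uniformly at random, which in an extreme multilabel setting mostly yields labels the model already rejects with ease, this kind of selection concentrates training on the labels the model is most likely to confuse with the true targets.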

## Full-text entities

- **Chemicals:** ESM2 (-), AMP (MESH:D000089882)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12531984/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12531984/full.md

## References

70 references — full list in the complete paper: https://tomesphere.com/paper/PMC12531984/full.md

---
Source: https://tomesphere.com/paper/PMC12531984