HELM-BERT: A Transformer for Medium-sized Peptide Property Prediction
Seungeon Lee, Takuto Koyama, Itsuki Maeda, Shigeyuki Matsumoto, and Yasushi Okuno

TL;DR
HELM-BERT is a transformer-based peptide language model trained on HELM notation. It captures complex peptide structures, including monomer composition and connectivity, and outperforms SMILES-based models on peptide property prediction tasks.
Contribution
This work introduces HELM-BERT, the first encoder-based peptide language model built on HELM notation, which models complex peptide structures better than SMILES-based approaches.
Findings
HELM-BERT outperforms state-of-the-art SMILES-based models in peptide property prediction.
The model demonstrates improved data efficiency in modeling therapeutic peptides.
Explicit representation of monomer and topology in HELM enhances predictive accuracy.
Abstract
Therapeutic peptides have emerged as a pivotal modality in modern drug discovery, occupying a chemically and topologically rich space. While accurate prediction of their physicochemical properties is essential for accelerating peptide development, existing molecular language models rely on representations that fail to capture this complexity. Atom-level SMILES notation generates long token sequences and obscures cyclic topology, whereas amino-acid-level representations cannot encode the diverse chemical modifications central to modern peptide design. To bridge this representational gap, the Hierarchical Editing Language for Macromolecules (HELM) offers a unified framework enabling precise description of both monomer composition and connectivity, making it a promising foundation for peptide language modeling. Here, we propose HELM-BERT, the first encoder-based peptide language model…
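To make the representational argument concrete, the sketch below splits a HELM string into monomer-level tokens and a separate connection record. It is a minimal illustration, not HELM-BERT's actual tokenizer: the function name `tokenize_helm`, the example peptide, and the simplifying assumptions (no inline SMILES monomers, no `$` or `.` inside brackets, only the first two HELM sections parsed) are all hypothetical choices made for this example.

```python
import re


def tokenize_helm(helm: str) -> dict:
    """Split a simple HELM string into monomer tokens and connections.

    Assumption: sections are separated by '$', polymers look like
    NAME{m1.m2...}, and no monomer contains '$', '.', or '}'.
    """
    sections = helm.split("$")

    # Section 1: simple polymers, e.g. PEPTIDE1{C.A.[dF].G.C}
    polymers = {}
    for match in re.finditer(r"(\w+)\{([^}]*)\}", sections[0]):
        name, body = match.group(1), match.group(2)
        # One token per monomer; bracketed IDs such as [dF] stay whole.
        polymers[name] = body.split(".")

    # Section 2: connections between monomers, separated by '|'
    connections = []
    if len(sections) > 1:
        connections = [c for c in sections[1].split("|") if c]

    return {"polymers": polymers, "connections": connections}


# Illustrative cyclic pentapeptide: a modified residue ([dF]) plus a
# bond between positions 1 and 5 recorded in the connection section.
example = "PEPTIDE1{C.A.[dF].G.C}$PEPTIDE1,PEPTIDE1,1:R3-5:R3$$$"
parsed = tokenize_helm(example)
print(parsed["polymers"]["PEPTIDE1"])
print(parsed["connections"])
```

In this toy case the whole peptide becomes five monomer tokens, the non-standard residue remains a single symbol, and the ring closure appears as one explicit connection entry rather than being implied by matching ring-closure digits spread across an atom-level string.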
Taxonomy
Topics: Machine Learning in Bioinformatics · Chemical Synthesis and Analysis · Antimicrobial Peptides and Activities
