PepHarmony: A Multi-View Contrastive Learning Framework for Integrated Sequence and Structure-Based Peptide Encoding
Ruochi Zhang, Haoran Wu, Chang Liu, Huaping Li, Yuqian Wu, Kewei Li, Yifan Wang, Yifan Deng, Jiahui Chen, Fengfeng Zhou, Xin Gao

TL;DR
PepHarmony is a novel multi-view contrastive learning framework that effectively integrates sequence and structure information for peptide encoding, advancing peptide representation for drug discovery and engineering.
Contribution
It introduces a new contrastive learning approach that combines sequence and structure data into a unified peptide encoding, addressing limitations of existing models.
Findings
Outperforms baseline and fine-tuned models in capturing peptide structure-sequence relationships.
Demonstrates robustness through extensive ablation studies emphasizing contrastive loss and data sorting.
Provides publicly available code for reproducibility and further research.
Abstract
Recent advances in protein language models have catalyzed significant progress in peptide sequence representation. Despite extensive exploration in this field, pre-trained models tailored for peptide-specific needs remain largely unaddressed due to the difficulty in capturing the complex and sometimes unstable structures of peptides. This study introduces a novel multi-view contrastive learning framework PepHarmony for the sequence-based peptide encoding task. PepHarmony innovatively combines both sequence- and structure-level information into a sequence-level encoding module through contrastive learning. We carefully select datasets from the Protein Data Bank (PDB) and AlphaFold database to encompass a broad spectrum of peptide sequences and structures. The experimental data highlights PepHarmony's exceptional capability in capturing the intricate relationship between peptide sequences…
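The abstract describes aligning sequence-level and structure-level views of the same peptide through contrastive learning. As a rough illustration of that idea, the sketch below computes a symmetric InfoNCE (CLIP-style) loss between two batches of embeddings, where row i of each batch is assumed to come from the same peptide. This is a generic formulation, not necessarily the loss PepHarmony uses; the function names, the temperature value, and the random stand-in embeddings are all assumptions made for illustration, and the actual sequence and structure encoders are omitted.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_info_nce(seq_emb, struct_emb, temperature=0.1):
    """Hypothetical helper: symmetric InfoNCE loss between two views.

    seq_emb, struct_emb: arrays of shape (batch, dim); row i of each array
    is assumed to describe the same peptide (the positive pair). All other
    rows in the batch act as negatives.
    """
    seq = l2_normalize(seq_emb)
    struct = l2_normalize(struct_emb)
    logits = seq @ struct.T / temperature      # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])           # positives sit on the diagonal
    # Sequence -> structure: each row should pick out its own column.
    loss_s2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Structure -> sequence: each column should pick out its own row.
    loss_t2s = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s2t + loss_t2s)

# Stand-in embeddings (random, for illustration only).
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(8, 16))
aligned_struct = seq_emb + 0.01 * rng.normal(size=(8, 16))  # views agree
random_struct = rng.normal(size=(8, 16))                    # views unrelated

aligned_loss = symmetric_info_nce(seq_emb, aligned_struct)
random_loss = symmetric_info_nce(seq_emb, random_struct)
```

In this toy setup the loss is lower when the two views of each peptide agree than when they are unrelated, which is the signal a contrastive objective of this kind uses to pull matching sequence and structure representations together.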
Taxonomy
Topics: Machine Learning in Bioinformatics · Vaccines and Immunoinformatics Approaches · Chemical Synthesis and Analysis
Methods: AlphaFold · Contrastive Learning
