Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers
Yingheng Wang, Zichen Wang, Gil Sadeh, Luca Zancato, Alessandro Achille, George Karypis, Huzefa Rangwala

TL;DR
This paper introduces LC-PLM, a novel protein language model architecture based on structured state-space models, capable of handling longer protein sequences and biological interaction graphs, outperforming Transformer-based models on various tasks.
Contribution
The paper presents LC-PLM, an alternative to Transformer models for protein language modeling, with improved length extrapolation and biological context integration using structured state-space models.
Findings
LC-PLM outperforms ESM-2 on downstream tasks by up to 30%.
LC-PLM demonstrates better length extrapolation capabilities.
Incorporating PPI graphs enhances protein structure and function prediction.
Abstract
Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design. Most protein LMs are based on the Transformer architecture trained on individual proteins with short context lengths. Such protein LMs cannot extrapolate well to longer proteins and protein complexes. They also fail to account for the underlying biological mechanisms carried out by biomolecular interactions and dynamics, i.e., proteins often interact with other proteins, molecules, and pathways in complex biological systems. In this work, we propose LC-PLM based on an alternative protein LM architecture, BiMamba-S, built upon selective structured state-space models, to learn high-quality universal protein representations at the amino acid token level using masked language modeling. We also introduce its graph-contextual…
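The idea of a bidirectional state-space block with shared projection layers can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a fixed diagonal linear recurrence stands in for the selective Mamba scan, the two directions are combined by summation, and the recurrence parameters are shared along with the projections. All names here (linear_scan, bidirectional_shared_block, w_in, w_out, a, b) and all sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_scan(u, a, b):
    """Toy diagonal recurrence h_t = a * h_{t-1} + b * u_t.

    Assumption: this fixed recurrence is a stand-in for a selective SSM scan.
    """
    h = np.zeros(u.shape[1])
    out = np.empty_like(u)
    for t in range(u.shape[0]):
        h = a * h + b * u[t]
        out[t] = h
    return out

def bidirectional_shared_block(x, w_in, w_out, a, b):
    """Apply the same projections to the sequence and to its reverse."""
    u = x @ w_in                            # shared input projection
    fwd = linear_scan(u, a, b)              # left-to-right context
    bwd = linear_scan(u[::-1], a, b)[::-1]  # right-to-left context, flipped back
    # Assumption: directions are summed, then a shared output projection
    # and a residual connection are applied.
    return x + (fwd + bwd) @ w_out

rng = np.random.default_rng(0)
seq_len, d_model, d_inner = 12, 8, 16       # hypothetical toy sizes
x = rng.normal(size=(seq_len, d_model))     # stand-in for residue embeddings
w_in = rng.normal(size=(d_model, d_inner)) * 0.1
w_out = rng.normal(size=(d_inner, d_model)) * 0.1
a = np.full(d_inner, 0.9)                   # per-channel decay
b = np.ones(d_inner)                        # per-channel input gain

y = bidirectional_shared_block(x, w_in, w_out, a, b)
y_rev = bidirectional_shared_block(x[::-1], w_in, w_out, a, b)[::-1]
```

Because every parameter is shared between the two directions in this toy version, reversing the input sequence simply reverses the output, so y and y_rev agree. Each position therefore sees both left and right context, which is what masked language modeling needs, without doubling the projection weights.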
Taxonomy
Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research
Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Adam · Residual Connection · Softmax · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention
