Long-context Protein Language Modeling Using Bidirectional Mamba with Shared Projection Layers

Yingheng Wang; Zichen Wang; Gil Sadeh; Luca Zancato; Alessandro Achille; George Karypis; Huzefa Rangwala

arXiv:2411.08909·q-bio.BM·April 3, 2025

TL;DR

This paper introduces LC-PLM, a novel protein language model architecture based on structured state-space models, capable of handling longer protein sequences and biological interaction graphs, outperforming Transformer-based models on various tasks.

Contribution

The paper presents LC-PLM, an alternative to Transformer models for protein language modeling, with improved length extrapolation and biological context integration using structured state-space models.

Findings

01. LC-PLM outperforms ESM-2 on downstream tasks by up to 30%.

02. LC-PLM demonstrates better length extrapolation capabilities.

03. Incorporating PPI graphs enhances protein structure and function prediction.

Abstract

Self-supervised training of language models (LMs) has seen great success for protein sequences in learning meaningful representations and for generative drug design. Most protein LMs are based on the Transformer architecture trained on individual proteins with short context lengths. Such protein LMs cannot extrapolate well to longer proteins and protein complexes. They also fail to account for the underlying biological mechanisms carried out by biomolecular interactions and dynamics, i.e., proteins often interact with other proteins, molecules, and pathways in complex biological systems. In this work, we propose LC-PLM based on an alternative protein LM architecture, BiMamba-S, built upon selective structured state-space models, to learn high-quality universal protein representations at the amino acid token level using masked language modeling. We also introduce its graph-contextual…
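
To make the idea of a bidirectional state-space layer with shared parameters more concrete, here is a toy sketch. It is not the LC-PLM or BiMamba-S implementation: the function names `scan` and `bidirectional_shared` and the scalar parameters `a`, `b`, `c` are hypothetical, the recurrence is a fixed linear one rather than Mamba's input-dependent (selective) one, and all parameters are tied between directions for simplicity, whereas this summary does not say exactly which layers the paper shares.

```python
from typing import List


def scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    A toy stand-in for a state-space layer (assumption: fixed scalar
    parameters, not the selective parameters used by Mamba).
    """
    h = 0.0
    out = []
    for x_t in x:
        h = a * h + b * x_t
        out.append(c * h)
    return out


def bidirectional_shared(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run the same scan left-to-right and right-to-left, then sum.

    Both directions reuse the same (a, b, c), so the bidirectional layer
    has no more parameters than the unidirectional one.
    """
    forward = scan(x, a, b, c)
    # Reverse the input, scan, then flip the result back into original order.
    backward = scan(x[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    # Position 0 sees itself in both directions; later positions only
    # see the impulse through the forward pass.
    print(bidirectional_shared([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
    # prints [2.0, 0.5, 0.25]
```

Because the two directions share parameters, reversing the input simply reverses the output, which is one reason a tied design suits masked language modeling, where each token should draw on context from both sides.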

Code & Models

Repositories

amazon-science/LC-PLM (PyTorch, official)

Taxonomy

Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research

Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Adam · Residual Connection · Softmax · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention