Understanding protein function with a multimodal retrieval-augmented foundation model

Timothy Fei Truong Jr; Tristan Bepler

arXiv:2508.04724 · q-bio.QM · February 27, 2026

TL;DR

PoET-2 is a multimodal, retrieval-augmented protein foundation model that improves protein function prediction, especially for variants with multiple mutations, by integrating evolutionary constraints and structure information.

Contribution

This work introduces PoET-2, a novel multimodal, retrieval-augmented model with hierarchical transformers and dual decoders, advancing protein function understanding and variant effect prediction.

Findings

01 Achieves state-of-the-art zero-shot variant effect prediction.

02 Outperforms previous methods in supervised learning on small datasets.

03 Excels at scoring complex mutations, including indels.

Abstract

Protein language models (PLMs) learn probability distributions over natural protein sequences. By learning from hundreds of millions of natural protein sequences, protein understanding and design capabilities emerge. Recent works have shown that scaling these models improves structure prediction, but does not seem to improve mutation understanding and representation quality for protein function prediction. We introduce PoET-2, a multimodal, retrieval-augmented protein foundation model that incorporates in-context learning of family-specific evolutionary constraints with optional structure conditioning to learn generative distributions over protein sequences. PoET-2 uses a hierarchical transformer encoder that is equivariant to sequence context ordering and a dual decoder architecture with both causal and masked language modeling objectives, allowing PoET-2 to operate in both fully…
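
The abstract's claim that the encoder is equivariant to sequence context ordering can be illustrated with a toy check. The sketch below is not PoET-2's encoder: it is a minimal, unparameterized dot-product self-attention over a set of vectors, where each vector is assumed to stand in for one retrieved context sequence. The names `self_attention`, `permute`, and `context` are hypothetical. Because no positional information is attached to the set members, permuting the inputs permutes the outputs in the same way.

```python
import math
import random


def self_attention(xs):
    """Toy dot-product self-attention over a list of equal-length vectors.

    No positional encoding is used, so each output row depends only on its
    own input vector and on the unordered set of all input vectors.
    """
    d = len(xs[0])
    out = []
    for q in xs:
        # Scaled dot-product scores of this query against every vector.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Attention output: softmax-weighted average of the input vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, xs)) / z
                    for j in range(d)])
    return out


def permute(items, perm):
    """Reorder a list so that position i holds items[perm[i]]."""
    return [items[i] for i in perm]


random.seed(0)
# Assumption: five context "sequences", each summarized as a 4-dim vector.
context = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
perm = [3, 0, 4, 1, 2]

encode_then_permute = permute(self_attention(context), perm)
permute_then_encode = self_attention(permute(context, perm))

# Equivariance: both orders of operations agree up to float rounding.
max_diff = max(abs(x - y)
               for ra, rb in zip(encode_then_permute, permute_then_encode)
               for x, y in zip(ra, rb))
print(max_diff < 1e-9)
```

The comparison uses a tolerance rather than exact equality because reordering the inputs changes the order of floating-point summation.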
