Prot2Text-V2: Protein Function Prediction with Multimodal Contrastive Alignment
Xiao Fei, Michail Chatzianastasis, Sarah Almeida Carneiro, Hadi Abdine, Lawrence P. Petalidis, Michalis Vazirgiannis

TL;DR
Prot2Text-V2 is a multimodal model that generates natural language descriptions of protein functions directly from amino acid sequences, using contrastive alignment and instruction fine-tuning to improve accuracy and generalization.
Contribution
Prot2Text-V2 introduces Hybrid Sequence-level Contrastive Alignment Learning (H-SCALE), a contrastive alignment method, together with a sequence-to-text framework that predicts protein function without relying on structured ontologies.
Findings
Outperforms traditional methods in low-homology conditions
Effectively generates accurate natural language protein descriptions
Demonstrates strong cross-modal alignment between sequences and text
Abstract
Predicting protein function from sequence is a central challenge in computational biology. While existing methods rely heavily on structured ontologies or similarity-based techniques, they often lack the flexibility to express structure-free functional descriptions and novel biological functions. In this work, we introduce Prot2Text-V2, a novel multimodal sequence-to-text model that generates free-form natural language descriptions of protein function directly from amino acid sequences. Our method combines a protein language model as a sequence encoder (ESM-3B) and a decoder-only language model (LLaMA-3.1-8B-Instruct) through a lightweight nonlinear modality projector. A key innovation is our Hybrid Sequence-level Contrastive Alignment Learning (H-SCALE), which improves cross-modal learning by matching mean- and std-pooled protein embeddings with text representations via contrastive…
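The abstract is cut off before it states the H-SCALE objective, so the following is only a rough sketch of the general idea it describes: pool per-residue protein embeddings by mean and standard deviation, then pull matching protein and text vectors together with a contrastive loss. The symmetric InfoNCE form, the cosine similarity, the temperature value, the function names, and the assumption that text vectors already live in the same dimension as the pooled protein vectors are all illustrative choices, not details taken from the paper.

```python
import math


def mean_std_pool(residue_embeddings):
    """Concatenate the per-dimension mean and (population) std over residues."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    means = [sum(vec[d] for vec in residue_embeddings) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((vec[d] - means[d]) ** 2 for vec in residue_embeddings) / n)
        for d in range(dim)
    ]
    return means + stds  # length is 2 * dim


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def symmetric_info_nce(protein_vecs, text_vecs, temperature=0.07):
    """Assumed loss: pair i is the positive, every other pair in the batch is a negative."""
    logits = [[cosine(p, t) / temperature for t in text_vecs] for p in protein_vecs]

    def cross_entropy_on_diagonal(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    text_to_protein = [list(col) for col in zip(*logits)]
    return 0.5 * (
        cross_entropy_on_diagonal(logits)
        + cross_entropy_on_diagonal(text_to_protein)
    )


# Toy batch: two "proteins" with two residues each, 1-dimensional embeddings.
proteins = [mean_std_pool([[1.0], [3.0]]), mean_std_pool([[5.0], [5.0]])]
# Hypothetical text vectors, assumed to be already projected to the same size.
texts = [[2.0, 1.0], [5.0, 0.0]]
loss = symmetric_info_nce(proteins, texts)
```

In a real model the pooled protein vector and the text representation would pass through learned projections before comparison; that step is omitted here to keep the sketch dependency-free.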
Taxonomy
Topics: Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies · RNA and protein synthesis mechanisms
