TL;DR
PPI2Text is a multimodal language model that generates detailed, free-text descriptions of protein-protein interactions from amino acid sequences, enhancing interpretability and integration with biological literature.
Contribution
It introduces a novel coordinate-aligned pair map decoding method and releases a large dataset for PPI captioning, advancing multimodal biological language modeling.
Findings
PPI2Text outperforms baselines on linguistic metrics.
It achieves higher factuality scores compared to synthesized references.
The model effectively captures interaction details across residue pairs.
Abstract
Protein-protein interaction (PPI) modeling has been widely studied as a binary or multi-label classification task. While emerging multimodal large language models (LLMs) can now describe single proteins, they remain unable to generate free-form descriptions of interactions between protein pairs. Moving beyond controlled-vocabulary annotations, we propose to model PPIs using free-text descriptions, enabling richer expressiveness, improved interpretability, and better integration with literature knowledge bases. We present PPI2Text, a multimodal LLM for free-form PPI captioning from amino acid sequences that encodes each protein using an ESM3 encoder, constructs a pair map from the two representations to capture interactions across all residue pairs, and autoregressively generates descriptions using a Qwen3 language decoder. We further introduce PaCo-RoPE, a coordinate-aligned positional…
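To make the pair-map step in the abstract concrete, the following is a minimal sketch of one way per-residue embeddings of two proteins could be combined into a map over all residue pairs. It is an illustration under stated assumptions, not the authors' implementation: the concatenation operator, the function names `build_pair_map` and `flatten_with_coordinates`, and the random arrays standing in for ESM3 encoder outputs are all hypothetical, and the truncated abstract does not specify how PPI2Text actually builds or positions its pair map.

```python
import numpy as np


def build_pair_map(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Hypothetical pair map: cell (i, j) concatenates residue i of protein A
    with residue j of protein B. The paper's real operator is not given here."""
    len_a, dim_a = emb_a.shape
    len_b, dim_b = emb_b.shape
    if dim_a != dim_b:
        raise ValueError("both proteins must share the embedding dimension")
    # Broadcast each protein's embeddings across the other protein's residues.
    a = np.broadcast_to(emb_a[:, None, :], (len_a, len_b, dim_a))
    b = np.broadcast_to(emb_b[None, :, :], (len_a, len_b, dim_b))
    return np.concatenate([a, b], axis=-1)  # shape (len_a, len_b, 2 * dim)


def flatten_with_coordinates(pair_map: np.ndarray):
    """Flatten the map into a token sequence and keep each token's (i, j)
    residue coordinates, which a 2D positional scheme could consume."""
    len_a, len_b, dim = pair_map.shape
    tokens = pair_map.reshape(len_a * len_b, dim)
    rows, cols = np.meshgrid(np.arange(len_a), np.arange(len_b), indexing="ij")
    coords = np.stack([rows.ravel(), cols.ravel()], axis=-1)
    return tokens, coords


# Random stand-ins for encoder outputs (assumption: 5 and 7 residues, dim 8).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(5, 8))
emb_b = rng.normal(size=(7, 8))

pair_map = build_pair_map(emb_a, emb_b)
tokens, coords = flatten_with_coordinates(pair_map)
print(pair_map.shape, tokens.shape, coords.shape)
```

The map grows with the product of the two sequence lengths, so a practical system would likely pool or downsample it before passing tokens to a language decoder; the sketch leaves that step out.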
