ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

Zhiyuan Liu; An Zhang; Hao Fei; Enzhi Zhang; Xiang Wang; Kenji Kawaguchi; Tat-Seng Chua

arXiv:2405.12564 · q-bio.QM · May 22, 2024 · 3 cites

TL;DR

ProtT3 introduces a novel framework combining protein language models and language models to enable effective protein-to-text generation, advancing text-based protein understanding and establishing new benchmarks.

Contribution

The paper presents a method for protein-to-text generation that integrates protein language models (PLMs) with language models (LMs) via a cross-modal projector, addressing a largely unexplored area.

Findings

01. ProtT3 significantly outperforms existing baselines.
02. Ablation studies confirm the importance of its core components.
03. The paper establishes comprehensive benchmarks for protein-to-text tasks.

Abstract

Language Models (LMs) excel in understanding textual descriptions of proteins, as evident in biomedical question-answering tasks. However, their capability falters with raw protein data, such as amino acid sequences, due to a deficit in pretraining on such data. Conversely, Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to process texts. To address their limitations, we introduce ProtT3, a framework for Protein-to-Text Generation for Text-based Protein Understanding. ProtT3 empowers an LM to understand protein sequences of amino acids by incorporating a PLM as its protein understanding module, enabling effective protein-to-text generation. This collaboration between PLM and LM is facilitated by a cross-modal projector (i.e., Q-Former) that bridges the modality gap between the PLM's representation space and the LM's…
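The PLM-to-LM bridge described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a real Q-Former stacks several transformer layers, whereas the code below uses a single cross-attention step with randomly initialized weights. The class name `ToyProjector`, all dimensions, and the random arrays standing in for PLM outputs and text embeddings are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyProjector:
    """Hypothetical single-layer stand-in for a Q-Former-style projector.

    A fixed set of query vectors cross-attends over per-residue PLM
    embeddings, and the pooled result is linearly mapped into the LM's
    embedding dimension. Weights are random here; in practice they
    would be trained.
    """

    def __init__(self, plm_dim, lm_dim, num_queries, seed=0):
        rng = np.random.default_rng(seed)
        self.queries = rng.normal(0.0, 0.02, (num_queries, plm_dim))
        self.w_k = rng.normal(0.0, 0.02, (plm_dim, plm_dim))
        self.w_v = rng.normal(0.0, 0.02, (plm_dim, plm_dim))
        self.w_out = rng.normal(0.0, 0.02, (plm_dim, lm_dim))

    def __call__(self, residue_embeddings):
        # residue_embeddings: (seq_len, plm_dim), one row per amino acid
        k = residue_embeddings @ self.w_k
        v = residue_embeddings @ self.w_v
        scores = self.queries @ k.T / np.sqrt(k.shape[-1])
        attn = softmax(scores, axis=-1)   # (num_queries, seq_len)
        pooled = attn @ v                 # (num_queries, plm_dim)
        return pooled @ self.w_out        # (num_queries, lm_dim)


# Assumed toy sizes: PLM width 32, LM width 64, 8 query tokens.
rng = np.random.default_rng(1)
projector = ToyProjector(plm_dim=32, lm_dim=64, num_queries=8)

plm_output = rng.normal(size=(120, 32))    # stand-in for a 120-residue protein
protein_prompt = projector(plm_output)     # fixed-length "soft prompt"

text_embeddings = rng.normal(size=(5, 64)) # stand-in for 5 embedded text tokens
lm_inputs = np.concatenate([protein_prompt, text_embeddings], axis=0)
print(lm_inputs.shape)  # (13, 64)
```

The point of the design is that the number of query vectors, not the protein length, determines how many positions the protein occupies in the LM input, so proteins of any length map to a fixed-size prefix in the LM's embedding space.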

Code & Models

Repositories

acharkq/prott3 · PyTorch · Official

Videos

ProtT3: Protein-to-Text Generation for Text-based Protein Understanding · Underline

Taxonomy

Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies · Topic Modeling