Prot2Chat: Protein LLM with Early-Fusion of Text, Sequence and Structure
Zhicong Wang, Zicheng Ma, Ziqiang Cao, Changlong Zhou, Jun Zhang, Yiqin Gao

TL;DR
Prot2Chat is a framework for protein question answering that fuses protein sequence, structure, and question text before the large language model generates an answer; the authors report improved performance and generalization.
Contribution
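The paper presents a unified encoding of protein sequence and structure (via a modified ProteinMPNN) and a protein-text adapter that fuses this encoding with the question early, conditioning on the LLM's question vectors, with the aim of improving protein Q&A quality and training efficiency.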
Findings
Superior performance on two datasets in automated and expert evaluations.
Effective zero-shot prediction demonstrating strong generalization.
Efficient training by freezing the encoders and applying LoRA.
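To make the efficiency finding concrete, here is a minimal sketch of the general LoRA idea: a frozen weight matrix W plus a trainable low-rank update B·A. The layer size, rank, and scaling factor below are illustrative assumptions, not the configuration used in the paper, and the helper names are hypothetical.

```python
# Minimal LoRA sketch in pure Python (no external libraries).
# All sizes, ranks, and names here are assumptions for illustration only.

def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA trainable params) for one layer."""
    full = d_out * d_in              # every entry of W would be trained
    lora = rank * (d_in + d_out)     # A is rank x d_in, B is d_out x rank
    return full, lora

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x); W stays frozen, A and B train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]

# Hypothetical 4096 x 4096 projection with rank 8.
print(lora_param_counts(4096, 4096, 8))  # (16777216, 65536)
```

In this hypothetical layer the low-rank update trains 65,536 parameters instead of about 16.8 million. Because B is conventionally initialized to zero, the adapted layer starts out identical to the frozen one.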
Abstract
Motivation: Proteins are of great significance in living organisms. However, understanding their functions encounters numerous challenges, such as insufficient integration of multimodal information, a large number of training parameters, limited flexibility of classification-based methods, and the lack of systematic evaluation metrics for protein Q&A systems. To tackle these issues, we propose the Prot2Chat framework.
Results: We modified ProteinMPNN to encode protein sequence and structural information in a unified way. We used a large language model (LLM) to encode questions into vectors and developed a protein-text adapter to compress protein information into virtual tokens based on these vectors, achieving the early fusion of text and protein information. Finally, the same LLM reads the virtual tokens and the questions to generate answers. To optimize training efficiency, we froze…
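The adapter step described in the abstract, compressing a variable-length protein encoding into a fixed number of virtual tokens conditioned on the question, can be illustrated with a question-conditioned attention pooling sketch. This is one plausible reading, not the paper's architecture: the additive conditioning, the single attention head, and every name below (`compress_to_virtual_tokens`, `base_queries`) are assumptions made for illustration.

```python
# Question-conditioned attention pooling, sketched in pure Python.
# Assumption: each virtual token is an attention-weighted average of the
# per-residue embeddings, with the query shifted by the question vector.
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def compress_to_virtual_tokens(residue_embs, question_vec, base_queries):
    """Map L residue embeddings to len(base_queries) virtual tokens."""
    dim = len(question_vec)
    tokens = []
    for q in base_queries:
        # Condition the query on the question (hypothetical additive fusion).
        query = [qi + ti for qi, ti in zip(q, question_vec)]
        # Scaled dot-product attention scores over all residues.
        weights = softmax([dot(query, r) / math.sqrt(dim) for r in residue_embs])
        # Weighted average of residue embeddings gives one virtual token.
        token = [sum(w * r[j] for w, r in zip(weights, residue_embs))
                 for j in range(dim)]
        tokens.append(token)
    return tokens

# Toy usage: 3 residues, embedding size 2, 2 virtual tokens.
residues = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [0.5, -0.5]
queries = [[0.0, 0.0], [1.0, 1.0]]
tokens = compress_to_virtual_tokens(residues, question, queries)
print(len(tokens), len(tokens[0]))  # 2 2
```

The number of output tokens depends only on the number of queries, not on protein length, which is what lets a downstream LLM read a fixed-size protein summary alongside the question.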
Taxonomy
Topics: Genomics and Phylogenetic Studies
Methods: Adapter
