EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations
Nuowei Liu, Changzhi Sun, Tao Ji, Junfeng Tian, Jianxin Tang, Yuanbin Wu, Man Lan

TL;DR
EvoLlama is a multimodal framework that integrates structural and sequential protein information into large language models, significantly improving their understanding and prediction capabilities for proteins.
Contribution
This work introduces EvoLlama, the first multimodal protein understanding model combining structure and sequence encoders with LLMs, enhancing performance over existing models.
Findings
Outperforms other protein-oriented LLMs in zero-shot settings by 1-8%.
Surpasses state-of-the-art baseline with supervised fine-tuning by 6%.
Achieves competitive results on protein property prediction datasets.
Abstract
Current Large Language Models (LLMs) for understanding proteins primarily treat amino acid sequences as a text modality. Meanwhile, Protein Language Models (PLMs), such as ESM-2, have learned massive sequential evolutionary knowledge from the universe of natural protein sequences. Furthermore, structure-based encoders like ProteinMPNN learn the structural information of proteins through Graph Neural Networks. However, whether the incorporation of protein encoders can enhance the protein understanding of LLMs has not been explored. To bridge this gap, we propose EvoLlama, a multimodal framework that connects a structure-based encoder, a sequence-based protein encoder, and an LLM for protein understanding. EvoLlama consists of a ProteinMPNN structure encoder, an ESM-2 protein sequence encoder, a multimodal projector to align protein and text representations, and a Llama-3 text decoder. To…
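The data flow named in the abstract (two protein encoders, a projector, and a text decoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the encoder outputs are replaced by random arrays, the projector is assumed to be a single linear layer per modality, the fusion is assumed to be concatenation along the token axis, and all sizes and names (STRUCT_DIM, SEQ_DIM, LLM_DIM, LinearProjector, build_decoder_inputs) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes, chosen only for illustration.
STRUCT_DIM = 128   # per-residue output size of the structure encoder (assumed)
SEQ_DIM = 320      # per-residue output size of the sequence encoder (assumed)
LLM_DIM = 512      # hidden size of the text decoder (assumed)


class LinearProjector:
    """Maps encoder features into the decoder's embedding space (assumed linear)."""

    def __init__(self, in_dim, out_dim, rng):
        self.weight = rng.normal(0.0, 0.02, size=(in_dim, out_dim))
        self.bias = np.zeros(out_dim)

    def __call__(self, features):
        # features: (num_residues, in_dim) -> (num_residues, out_dim)
        return features @ self.weight + self.bias


def build_decoder_inputs(struct_feats, seq_feats, text_embeds, struct_proj, seq_proj):
    """Project both protein modalities and prepend them to the text embeddings.

    Concatenation along the token axis is an assumption of this sketch;
    other fusion schemes (for example element-wise addition) are possible.
    """
    protein_tokens = np.concatenate(
        [struct_proj(struct_feats), seq_proj(seq_feats)], axis=0
    )
    return np.concatenate([protein_tokens, text_embeds], axis=0)


# Stand-ins for real encoder outputs and prompt embeddings.
num_residues, num_text_tokens = 50, 12
struct_feats = rng.normal(size=(num_residues, STRUCT_DIM))
seq_feats = rng.normal(size=(num_residues, SEQ_DIM))
text_embeds = rng.normal(size=(num_text_tokens, LLM_DIM))

struct_proj = LinearProjector(STRUCT_DIM, LLM_DIM, rng)
seq_proj = LinearProjector(SEQ_DIM, LLM_DIM, rng)

decoder_inputs = build_decoder_inputs(
    struct_feats, seq_feats, text_embeds, struct_proj, seq_proj
)
print(decoder_inputs.shape)
```

In this sketch every residue contributes one token per modality, so the decoder receives twice as many protein tokens as residues, followed by the prompt tokens. A real system would obtain the features from the pretrained encoders and train the projector weights rather than sampling them.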
Taxonomy
Topics: Genomics and Phylogenetic Studies · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: ALIGN
