Unlocking Biological Workflows for Robust Protein-Text Question Answering: A Dual-Dimensional RAG Framework
Li Ding, Duanyu Feng, Chen Huang, Yangshuai Wang, Yang Li, Wenqiang Lei, See-Kiong Ng

TL;DR
This paper introduces 2D-ProteinRAG, a novel framework that enhances protein-text question answering by integrating biological workflows with dual-dimensional filtering, improving robustness and generalization to out-of-distribution proteins.
Contribution
The paper presents 2D-ProteinRAG, a new framework that enables LLMs to operate within biological research workflows and employs a dual-dimensional filtering strategy for better information extraction.
Findings
Achieves state-of-the-art performance on biological OOD benchmarks.
Outperforms fine-tuned baselines and other RAG methods.
Demonstrates robustness and scalability in real-world scenarios.
Abstract
Protein-Text Question Answering (QA) is crucial for interpreting biological sequences through natural language. Combining Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), which efficiently leverages biological databases and facilitates reasoning, offers a potent approach to this task. However, constrained by the standard RAG pipeline, these models often rely on curated, static datasets instead of expert-proven biological workflows; as a result, they lack fine-grained information processing and struggle to generalize to novel, out-of-distribution (OOD) proteins. To bridge this gap, we propose 2D-ProteinRAG, a novel framework that empowers LLMs to operate within the gold-standard biological research workflow (BLAST). To further extract high-quality information from noisy retrieval contexts, we introduce a dual-dimensional (2D) filtering strategy following expert analytical paradigms.…
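To make the idea of retrieving context through a BLAST-style workflow more concrete, here is a minimal sketch in Python. It is not the paper's implementation, and it does not reproduce the dual-dimensional filter, whose two dimensions are not specified in the truncated abstract above. It only assumes that homology hits arrive in BLAST's standard 12-column tabular format (`-outfmt 6`) and shows one plausible way to keep the most significant hits before handing them to an LLM. The names `Hit`, `parse_blast6`, and `select_context_hits`, as well as the `max_evalue` and `top_k` defaults, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


@dataclass
class Hit:
    """One homology hit (hypothetical container for this sketch)."""
    subject_id: str
    percent_identity: float
    evalue: float
    bitscore: float


def parse_blast6(text: str) -> list[Hit]:
    """Parse tab-separated BLAST outfmt 6 text into Hit records."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and comment lines (present in outfmt 7 style output).
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(BLAST6_FIELDS, line.split("\t")))
        hits.append(
            Hit(
                subject_id=row["sseqid"],
                percent_identity=float(row["pident"]),
                evalue=float(row["evalue"]),
                bitscore=float(row["bitscore"]),
            )
        )
    return hits


def select_context_hits(hits: list[Hit], max_evalue: float = 1e-5,
                        top_k: int = 5) -> list[Hit]:
    """Keep hits at or below an e-value cutoff, ranked by bit score.

    The cutoff and top_k are illustrative assumptions, not values from the paper.
    """
    significant = [h for h in hits if h.evalue <= max_evalue]
    return sorted(significant, key=lambda h: h.bitscore, reverse=True)[:top_k]


# Made-up example rows; the accessions and scores are placeholders.
sample = (
    "q1\tP12345\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n"
    "q1\tQ99999\t35.0\t180\t100\t5\t1\t180\t10\t190\t2e-10\t80.5\n"
    "q1\tA00001\t25.0\t50\t30\t2\t1\t50\t5\t55\t0.5\t20.1\n"
)
selected = select_context_hits(parse_blast6(sample))
```

In a retrieval-augmented setup, annotations attached to the selected subject sequences would then be passed to the model as context. Any finer-grained filtering of that context, such as the paper's 2D strategy, would sit after this step.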
