BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions
Saptarshi Sengupta, Shuhua Yang, Paul Kwong Yu, Fali Wang, Suhang Wang

TL;DR
BioMol-MQA introduces a multi-modal question-answering dataset focused on polypharmacy, combining knowledge graphs, text, and molecular structures to evaluate LLM reasoning across diverse biomedical modalities.
Contribution
The paper presents a novel multi-modal QA dataset for bio-molecular interactions, highlighting the challenges LLMs face in reasoning over diverse biomedical data modalities.
Findings
Existing LLMs struggle with multi-modal reasoning in biomedical contexts.
The dataset reveals the need for advanced retrieval-augmented frameworks.
Strong background data improves LLM performance on complex questions.
Abstract
Retrieval-augmented generation (RAG) has shown great power in improving Large Language Models (LLMs). However, most existing RAG-based LLMs retrieve information from a single modality, mainly text. For many real-world problems, such as those in healthcare, the information relevant to a query can appear in several modalities, including knowledge graphs, text (clinical notes), and complex molecular structures. It is therefore important to be able to retrieve relevant multi-modal, domain-specific information and to reason over and synthesize diverse knowledge to generate an accurate response. To address this gap, we present BioMol-MQA, a new question-answering (QA) dataset on polypharmacy, which is composed of two parts: (i) a multimodal knowledge graph (KG) with text and molecular structure for information retrieval; and (ii) challenging questions that are designed to test LLM capabilities in retrieving…
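As a rough illustration of the two-part design the abstract describes, the sketch below models a tiny knowledge graph whose drug nodes carry a text description and a molecular-structure string, and retrieves all three modalities for a question about a drug pair. Every name here (`DrugNode`, `MultiModalKG`, `retrieve`, `build_prompt`), the placeholder drugs, and the toy side-effect label are assumptions made for illustration; none of it comes from the BioMol-MQA release or the authors' retrieval pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class DrugNode:
    name: str
    smiles: str       # molecular-structure modality (stand-in string only)
    description: str  # text modality


@dataclass
class MultiModalKG:
    # Hypothetical container: nodes keyed by name, edges keyed by an unordered drug pair.
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_drug(self, node: DrugNode) -> None:
        self.nodes[node.name] = node

    def add_interaction(self, a: str, b: str, side_effect: str) -> None:
        # frozenset makes the pair order-independent: (a, b) and (b, a) share one key.
        self.edges.setdefault(frozenset((a, b)), []).append(side_effect)

    def retrieve(self, a: str, b: str) -> dict:
        # Collect graph, text, and molecule evidence for the queried pair.
        present = [self.nodes[n] for n in (a, b) if n in self.nodes]
        return {
            "graph": list(self.edges.get(frozenset((a, b)), [])),
            "text": [n.description for n in present],
            "molecules": [n.smiles for n in present],
        }


def build_prompt(question: str, context: dict) -> str:
    # Flatten the retrieved evidence into a plain-text prompt for an LLM.
    lines = [f"Question: {question}"]
    lines.append("Known interactions: " + (", ".join(context["graph"]) or "none"))
    lines.extend(f"Note: {t}" for t in context["text"])
    lines.extend(f"SMILES: {s}" for s in context["molecules"])
    return "\n".join(lines)


# Toy data: placeholder drugs, not real pharmacology.
kg = MultiModalKG()
kg.add_drug(DrugNode("drug_a", "CCO", "Toy description of drug A."))
kg.add_drug(DrugNode("drug_b", "CC(=O)O", "Toy description of drug B."))
kg.add_interaction("drug_a", "drug_b", "toy_side_effect")

context = kg.retrieve("drug_b", "drug_a")
prompt = build_prompt("What can happen when drug_a and drug_b are combined?", context)
print(prompt)
```

A real system would replace the exact-match lookup with learned retrievers for each modality. The sketch only shows how evidence from the graph, the text, and the molecular structure can be gathered into a single context for the model.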
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Graph Neural Networks
Methods: Dropout · BERT · BART · RAG
