RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
Congyun Jin, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang

TL;DR
RJUA-MedDQA introduces a challenging multimodal benchmark for medical document question answering and clinical reasoning, emphasizing complex interpretation, numerical reasoning, and clinical inference, supported by an efficient annotation method and extensive evaluations.
Contribution
The paper presents a new comprehensive benchmark for medical document understanding, along with the ESRA annotation method that improves efficiency and accuracy, and evaluates current LMMs' capabilities and limitations.
Findings
Existing LMMs have limited overall performance.
LMMs are more robust to low-quality images than LLMs.
Reasoning across text and images remains challenging.
Abstract
Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduce RJUA-MedDQA, a comprehensive benchmark in the field of medical specialization, which poses several challenges: comprehensively interpreting image content across diverse challenging layouts, possessing numerical reasoning ability to identify abnormal indicators, and demonstrating clinical reasoning ability to provide statements of disease diagnosis, status and advice based on medical contexts. We carefully design the data generation pipeline and propose the Efficient Structural Restoration Annotation…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
