Investigating LLM Capabilities on Long Context Comprehension for Medical Question Answering
Feras AlMannaa, Talia Tseriotou, Jenny Chim, Maria Liakata

TL;DR
This paper explores the capabilities and limitations of large language models in understanding long medical contexts for question answering, highlighting effects of model size, retrieval techniques, and reasoning challenges.
Contribution
It is the first comprehensive study on LLM long-context comprehension in medical QA, analyzing various models, datasets, and retrieval methods to identify strengths and limitations.
Findings
RAG improves medical long-context QA performance in some cases
Model size influences comprehension and memorization issues
Temporal reasoning remains challenging for current models
Abstract
This study is the first to investigate LLM comprehension capabilities over long-context (LC), clinically relevant medical Question Answering (QA) beyond multiple-choice QA (MCQA). Our comprehensive approach considers a range of settings based on content inclusion of varying size and relevance, LLMs of different capabilities, and a variety of datasets across task formulations. We reveal insights on model size effects and their limitations, underlying memorization issues, and the benefits of reasoning models, while demonstrating the value and challenges of leveraging the patient's full long context. Importantly, we examine the effect of Retrieval Augmented Generation (RAG) on medical LC comprehension, showcasing the best settings in single versus multi-document QA datasets. We shed light on some of the evaluation aspects using a multi-faceted approach, uncovering common metric challenges. Our quantitative…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior
