Investigating LLM Capabilities on Long Context Comprehension for Medical Question Answering

Feras AlMannaa; Talia Tseriotou; Jenny Chim; Maria Liakata

arXiv:2510.18691·cs.CL·January 16, 2026

TL;DR

This paper explores the capabilities and limitations of large language models in understanding long medical contexts for question answering, highlighting effects of model size, retrieval techniques, and reasoning challenges.

Contribution

The paper presents the first comprehensive study of LLM long-context comprehension in medical QA, analyzing a range of models, datasets, and retrieval methods to identify their strengths and limitations.

Findings

01. RAG improves medical long-context QA performance in some cases

02. Model size influences comprehension and memorization issues

03. Temporal reasoning remains challenging for current models

Abstract

This study is the first to investigate LLM comprehension capabilities over long-context (LC), clinically relevant medical Question Answering (QA) beyond multiple-choice QA (MCQA). Our comprehensive approach considers a range of settings based on content inclusion of varying size and relevance, LLMs of different capabilities, and a variety of datasets across task formulations. We reveal insights on model size effects and their limitations, underlying memorization issues, and the benefits of reasoning models, while demonstrating the value and challenges of leveraging the patient's full long context. Importantly, we examine the effect of Retrieval Augmented Generation (RAG) on medical LC comprehension, showcasing the best settings for single- versus multi-document QA datasets. We shed light on some of the evaluation aspects using a multi-faceted approach, uncovering common metric challenges. Our quantitative…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior