Beyond Long Context: When Semantics Matter More than Tokens

Tarun Kumar Chawdhury; Jon D. Duke

arXiv:2510.25816 · cs.CL · October 31, 2025

TL;DR

This paper evaluates the Clinical Entity Augmented Retrieval (CLEAR) method, introduced by Lopez et al. 2025, which applies entity-aware retrieval to clinical question answering. CLEAR improves accuracy and reduces token usage, especially on long EHR notes.
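The central idea behind entity-aware retrieval, giving the language model only the note text surrounding clinically relevant entities instead of the entire note, can be sketched as follows. This is a minimal illustration under stated assumptions, not the CLEAR implementation from Lopez et al. 2025: it assumes the relevant entities are already known (a real pipeline would extract them, for example with a clinical named-entity recognizer) and matches them as literal, case-insensitive strings. `entity_windows`, `build_prompt`, and the 200-character default window are hypothetical.

```python
import re


def entity_windows(note: str, entities: list[str], window: int = 200) -> list[str]:
    """Return note excerpts spanning `window` characters on each side of every entity mention.

    Hypothetical helper: entities are matched as literal, case-insensitive
    strings, and overlapping excerpts are merged so no text is repeated.
    """
    spans: list[tuple[int, int]] = []
    for entity in entities:
        for match in re.finditer(re.escape(entity), note, flags=re.IGNORECASE):
            spans.append((max(0, match.start() - window), min(len(note), match.end() + window)))
    spans.sort()
    merged: list[list[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous excerpt: extend it instead of duplicating text.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [note[start:end] for start, end in merged]


def build_prompt(question: str, note: str, entities: list[str], window: int = 200) -> str:
    """Build a QA prompt from entity-centred excerpts rather than the whole note."""
    excerpts = entity_windows(note, entities, window)
    context = "\n...\n".join(excerpts)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Merging overlapping windows keeps repeated mentions from inflating the prompt; in this simplified version, that is where savings over whole-note prompting would come from.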

Contribution

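The paper presents a Clinical Notes QA Evaluation Platform and uses it to validate the CLEAR retrieval approach against zero-shot large-context inference and chunk-based retrieval-augmented generation (RAG), reporting improved performance and efficiency relative to these baselines on clinical notes.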
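An evaluation platform that compares several answering strategies on the same notes needs per-note scoring and an aggregate comparison. The abstract reports a win rate and an average semantic similarity without defining them in this summary; the sketch below assumes a strategy "wins" a note when it has the strictly highest score for that note, and that semantic similarity is cosine similarity between answer and reference embeddings. Both definitions, and the names `cosine_similarity` and `win_rate`, are assumptions for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def win_rate(scores: dict[str, list[float]], method: str) -> float:
    """Fraction of notes on which `method` scores strictly higher than every other strategy.

    `scores` maps each strategy name to its per-note scores, all in the same note
    order; at least two strategies are assumed. Ties count as non-wins here.
    """
    per_note = scores[method]
    wins = 0
    for i, own in enumerate(per_note):
        best_rival = max(s[i] for name, s in scores.items() if name != method)
        if own > best_rival:
            wins += 1
    return wins / len(per_note)
```

Under this reading, a strategy's average semantic similarity is the mean of `cosine_similarity` over its answers, and with 12 notes the win rate moves in steps of 1/12, so a reported 58.3 percent would be consistent with 7 wins (7/12 ≈ 0.583).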

Findings

01

CLEAR outperforms embedding-based retrieval, with an F1 score of 0.90 versus 0.86 (Lopez et al. 2025).

02

CLEAR uses over 70% fewer tokens than traditional methods (Lopez et al. 2025).

03

Performance gains are most pronounced on very long clinical notes.
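The token figure can be made concrete with simple arithmetic: the fractional reduction is 1 - sent/full, so a reduction of at least 70 percent means the prompt holds at most 30 percent of the note's tokens. If that ratio held across the 10,000 to 65,000-token notes described in the abstract (an assumption, since the 70 percent figure comes from Lopez et al. 2025), prompts would be capped at roughly 3,000 and 19,500 tokens respectively. The helper names below are hypothetical.

```python
def token_reduction(full_tokens: int, sent_tokens: int) -> float:
    """Fraction of tokens saved by sending `sent_tokens` instead of the full note."""
    if full_tokens <= 0:
        raise ValueError("full_tokens must be positive")
    return 1.0 - sent_tokens / full_tokens


def max_prompt_tokens(full_tokens: int, min_reduction: float) -> float:
    """Largest prompt size that still saves at least `min_reduction` of the note's tokens."""
    return full_tokens * (1.0 - min_reduction)


if __name__ == "__main__":
    # Note lengths at the two ends of the range given in the abstract.
    for note_tokens in (10_000, 65_000):
        cap = max_prompt_tokens(note_tokens, 0.70)
        print(f"{note_tokens:>6,} -> at most {cap:,.0f} tokens")
```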

Abstract

Electronic Health Records (EHR) store clinical documentation as base64-encoded attachments in FHIR DocumentReference resources, which makes semantic question answering difficult. Traditional vector database methods often miss nuanced clinical relationships. The Clinical Entity Augmented Retrieval (CLEAR) method, introduced by Lopez et al. 2025, uses entity-aware retrieval and achieved improved performance, with an F1 score of 0.90 versus 0.86 for embedding-based retrieval, while using over 70 percent fewer tokens. We developed a Clinical Notes QA Evaluation Platform to validate CLEAR against zero-shot large-context inference and traditional chunk-based retrieval-augmented generation. The platform was tested on 12 clinical notes ranging from 10,000 to 65,000 tokens, representing realistic EHR content. CLEAR achieved a 58.3 percent win rate, an average semantic similarity of 0.878, and used…
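Because note text arrives inside FHIR DocumentReference resources as base64-encoded attachments, it has to be decoded before any retrieval strategy can run. A minimal sketch, assuming the resource has already been parsed from JSON into a Python dict and that text notes are UTF-8 encoded: `decode_document_reference` is a hypothetical helper, while `resourceType`, `content`, `attachment`, `contentType`, `data`, and `url` are standard FHIR R4 element names.

```python
import base64


def decode_document_reference(resource: dict) -> list[str]:
    """Return the decoded text of inline attachments in a FHIR DocumentReference.

    Hypothetical helper. Only attachments carried inline in `attachment.data`
    with a text/* content type are decoded; attachments that are only linked
    via `attachment.url` (or are PDFs, images, etc.) are skipped.
    """
    if resource.get("resourceType") != "DocumentReference":
        raise ValueError("expected a DocumentReference resource")
    texts: list[str] = []
    for content in resource.get("content", []):
        attachment = content.get("attachment", {})
        data = attachment.get("data")
        # Assumption: a missing contentType is treated as plain text.
        content_type = attachment.get("contentType", "text/plain")
        if data is None or not content_type.startswith("text/"):
            continue
        # FHIR base64Binary -> bytes -> str (assumes UTF-8 encoded notes).
        texts.append(base64.b64decode(data).decode("utf-8"))
    return texts


if __name__ == "__main__":
    note = "Assessment: type 2 diabetes. Plan: start metformin."
    resource = {
        "resourceType": "DocumentReference",
        "status": "current",
        "content": [
            {"attachment": {"contentType": "text/plain",
                            "data": base64.b64encode(note.encode("utf-8")).decode("ascii")}},
        ],
    }
    print(decode_document_reference(resource))
```

Non-text attachments such as PDFs would need a format-specific text extractor before they could feed any of the compared question-answering strategies.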
