TL;DR
This paper introduces Neural, a prompt-optimization approach for evidence-grounded clinical question answering over electronic health records. It reaches an overall score of 51.5 on the BioNLP 2025 ArchEHR-QA hidden test set without any model fine-tuning.
Contribution
It decouples the task into sentence-level evidence identification and answer synthesis with explicit citations, and tunes the prompt for each stage automatically with DSPy's MIPROv2 optimizer rather than fine-tuning the model.
Findings
Achieved an overall score of 51.5 on the hidden test set, placing second in the BioNLP 2025 ArchEHR-QA shared task.
Outperformed standard zero-shot and few-shot prompting by over 20 and 10 points, respectively.
Demonstrated data-driven prompt optimization as a cost-effective alternative to fine-tuning.
Abstract
Automated question answering (QA) over electronic health records (EHRs) can bridge critical information gaps for clinicians and patients, yet it demands both precise evidence retrieval and faithful answer generation under limited supervision. In this work, we present Neural, the runner-up in the BioNLP 2025 ArchEHR-QA shared task on evidence-grounded clinical QA. Our proposed method decouples the task into (1) sentence-level evidence identification and (2) answer synthesis with explicit citations. For each stage, we automatically explore the prompt space with DSPy's MIPROv2 optimizer, jointly tuning instructions and few-shot demonstrations on the development set. A self-consistency voting scheme further improves evidence recall without sacrificing precision. On the hidden test set, our method attains an overall score of 51.5, placing second while outperforming standard zero-shot…
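The abstract does not spell out the voting rule, so the following is only a minimal sketch of how self-consistency voting over sentence-level evidence could work: the evidence stage is sampled several times, and a sentence is kept when enough runs select it. The function name `vote_evidence`, the `min_share` threshold, and the sample data are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Iterable, List, Set


def vote_evidence(samples: Iterable[Set[int]], min_share: float = 0.5) -> List[int]:
    """Keep sentence ids chosen in at least `min_share` of the sampled runs.

    `samples` holds one set of selected sentence ids per sampled run.
    `min_share` is an assumed threshold; the paper may use a different rule.
    """
    runs = [set(s) for s in samples]
    if not runs:
        return []
    # Count, for each sentence id, how many runs selected it.
    counts = Counter(sid for run in runs for sid in run)
    return sorted(sid for sid, c in counts.items() if c / len(runs) >= min_share)


# Hypothetical outputs of five sampled evidence-identification runs.
runs = [{1, 3}, {1, 3, 4}, {1}, {1, 3, 7}, {3, 4}]
selected = vote_evidence(runs)
print(selected)
```

In this toy data, sentences 1 and 3 are each selected in four of five runs and pass the default threshold, while sentences 4 and 7 do not. Lowering `min_share` favors recall and raising it favors precision, which is the trade-off the abstract refers to.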