UTSA-NLP at ArchEHR-QA 2025: Improving EHR Question Answering via Self-Consistency Prompting
Sara Shields-Menard, Zach Reimers, Joshua Gardner, David Perry, Anthony Rios

TL;DR
This paper presents a system that improves clinical question answering from electronic health records by using large language models with self-consistency prompting, enhancing sentence relevance detection and response accuracy.
Contribution
The authors combine few-shot prompting, self-consistency, and thresholding to improve sentence relevance classification for EHR question answering, and show that a smaller 8B model can outperform a larger 70B model at identifying relevant sentences.
Findings
A smaller 8B model outperforms a larger 70B model at identifying relevant sentences.
Self-consistency and thresholding improve the reliability of sentence classification.
Accurate sentence selection is crucial for high-quality EHR question answering.
Abstract
We describe our system for the ArchEHR-QA Shared Task on answering clinical questions using electronic health records (EHRs). Our approach uses large language models in two steps: first, to find sentences in the EHR relevant to a clinician's question, and second, to generate a short, citation-supported response based on those sentences. We use few-shot prompting, self-consistency, and thresholding to improve the sentence classification step to decide which sentences are essential. We compare several models and find that a smaller 8B model performs better than a larger 70B model for identifying relevant information. Our results show that accurate sentence selection is critical for generating high-quality responses and that self-consistency with thresholding helps make these decisions more reliable.
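The sentence classification step described in the abstract can be sketched as repeated sampling followed by a vote threshold. This is a minimal illustration, not the authors' implementation: `classify_once` is a hypothetical stand-in for one sampled few-shot LLM call, and the values `num_samples=5` and `threshold=0.6` are assumptions, since the summary does not state the settings used.

```python
from typing import Callable, Dict, List


def self_consistent_labels(
    question: str,
    sentences: List[str],
    classify_once: Callable[[str, str], bool],
    num_samples: int = 5,      # assumed value, not taken from the paper
    threshold: float = 0.6,    # assumed value, not taken from the paper
) -> List[bool]:
    """Mark a sentence essential when enough sampled votes agree.

    classify_once(question, sentence) is a hypothetical callable standing in
    for a single sampled LLM judgment (True means "essential").
    """
    labels = []
    for sentence in sentences:
        # Sample the classifier several times for the same sentence.
        votes = sum(classify_once(question, sentence) for _ in range(num_samples))
        # Keep the sentence only if the share of positive votes meets the threshold.
        labels.append(votes / num_samples >= threshold)
    return labels


def make_scripted_classifier(script: Dict[str, List[bool]]) -> Callable[[str, str], bool]:
    """Build a deterministic stub that replays fixed votes per sentence.

    It replaces the LLM so the example runs without any model access.
    """
    iterators = {sentence: iter(votes) for sentence, votes in script.items()}

    def classify_once(question: str, sentence: str) -> bool:
        return next(iterators[sentence])

    return classify_once


# Illustrative, made-up sentences and votes.
example_script = {
    "Patient was started on IV antibiotics.": [True, True, False, True, True],
    "Family visited in the afternoon.": [False, True, False, False, False],
}
labels = self_consistent_labels(
    "Why was the patient given antibiotics?",
    list(example_script),
    make_scripted_classifier(example_script),
)
print(labels)
```

With the scripted votes above, the first sentence receives 4 of 5 positive votes and is kept, while the second receives 1 of 5 and is dropped. Raising the threshold trades recall for precision, because a sentence then needs more agreeing samples to count as essential.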
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
