Yale-DM-Lab at ArchEHR-QA 2026: Deterministic Grounding and Multi-Pass Evidence Alignment for EHR Question Answering
Elyas Irankhah, Samah Fodeh

TL;DR
The Yale-DM-Lab system for ArchEHR-QA 2026 employs deterministic grounding and multi-pass evidence alignment, utilizing diverse models and ensemble voting to improve performance on patient-record question answering tasks.
Contribution
This work introduces a multi-model ensemble approach with evidence alignment techniques for improved EHR question answering performance.
Findings
Model diversity and ensemble voting enhance accuracy.
Using full clinician answer as context improves evidence alignment.
Alignment accuracy is mainly limited by reasoning capabilities.
Abstract
We describe the Yale-DM-Lab system for the ArchEHR-QA 2026 shared task. The task studies patient-authored questions about hospitalization records and contains four subtasks (ST): clinician-interpreted question reformulation, evidence sentence identification, answer generation, and evidence-answer alignment. ST1 uses a dual-model pipeline with Claude Sonnet 4 and GPT-4o to reformulate patient questions into clinician-interpreted questions. ST2-ST4 rely on Azure-hosted model ensembles (o3, GPT-5.2, GPT-5.1, and DeepSeek-R1) combined with few-shot prompting and voting strategies. Our experiments show three main findings. First, model diversity and ensemble voting consistently improve performance compared to single-model baselines. Second, the full clinician answer paragraph is provided as additional prompt context for evidence alignment. Third, results on the development set show that…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
