BIT.UA-AAUBS at ArchEHR-QA 2026: Evaluating Open-Source and Proprietary LLMs via Prompting in Low-Resource QA

Richard A. A. Jonker; Alexander Christiansen; Alexandros Maniatis; Rúben Garrido; Rogério Braunschweiger de Freitas Lima; Roman Jurowetzki; Sérgio Matos

arXiv:2605.03618 · cs.CL · May 6, 2026

TL;DR

This paper evaluates open-source and proprietary large language models for clinical question answering in a low-resource setting with no training data, using prompt engineering and ensembling techniques, and achieves top results in the competition.

Contribution

It demonstrates the effectiveness of prompt-based methods and ensembling for clinical QA with LLMs, highlighting open-source models' competitiveness in healthcare tasks.

Findings

1. Proprietary models are resilient to prompt variations.
2. Open-source models such as MedGemma 3 27B perform well when given appropriate prompts.
3. Prompt engineering and ensembling improve model robustness.

Abstract

This paper presents the joint participation of the BIT.UA and AAUBS groups in the ArchEHR-QA 2026 shared task, which focuses on clinical question answering and evidence grounding in a low-resource setting. Due to the absence of training data and the strict data privacy constraints inherent to the healthcare domain (e.g. GDPR), we investigate the capabilities of Large Language Models (LLMs) without weight updates. We evaluate several state-of-the-art proprietary models and locally deployable open-source alternatives using various prompt engineering strategies, including task decomposition, Chain-of-Thought, and in-context learning. Furthermore, we explore majority voting and LLM-as-a-judge ensembling techniques to maximize predictive robustness. Our results demonstrate that while proprietary models exhibit strong resilience to prompt variations, domain-adapted open-source models (such as…
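The abstract names majority voting as one of the ensembling techniques without detailing its mechanics on this page. Below is a minimal sketch. It assumes that each model or prompt variant returns the set of note-sentence IDs it marks as evidence and that a strict-majority rule combines them. The function name `majority_vote`, the input format, and the threshold are hypothetical illustrations, not taken from the authors' repository.

```python
from collections import Counter


def majority_vote(runs: list[set[str]], threshold: float = 0.5) -> set[str]:
    """Keep sentence IDs selected by more than `threshold` of the runs.

    Hypothetical input format (an assumption, not the authors' code):
    each element of `runs` is the set of note-sentence IDs that one
    model or prompt variant marked as evidence.
    """
    if not runs:
        return set()
    # Count how many runs selected each sentence ID.
    counts = Counter(sid for run in runs for sid in run)
    # Strict majority: an ID must appear in more than threshold * len(runs) runs.
    needed = threshold * len(runs)
    return {sid for sid, n in counts.items() if n > needed}


if __name__ == "__main__":
    predictions = [
        {"1", "3", "4"},  # e.g. one proprietary model
        {"1", "4"},       # e.g. one open-source model
        {"1", "2", "4"},  # e.g. another prompt variant
    ]
    print(sorted(majority_vote(predictions)))  # ['1', '4']
```

With an odd number of runs, a strict majority cannot tie. LLM-as-a-judge ensembling, the other technique the abstract mentions, would instead hand the candidate outputs to a further model for selection, and that variant is not sketched here.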

Code & Models

Repositories

bioinformatics-ua/ArchEHR-QA-2026 (GitHub)