Wasserstein Equilibrium Decoding for Reliable Medical Visual Question Answering

Luca Hagen; Johanna P. Müller; Weitong Zhang; Mengyun Qiao; Bernhard Kainz

arXiv:2605.18313 · cs.CV · May 19, 2026

TL;DR

This paper introduces a Wasserstein-based semantic stopping criterion for game-theoretic decoding in medical visual question answering, improving the accuracy and efficiency of small vision-language models.

Contribution

It extends game-theoretic decoding to vision-language models and introduces a Wasserstein stopping criterion that detects semantic consensus among candidate answers, reducing unnecessary iterations.

Findings

01. Achieved a +3.5% accuracy improvement on VQA-RAD with Qwen3-VL-2B.

02. Matched MedGemma-4B performance without domain-specific fine-tuning.

03. Reduced the average number of convergence iterations by 20%, improving inference efficiency.

Abstract

Small vision-language models (2–8B parameters) are well suited for clinical deployment, because privacy constraints, limited connectivity, and low-latency requirements favour on-device or on-premise inference. However, their limited capacity makes them more prone to generating plausible but incorrect outputs. We extend game-theoretic decoding, previously restricted to text-only, closed-ended NLP tasks, to vision-language models for open-ended Medical VQA. We introduce a semantically aware Wasserstein stopping criterion that replaces lexical order matching, enabling convergence based on semantic consensus among near-synonymous candidate answers and avoiding unnecessary iterations caused by clinically equivalent ranking swaps. On VQA-RAD and PathVQA, we obtain consistent, statistically significant improvements over greedy and discriminative baselines. On VQA-RAD, we improve Qwen3-VL-2B by +3.5…
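The abstract contrasts lexical order matching with a Wasserstein stopping criterion. The Python sketch below illustrates that difference on a toy candidate set; it is not the authors' implementation (the linked repository holds that). Assumptions, all hypothetical: the candidate answers are shared across decoding iterations, the semantic ground-cost matrix is made up for illustration (in practice it might come from, e.g., 1 minus the cosine similarity of answer embeddings), the Wasserstein distance is approximated with entropy-regularized Sinkhorn iterations rather than solved exactly, and the names `sinkhorn_cost`, `lexical_converged`, `wasserstein_converged` and the tolerance `tol=0.05` are illustrative choices.

```python
import math


def sinkhorn_cost(p, q, cost, eps=0.05, n_iters=2000):
    """Entropy-regularized (Sinkhorn) approximation of the Wasserstein distance
    between two distributions p and q over the same candidate answers."""
    n = len(p)
    # Gibbs kernel: semantically close (cheap) pairs get large weights.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(n_iters):
        # Alternate scalings so plan rows sum to p and plan columns sum to q.
        u = [p[i] / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [q[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(n)]
    # Transport cost of the plan P_ij = u_i * K_ij * v_j under the ground cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(n)
    )


def ranking(p):
    """Candidate indices sorted by descending probability."""
    return sorted(range(len(p)), key=lambda i: -p[i])


def lexical_converged(p_prev, p_curr):
    # Baseline rule: stop only when the candidate ranking is unchanged.
    return ranking(p_prev) == ranking(p_curr)


def wasserstein_converged(p_prev, p_curr, cost, tol=0.05):
    # Hypothetical semantic rule: stop when little probability mass moves
    # between semantically distinct answers (tol is an illustrative value).
    return sinkhorn_cost(p_prev, p_curr, cost) < tol


# Toy candidate set; the order gives the row/column index of the cost matrix.
candidates = ["no", "absent", "yes"]
# Hypothetical semantic ground cost (e.g. 1 - cosine similarity of embeddings).
cost = [
    [0.0, 0.05, 1.0],
    [0.05, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
prev = [0.45, 0.40, 0.15]
swap = [0.40, 0.45, 0.15]   # near-synonyms trade places in the ranking
shift = [0.10, 0.10, 0.80]  # mass moves to a clinically different answer

print(lexical_converged(prev, swap))             # False
print(wasserstein_converged(prev, swap, cost))   # True
print(wasserstein_converged(prev, shift, cost))  # False
```

In this toy case, swapping "no" and "absent" changes the ranking, so the lexical rule keeps iterating, yet almost no mass crosses between semantically distinct answers, so the Wasserstein rule stops. Because entropic regularization leaves a small positive cost even between identical distributions, the tolerance has to sit above zero; an exact optimal-transport solver would remove that bias at a higher computational cost.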

Code & Models

Repositories

luca-hagen/Wasserstein-BDG-medical-VQA (GitHub)
