When to Trust the Answer: Question-Aligned Semantic Nearest Neighbor Entropy for Safer Surgical VQA

Luca Carlini; Dennis Pierantozzi; Mauro Orazio Drago; Chiara Lena; Cesare Hassan; Elena De Momi; Danail Stoyanov; Sophia Bano; Mobarak I. Hoque

arXiv:2511.01458·cs.CV·April 24, 2026

TL;DR

This paper introduces QA-SNNE, a novel uncertainty estimation method for surgical VQA that explicitly accounts for question-answer alignment, improving safety by reducing overconfidence in irrelevant answers.

Contribution

QA-SNNE incorporates question relevance into semantic entropy, enhancing uncertainty estimation in surgical VQA models, especially under language variation and out-of-template questions.

Findings

01. QA-SNNE improves AUROC by up to 21% on EndoVis18-VQA for some models.

02. QA-SNNE enhances robustness to question rephrasing in surgical VQA datasets.

03. The method is model-agnostic and effective in zero-shot and PEFT settings.

Abstract

Safety and reliability are critical for deploying visual question answering (VQA) systems in surgery, where incorrect or ambiguous responses can cause patient harm. A key limitation of existing uncertainty estimation methods, such as Semantic Nearest Neighbor Entropy (SNNE), is that they do not explicitly account for the conditioning question. As a result, they may assign high confidence to answers that are semantically consistent yet misaligned with the clinical question, especially under variation in question phrasing. We propose Question-Aligned Semantic Nearest Neighbor Entropy (QA-SNNE), a black-box uncertainty estimator that incorporates question-answer alignment into semantic entropy through bilateral gating. QA-SNNE measures uncertainty by weighting pairwise semantic similarities among sampled answers according to their relevance to the question, using embedding-based,…
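To make the idea of question-aligned weighting concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract is cut off before the method details, so the function name `qa_snne_sketch`, the log-sum-exp form of the nearest-neighbor entropy, the temperature `tau`, and the choice of a clipped cosine similarity as the gate are all assumptions made for illustration. The sketch takes precomputed (nonzero) embeddings of the sampled answers and of the question, and scales each pairwise answer similarity by the product of both answers' relevance to the question, which is one plausible reading of "bilateral gating".

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def qa_snne_sketch(answer_emb, question_emb, tau=1.0, gated=True):
    """Hypothetical question-gated nearest-neighbor entropy (higher = more uncertain).

    answer_emb:   (n, d) embeddings of n sampled answers (assumed nonzero rows)
    question_emb: (d,)   embedding of the question (assumed nonzero)
    tau:          temperature for the similarity scores (assumed parameter)
    gated:        if False, the question is ignored (plain similarity-based score)
    """
    sim = cosine_matrix(answer_emb, answer_emb)  # (n, n) answer-answer similarity

    if gated:
        # Assumed gate: clipped cosine relevance of each answer to the question.
        rel = cosine_matrix(answer_emb, question_emb[None, :])[:, 0]
        gate = np.clip(rel, 0.0, 1.0)
        # Bilateral: a pair only counts as similar if BOTH answers are relevant.
        sim = sim * np.outer(gate, gate)

    # Numerically stable log-sum-exp over each row of the scaled similarities.
    scaled = sim / tau
    row_max = scaled.max(axis=1, keepdims=True)
    lse = row_max[:, 0] + np.log(np.exp(scaled - row_max).sum(axis=1))

    # Negative mean log-sum-exp: tightly clustered, relevant answers score low.
    return float(-lse.mean())
```

Under these assumptions, a set of mutually consistent answers that is unrelated to the question receives a higher uncertainty score than the same set when it matches the question, whereas the ungated variant cannot tell the two cases apart. In practice the gate could come from any relevance scorer; cosine similarity is used here only to keep the sketch dependency-free.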
