Improving Semantic Uncertainty Quantification in LVLMs with Semantic Gaussian Processes

Joseph Hoche; Andrei Bursuc; David Brellmann; Gilles Louppe; Pavel Izmailov; Angela Yao; Gianni Franchi

arXiv:2512.14177 · cs.CV · March 31, 2026

TL;DR

This paper introduces SGPU, a Bayesian framework that improves semantic uncertainty estimation in LVLMs by analyzing the geometric structure of answer embeddings, avoiding fragile clustering methods.

Contribution

SGPU offers a novel spectral approach to quantify semantic uncertainty, outperforming existing clustering-based methods across multiple models and datasets.

Findings

01. SGPU achieves state-of-the-art calibration and discrimination metrics.

02. It transfers effectively across models and modalities.

03. It provides reliable uncertainty estimates without brittle clustering.

Abstract

Large Vision-Language Models (LVLMs) often produce plausible but unreliable outputs, making robust uncertainty estimation essential. Recent work on semantic uncertainty estimation relies on external models to cluster multiple sampled responses and measure their semantic consistency. However, these clustering methods are often fragile and highly sensitive to minor phrasing variations, and they can incorrectly group or separate semantically similar answers, leading to unreliable uncertainty estimates. We propose Semantic Gaussian Process Uncertainty (SGPU), a Bayesian framework that quantifies semantic uncertainty by analyzing the geometric structure of answer embeddings, avoiding brittle clustering. SGPU maps generated answers into a dense semantic space, computes the Gram matrix of their embeddings, and summarizes their semantic configuration via the eigenspectrum. This spectral representation…
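
The embedding, Gram-matrix, and eigenspectrum steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `gram_eigenspectrum` and `spectral_entropy` are hypothetical, the L2 normalization and the linear (dot-product) kernel are assumptions, and the entropy summary is only one simple way to read a spectrum. How SGPU actually uses the spectral representation is not stated in the truncated abstract above, so nothing here should be taken as that step.

```python
import numpy as np


def gram_eigenspectrum(embeddings):
    """Return the normalized eigenvalues of the Gram matrix, largest first.

    `embeddings` is an (n_answers, dim) array with one row per sampled answer.
    Rows are L2-normalized first (an assumption of this sketch), so the Gram
    matrix holds cosine similarities between answers.
    """
    E = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    E = E / np.clip(norms, 1e-12, None)      # guard against zero-length rows
    gram = E @ E.T                           # (n_answers, n_answers), symmetric PSD
    eigvals = np.linalg.eigvalsh(gram)       # real eigenvalues, ascending order
    eigvals = np.clip(eigvals, 0.0, None)[::-1]  # drop tiny negatives, sort descending
    return eigvals / eigvals.sum()           # normalize so the spectrum sums to 1


def spectral_entropy(spectrum):
    """Shannon entropy of a normalized spectrum (illustrative summary only)."""
    p = spectrum[spectrum > 0]
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    # Three answers with identical embeddings: all mass on one eigenvalue.
    agree = np.tile([1.0, 2.0, 3.0], (3, 1))
    # Three mutually orthogonal embeddings: mass spread evenly.
    disagree = np.eye(3)
    print(spectral_entropy(gram_eigenspectrum(agree)))
    print(spectral_entropy(gram_eigenspectrum(disagree)))
```

In this toy setting, answers that all point in the same direction concentrate the spectrum on a single eigenvalue (entropy near zero), while mutually orthogonal answers spread it evenly (entropy near log n). No hard cluster assignment is made at any point, which is the property the abstract contrasts with clustering-based approaches.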
