Test-Time Consistency in Vision Language Models

Shih-Han Chou; Shivam Chandhok; James J. Little; Leonid Sigal

arXiv:2506.22395·cs.CV·June 30, 2025

Open Access

TL;DR

This paper introduces a simple, post-hoc test-time framework that improves the semantic consistency of vision-language models without retraining, making their predictions more reliable across semantically equivalent inputs.

Contribution

It proposes a model-agnostic, test-time consistency method that uses two complementary objectives to improve semantic consistency without supervised re-training. The method applies to any VLM whose weights are accessible.

Findings

01. Significant improvements in consistency on the MM-R3 benchmark.

02. Applicable to various state-of-the-art VLMs without retraining.

03. Establishes a new inference-time adaptation approach for multimodal models.

Abstract

Vision-Language Models (VLMs) have achieved impressive performance across a wide range of multimodal tasks, yet they often exhibit inconsistent behavior when faced with semantically equivalent inputs, undermining their reliability and robustness. Recent benchmarks, such as MM-R3, highlight that even state-of-the-art VLMs can produce divergent predictions across semantically equivalent inputs, despite maintaining high average accuracy. Prior work addresses this issue by modifying model architectures or conducting large-scale fine-tuning on curated datasets. In contrast, we propose a simple and effective test-time consistency framework that enhances semantic consistency without supervised re-training. Our method is entirely post-hoc, model-agnostic, and applicable to any VLM with access to its weights. Given a single test point, we enforce consistent predictions via two complementary…
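The gap between accuracy and consistency described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's method and not the MM-R3 metric: the function names, the exact-match comparison, and the example answers are all assumptions chosen for illustration. It only shows how a model can answer most rephrasings of a question correctly while still disagreeing with itself across them.

```python
from itertools import combinations


def accuracy(predictions, label):
    """Fraction of predictions that exactly match the ground-truth label."""
    return sum(p == label for p in predictions) / len(predictions)


def pairwise_consistency(predictions):
    """Fraction of prediction pairs that agree with each other.

    Hypothetical metric for illustration: exact string match between
    every unordered pair of predictions, ignoring the ground truth.
    """
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 1.0  # a single prediction is trivially consistent
    return sum(a == b for a, b in pairs) / len(pairs)


# Made-up answers from one model to four rephrasings of the same question.
answers = ["cat", "cat", "dog", "cat"]

acc = accuracy(answers, "cat")
cons = pairwise_consistency(answers)
print(f"accuracy={acc:.2f} consistency={cons:.2f}")
```

In this toy case three of four answers are correct, yet only half of the answer pairs agree, which is the kind of divergence the abstract refers to.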

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling