Zoom Consistency: A Free Confidence Signal in Multi-Step Visual Grounding Pipelines

Keon Kim; Krish Chelikavada

arXiv:2604.15376·cs.CV·April 20, 2026

TL;DR

This paper introduces zoom consistency, a geometric confidence signal for multi-step visual grounding pipelines that correlates with prediction correctness and can improve model routing decisions.

Contribution

The paper shows that zoom consistency, derived from the intermediate outputs of multi-step visual grounding models, is a useful confidence measure that needs no calibration.

Findings

01

Zoom consistency correlates weakly but consistently with prediction correctness (AUC up to 0.60).

02

It can be used to route between models, improving accuracy by 0.8%.

03

Under idealized conditions (perfect step-2, target within crop), the measure is a linear estimator of step-1 spatial error.
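
The routing finding can be illustrated with a minimal sketch. It assumes each model reports its final prediction together with a zoom-consistency distance (the distance between its step-2 prediction and the crop center), and that a smaller distance means higher confidence. The function name `route_by_zoom_consistency`, the tuple layout, and the numbers are hypothetical; the paper's actual routing rule is not given in this summary and may differ (for example, it may use a threshold).

```python
def route_by_zoom_consistency(candidates):
    """Pick the candidate with the smallest zoom-consistency distance.

    candidates: list of (model_name, final_xy, zoom_distance) tuples.
    Assumption: distances are expressed in the same normalized units for
    every model, so they can be compared directly without calibration.
    """
    if not candidates:
        raise ValueError("need at least one candidate prediction")
    # min() returns the first minimal element, so ties go to the earlier model.
    return min(candidates, key=lambda c: c[2])


# Illustrative values only, not results from the paper.
candidates = [
    ("KV-Ground-8B", (412.0, 305.0), 0.31),
    ("Qwen3.5-27B", (398.0, 296.0), 0.07),
]
chosen = route_by_zoom_consistency(candidates)
print(chosen[0], chosen[1])
```

Because the score is a plain distance rather than a log-probability, this comparison does not need per-model calibration, which is the property the summary highlights.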

Abstract

Multi-step zoom-in pipelines are widely used for GUI grounding, yet the intermediate predictions they produce are typically discarded after coordinate remapping. We observe that these intermediate outputs contain a useful confidence signal for free: zoom consistency, the distance between a model's step-2 prediction and the crop center. Unlike log-probabilities or token-level uncertainty, zoom consistency is a geometric quantity in a shared coordinate space, making it directly comparable across architecturally different VLMs without calibration. We prove this quantity is a linear estimator of step-1 spatial error under idealized conditions (perfect step-2, target within crop) and show it correlates with prediction correctness across two VLMs (AUC = 0.60; Spearman rho = -0.14, p < 10^{-6} for KV-Ground-8B; rho = -0.11, p = 0.0003 for Qwen3.5-27B). The correlation is small but consistent…
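
The definition in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes a square crop centered on the step-1 prediction and a step-2 prediction expressed in crop-normalized coordinates in [0, 1], so the crop center is (0.5, 0.5). All function names and numbers are hypothetical.

```python
import math


def crop_around(center_xy, side):
    """Square crop (left, top, side) centered on the step-1 prediction."""
    cx, cy = center_xy
    return (cx - side / 2, cy - side / 2, side)


def to_crop_coords(point_xy, crop):
    """Map a full-image point into crop-normalized coordinates."""
    left, top, side = crop
    return ((point_xy[0] - left) / side, (point_xy[1] - top) / side)


def remap_to_image(pred_xy, crop):
    """Map a crop-normalized prediction back to full-image pixels."""
    left, top, side = crop
    return (left + pred_xy[0] * side, top + pred_xy[1] * side)


def zoom_consistency(step2_pred_xy):
    """Distance from the step-2 prediction to the crop center (0.5, 0.5)."""
    return math.hypot(step2_pred_xy[0] - 0.5, step2_pred_xy[1] - 0.5)


# Idealized case from the abstract: step 2 is perfect and the target is in the crop.
step1_pred = (400.0, 300.0)   # step-1 prediction in full-image pixels
target = (430.0, 340.0)       # true target, inside the crop
side = 200.0                  # crop side length in pixels (assumed)

crop = crop_around(step1_pred, side)
step2_pred = to_crop_coords(target, crop)   # a perfect step 2 returns the target
score = zoom_consistency(step2_pred)
step1_error = math.dist(step1_pred, target)

# Under these assumptions the score equals the step-1 error divided by the crop side.
print(score, step1_error / side)
```

In this setup the score is the step-1 error scaled by a constant, which is one way to read the "linear estimator" claim. With an imperfect step 2, or a target outside the crop, the relationship is only approximate.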

Code & Models

Repositories

omxyz/zoom-consistency-routing (GitHub)
