Binary Verification for Zero-Shot Vision

Rongbin Hu; Jeffrey Liu

arXiv:2511.10983 · cs.CV · March 30, 2026

TL;DR

This paper introduces a training-free binary verification workflow for zero-shot vision with off-the-shelf vision-language models (VLMs). The workflow improves results on referring expression grounding, spatial reasoning, and BLINK-Jigsaw.

Contribution

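The authors propose a training-free workflow that combines quantization (recasting an open-ended query as a multiple-choice question over a small, explicit candidate list) with binarization (one True/False question per candidate) to improve the zero-shot vision performance of existing VLMs.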

Findings

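01. Quantization to MCQ yields large gains over answering open-ended queries directly.

02. True/False binarization provides a consistent additional boost on top of MCQ.

03. The same workflow improves results on every evaluated task: referring expression grounding, three spatial reasoning benchmarks, and BLINK-Jigsaw.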

Abstract

We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multiple-choice question (MCQ) with a small, explicit list of unambiguous candidates; and (ii) binarization, which asks one True/False question per candidate and resolves deterministically: if exactly one is True, select it; otherwise, revert to an MCQ over the remaining plausible candidates. We evaluate the workflow on referring expression grounding (REC), spatial reasoning (Spatial-Map, Spatial-Grid, Spatial-Maze), and BLINK-Jigsaw. Relative to answering open-ended queries directly, quantization to MCQ yields large gains, and True/False binarization provides a consistent additional boost. Across all tasks, the same workflow produces significant improvements, indicating generality. We further integrate…
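The resolution rule in step (ii) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `binary_verify`, `ask_true_false`, and `ask_mcq` are hypothetical names, with the two callables standing in for whatever VLM calls are used. The abstract does not specify what happens when no candidate is judged True, so the sketch assumes the fallback MCQ then covers the full candidate list.

```python
from typing import Callable, List, Sequence


def binary_verify(
    candidates: Sequence[str],
    ask_true_false: Callable[[str], bool],
    ask_mcq: Callable[[List[str]], str],
) -> str:
    """Pick one candidate using per-candidate True/False checks.

    ask_true_false(candidate) -> bool and ask_mcq(options) -> str are
    hypothetical wrappers around VLM queries; they are not from the paper.
    """
    if not candidates:
        raise ValueError("candidates must be non-empty")

    # Binarization: one True/False question per candidate.
    accepted = [c for c in candidates if ask_true_false(c)]

    # Deterministic resolution: exactly one True wins outright.
    if len(accepted) == 1:
        return accepted[0]

    # Otherwise fall back to an MCQ over the plausible candidates.
    # Assumption: if nothing was judged True, every candidate stays plausible.
    remaining = accepted if accepted else list(candidates)
    return ask_mcq(remaining)
```

With stub callables, for example `binary_verify(["A", "B", "C"], lambda c: c == "B", lambda opts: opts[0])`, the function selects `"B"` without ever calling the MCQ fallback, since exactly one candidate is judged True.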
