Binary Verification for Zero-Shot Vision
Rongbin Hu, Jeffrey Liu

TL;DR
This paper introduces a training-free binary verification workflow for zero-shot vision tasks. The workflow uses off-the-shelf vision-language models (VLMs) and improves accuracy across several tasks.
Contribution
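The authors propose a training-free workflow with two steps. Quantization turns an open-ended query into a multiple-choice question (MCQ). Binarization then asks one True/False question per candidate. Together, the two steps improve the zero-shot vision performance of existing VLMs.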
Findings
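Quantizing open-ended queries into MCQs yields large accuracy gains over answering them directly.
True/False binarization adds a consistent further boost on top of MCQ.
The same workflow improves results on every evaluated task (REC, Spatial-Map, Spatial-Grid, Spatial-Maze, BLINK-Jigsaw).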
Abstract
We propose a training-free, binary verification workflow for zero-shot vision with off-the-shelf VLMs. It comprises two steps: (i) quantization, which turns the open-ended query into a multiple-choice question (MCQ) with a small, explicit list of unambiguous candidates; and (ii) binarization, which asks one True/False question per candidate and resolves deterministically: if exactly one is True, select it; otherwise, revert to an MCQ over the remaining plausible candidates. We evaluate the workflow on referring expression grounding (REC), spatial reasoning (Spatial-Map, Spatial-Grid, Spatial-Maze), and BLINK-Jigsaw. Relative to answering open-ended queries directly, quantization to MCQ yields large gains, and True/False binarization provides a consistent additional boost. Across all tasks, the same workflow produces significant improvements, indicating generality. We further integrate…
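The abstract's decision rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The VLM is abstracted as a hypothetical callable `ask(prompt) -> str`, with the image assumed to be bound inside it. The prompt templates and helper names (`build_mcq_prompt`, `build_tf_prompt`, `binary_verify`) are also hypothetical. The abstract does not specify which candidates count as "remaining plausible" in the fallback. The sketch assumes they are the candidates judged True when several are True, and all candidates when none is.

```python
from typing import Callable, Optional, Sequence

# Hypothetical VLM interface: prompt text in, reply text out (image bound inside).
AskFn = Callable[[str], str]

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_mcq_prompt(question: str, candidates: Sequence[str]) -> str:
    """Quantization: present the query with a small, explicit list of lettered options."""
    if len(candidates) > len(LETTERS):
        raise ValueError("too many candidates for lettered options")
    lines = [question]
    lines += [f"{letter}. {cand}" for letter, cand in zip(LETTERS, candidates)]
    lines.append("Answer with the letter of the correct option only.")
    return "\n".join(lines)


def build_tf_prompt(question: str, candidate: str) -> str:
    """Binarization: one True/False question about a single candidate."""
    return (f"{question}\nProposed answer: {candidate}\n"
            "Is this proposed answer correct? Answer True or False.")


def parse_true_false(reply: str) -> bool:
    # Treat any reply starting with "true" (case-insensitive) as True.
    return reply.strip().lower().startswith("true")


def parse_mcq(reply: str, candidates: Sequence[str]) -> Optional[str]:
    # Map the first character of the reply (a letter) back to a candidate.
    letter = reply.strip()[:1].upper()
    if not letter or letter not in LETTERS:
        return None
    idx = LETTERS.index(letter)
    return candidates[idx] if idx < len(candidates) else None


def binary_verify(question: str, candidates: Sequence[str], ask: AskFn) -> Optional[str]:
    # Ask one True/False question per candidate.
    verdicts = [parse_true_false(ask(build_tf_prompt(question, c))) for c in candidates]
    accepted = [c for c, ok in zip(candidates, verdicts) if ok]
    # Deterministic resolution: exactly one True -> select it.
    if len(accepted) == 1:
        return accepted[0]
    # Otherwise revert to an MCQ over the remaining plausible candidates
    # (assumed: the accepted ones if several, else all of them).
    remaining = accepted if accepted else list(candidates)
    return parse_mcq(ask(build_mcq_prompt(question, remaining)), remaining)
```

Keeping the resolution step deterministic means the VLM is only asked for the final choice when its True/False answers are ambiguous. In that case, the second query covers a shorter option list than the original MCQ.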
