SIEVES: Selective Prediction Generalizes through Visual Evidence Scoring

Hector G. Rodriguez; Marcus Rohrbach

arXiv:2604.25855 · cs.CV · May 15, 2026

TL;DR

SIEVES introduces a visual evidence scoring method for selective prediction in multimodal large language models, significantly improving out-of-distribution coverage and enabling transfer to proprietary reasoners without internal confidence signals.

Contribution

SIEVES proposes a novel selector that estimates the quality of the reasoner's visual evidence localization using only model inputs and outputs, improving generalization and transferability in visual question answering.

Findings

01

Improves coverage by up to three times on OOD benchmarks.

02

Enables transfer to proprietary reasoners without access to internal signals.

03

Generalizes across multiple benchmarks and reasoner models.

Abstract

Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering (VQA) benchmarks approach saturation, reliable deployment requires satisfying low error tolerances in real-world, out-of-distribution (OOD) scenarios. Specifically, selective prediction aims to improve coverage, i.e. the share of inputs the system answers, while adhering to a user-defined risk level. This is typically achieved by assigning a confidence score to each answer and abstaining on answers whose score falls below a threshold. Existing selective prediction methods estimate implicit confidence scores, relying on model-internal signals like logits or hidden representations, which are not available for frontier closed-source models. To enable reliable generalization in VQA, we require reasoner models to produce localized visual evidence…
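
The coverage/risk trade-off described in the abstract can be sketched in a few lines. This is a minimal, generic illustration of threshold-based selective prediction, not the SIEVES scorer: it assumes each answer already carries a confidence score from some selector and a correctness label from a held-out calibration set. The function names coverage_and_risk and max_coverage_at_risk are hypothetical.

```python
from typing import List, Optional, Tuple


def coverage_and_risk(scores: List[float], correct: List[bool],
                      threshold: float) -> Tuple[float, float]:
    """Answer inputs whose score is >= threshold; abstain on the rest.

    Coverage: fraction of all inputs that are answered.
    Risk: error rate among answered inputs (0.0 if nothing is answered).
    """
    answered = [c for s, c in zip(scores, correct) if s >= threshold]
    coverage = len(answered) / len(scores)
    if not answered:
        return coverage, 0.0
    risk = sum(1 for c in answered if not c) / len(answered)
    return coverage, risk


def max_coverage_at_risk(scores: List[float], correct: List[bool],
                         max_risk: float) -> Tuple[Optional[float], float]:
    """Return the threshold with the highest coverage whose risk <= max_risk.

    Candidate thresholds are the observed scores (hypothetical calibration
    procedure). Returns (None, 0.0) if no threshold meets the risk level.
    """
    best_t: Optional[float] = None
    best_cov = 0.0
    for t in sorted(set(scores)):
        cov, risk = coverage_and_risk(scores, correct, t)
        if risk <= max_risk and cov > best_cov:
            best_t, best_cov = t, cov
    return best_t, best_cov


if __name__ == "__main__":
    # Toy calibration set: hypothetical selector scores and correctness labels.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    correct = [True, True, False, True, False]
    print(max_coverage_at_risk(scores, correct, max_risk=0.25))
```

On this toy set, a risk level of 0.25 selects threshold 0.6 and answers 80% of inputs, while a stricter level of 0.1 forces the threshold up to 0.8 and drops coverage to 40%. The threshold chosen on calibration data is then applied to new inputs, so how well the confidence score separates correct from incorrect answers determines how much coverage survives at a given risk level.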

Code & Models

Repositories

hector-gr/SIEVES (GitHub)