VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning

Xueqing Wu; Yuheng Ding; Bingxuan Li; Pan Lu; Da Yin; Kai-Wei Chang; Nanyun Peng

arXiv:2412.02172·cs.CV·March 19, 2025

TL;DR

VISCO is a benchmark for evaluating fine-grained critique and correction in large vision-language models. It exposes current limitations of these models and proposes LookBack, a strategy that improves critique and correction in visual reasoning tasks.

Contribution

This work presents the first benchmark for dense critique and correction in LVLMs, analyzes their capabilities, identifies common failure patterns, and proposes the LookBack strategy to improve performance.

Findings

01. Human-written critiques significantly improve correction performance.

02. Model-generated critiques are less effective and sometimes harmful.

03. The LookBack strategy improves critique and correction performance by up to 13.5%.

Abstract

The ability of large vision-language models (LVLMs) to critique and correct their reasoning is an essential building block towards their self-improvement. However, a systematic analysis of such capabilities in LVLMs is still lacking. We propose VISCO, the first benchmark to extensively analyze the fine-grained critique and correction capabilities of LVLMs. Compared to existing work that uses a single scalar value to critique the entire reasoning [4], VISCO features dense and fine-grained critique, requiring LVLMs to evaluate the correctness of each step in the chain-of-thought and provide natural language explanations to support their judgments. Extensive evaluation of 24 LVLMs demonstrates that human-written critiques significantly enhance the performance after correction, showcasing the potential of the self-improvement strategy. However, the model-generated critiques are less helpful…
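To make the contrast between a single scalar critique and a dense, step-level critique concrete, the following is a minimal sketch of one way such a critique could be represented. The class names, field names, and the example reasoning steps are hypothetical illustrations, not the schema of the VISCO dataset or the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepCritique:
    """Judgment for one chain-of-thought step (hypothetical structure)."""
    step_text: str     # the reasoning step being judged
    is_correct: bool   # binary correctness label for this step
    explanation: str   # natural language justification for the label


@dataclass
class DenseCritique:
    """Step-level critique of a whole reasoning chain (hypothetical structure)."""
    steps: List[StepCritique]

    def first_error_index(self) -> Optional[int]:
        """Return the index of the first step judged incorrect, or None."""
        for i, step in enumerate(self.steps):
            if not step.is_correct:
                return i
        return None

    def as_scalar(self) -> float:
        """Collapse the critique to one number: fraction of steps judged correct."""
        if not self.steps:
            return 1.0
        return sum(s.is_correct for s in self.steps) / len(self.steps)


# Invented example: a three-step chain with an error introduced at step 2.
critique = DenseCritique(steps=[
    StepCritique("The chart shows three bars.", True, "Matches the image."),
    StepCritique("The tallest bar is 40.", False, "The tallest bar reads 50."),
    StepCritique("So the total is 90.", False, "Builds on the wrong value."),
])

print(critique.first_error_index())
print(round(critique.as_scalar(), 3))
```

The scalar view only says that the chain is mostly wrong, whereas the dense view locates where the error first appears and records why, which is the kind of information a correction step can act on.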

Code & Models

Datasets

uclanlp/VISCO
dataset · 112 downloads

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics