Learning Adaptive Reasoning Paths for Efficient Visual Reasoning

Yixu Huang; Tinghui Zhu; Muhao Chen

arXiv:2604.14568·cs.CV·April 17, 2026

TL;DR

This paper introduces AVR, an adaptive visual reasoning framework that cuts unnecessary reasoning steps in visual reasoning models (VRMs) by dynamically selecting a response format for each question, reducing token usage by 50-90% on benchmarks while maintaining accuracy.

Contribution

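The paper proposes an adaptive reasoning approach in which the model chooses among three response formats (Full Format, Perception-Only Format, and Direct Answer), together with a training method, FS-GRPO, adapted from Group Relative Policy Optimization. The result is more efficient visual reasoning than prior static methods.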

Findings

01. AVR reduces token usage by 50-90% on benchmarks.

02. AVR maintains accuracy while decreasing reasoning complexity.

03. Adaptive reasoning mitigates overthinking in VRMs.

Abstract

Visual reasoning models (VRMs) have recently shown strong cross-modal reasoning capabilities by integrating visual perception with language reasoning. However, they often suffer from overthinking, producing unnecessarily long reasoning chains for any task. We attribute this issue to Reasoning Path Redundancy in visual reasoning: many visual questions do not require the full reasoning process. To address this, we propose AVR, an adaptive visual reasoning framework that decomposes visual reasoning into three cognitive functions: visual perception, logical reasoning, and answer application. It further enables models to dynamically choose among three response formats: Full Format, Perception-Only Format, and Direct Answer. AVR is trained with FS-GRPO, an adaptation of Group Relative Policy Optimization that encourages the model to select the most efficient reasoning…
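The abstract is cut off before it describes the FS-GRPO objective, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It shows how a GRPO-style group-relative advantage could favor cheaper response formats when they still yield a correct answer. The `FORMAT_BONUS` values, the `reward` function, and the sample group are all hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev

# Hypothetical bonuses (an assumption, not from the paper):
# cheaper formats earn slightly more, but only when the answer is correct.
FORMAT_BONUS = {"full": 0.0, "perception_only": 0.1, "direct_answer": 0.2}


def reward(fmt: str, correct: bool) -> float:
    """Correctness reward plus an efficiency bonus for correct answers only."""
    return 1.0 + FORMAT_BONUS[fmt] if correct else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style normalization: center and scale rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled responses to the same question: (format, answer correct?)
group = [
    ("full", True),
    ("perception_only", True),
    ("direct_answer", True),
    ("direct_answer", False),
]
rewards = [reward(fmt, ok) for fmt, ok in group]
advantages = group_relative_advantages(rewards)

for (fmt, ok), adv in zip(group, advantages):
    print(f"{fmt:16s} correct={ok!s:5s} advantage={adv:+.3f}")
```

Under these assumed bonuses, a correct Direct Answer receives the largest advantage and an incorrect one the smallest, so a short format is only rewarded when it does not cost accuracy.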

Code & Models

Repositories

RunRiotComeOn/AVR (GitHub)
