Addressing Overthinking in Large Vision-Language Models via Gated Perception-Reasoning Optimization

Xingjian Diao; Zheyuan Liu; Chunhui Zhang; Weiyi Wu; Keyi Kong; Lin Shi; Kaize Ding; Soroush Vosoughi; Jiang Gui

arXiv:2601.04442 · cs.CV · April 16, 2026

TL;DR

This paper introduces Gated Perception-Reasoning Optimization (GPRO), a meta-controller for large vision-language models that dynamically balances perception and reasoning paths to improve accuracy and efficiency.

Contribution

GPRO is a novel meta-reasoning framework that learns to route computation among perception, reasoning, and fast paths, addressing overthinking and perception failures in LVLMs.

Findings

01 GPRO outperforms recent slow-thinking methods in accuracy and efficiency.

02 It generates shorter responses while maintaining high performance.

03 Experiments on five benchmarks validate GPRO's effectiveness.

Abstract

Large Vision-Language Models (LVLMs) have exhibited strong reasoning capabilities through chain-of-thought mechanisms that generate step-by-step rationales. However, such slow-thinking approaches often lead to overthinking, where models produce excessively verbose responses even for simple queries, resulting in test-time inefficiency and even degraded accuracy. Prior work has attempted to mitigate this issue via adaptive reasoning strategies, but these methods largely overlook a fundamental bottleneck: visual perception failures. We argue that stable reasoning critically depends on low-level visual grounding, and that reasoning errors often originate from imperfect perception rather than insufficient deliberation. To address this limitation, we propose Gated Perception-Reasoning Optimization (GPRO), a meta-reasoning controller that dynamically routes computation among three decision…
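
To make the routing idea concrete, the following is a minimal sketch of a gate that chooses among a fast path, a perception path, and a reasoning path. It is an illustration only, not the paper's controller: the abstract above is truncated and does not specify the gate's inputs, architecture, or training. The feature vector (a bias term plus hypothetical visual and reasoning uncertainty scores), the hand-set weight matrix, and the handler functions are all assumptions; GPRO learns its routing, whereas this sketch uses fixed weights.

```python
import math
from typing import Callable, Dict, List, Tuple

# Path names follow the summary above; everything else here is hypothetical.
PATHS = ["fast", "perception", "reasoning"]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(features: List[float], weights: List[List[float]]) -> List[float]:
    """Linear gate: one weight row per path, softmax over the path scores."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


def route(
    features: List[float],
    weights: List[List[float]],
    handlers: Dict[str, Callable[[], str]],
) -> Tuple[str, str]:
    """Pick the most probable path and run its handler."""
    probs = gate(features, weights)
    best = max(range(len(PATHS)), key=lambda i: probs[i])
    name = PATHS[best]
    return name, handlers[name]()


# Assumed features: [bias, visual_uncertainty, reasoning_uncertainty].
# Hand-set weights (assumption): low uncertainty favors the fast path,
# high visual uncertainty favors perception, high reasoning uncertainty
# favors reasoning.
WEIGHTS = [
    [1.0, -2.0, -2.0],  # fast
    [0.0, 3.0, 0.0],    # perception
    [0.0, 0.0, 3.0],    # reasoning
]

# Placeholder handlers standing in for the actual model calls.
HANDLERS = {
    "fast": lambda: "answer directly",
    "perception": lambda: "re-examine the image, then answer",
    "reasoning": lambda: "generate a step-by-step rationale, then answer",
}

if __name__ == "__main__":
    for feats in ([1.0, 0.1, 0.1], [1.0, 0.9, 0.1], [1.0, 0.1, 0.9]):
        print(feats, "->", route(feats, WEIGHTS, HANDLERS))
```

In this toy setup, a query with low uncertainty on both axes goes to the fast path, which reflects the motivation stated in the abstract: simple queries should not pay for long rationales, and a perception failure is handled by revisiting the image rather than by deliberating longer.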
