Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models

Minghe Shen; Zhuo Zhi; Chonghan Liu; Shuo Xing; Zhengzhong Tu; Che Liu

arXiv:2511.00710·cs.AI·April 15, 2026

TL;DR

This paper investigates how Reinforcement Learning with Verifiable Rewards (RLVR) can extend the spatial reasoning capabilities of Vision-Language Models, demonstrating improved performance on synthetic and real-world navigation tasks.

Contribution

The study introduces Ariadne, a controlled maze-navigation framework, and shows that RLVR extends the reasoning boundaries of VLMs beyond the limits of pre-training.

Findings

01 RLVR extends spatial reasoning boundaries in VLMs.

02 Base policies fail on complex synthetic mazes, but RLVR-trained models succeed.

03 Out-of-domain performance improves, indicating genuine reasoning capability expansion.

Abstract

Recent studies posit that Reinforcement Learning with Verifiable Rewards (RLVR) primarily amplifies behaviors inherent to the pre-training distribution rather than inducing new capabilities, but these insights are predominantly limited to language-only domains, leaving the dynamics of visual-centric spatial reasoning under-explored. To examine the impact of RLVR on the capability boundaries of Vision-Language Models (VLMs), we introduce Ariadne, a controlled framework based on synthetic maze navigation where the reasoning difficulty is precisely regulated by path length and the number of turns. We demonstrate that applying RLVR extends the spatial reasoning boundary, achieving success on problems where the base policy VLM consistently attains 0% accuracy despite increasing pass@k sampling budgets, indicating that the optimized policy successfully navigates search spaces that…
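
The two quantities the abstract relies on, maze difficulty (path length and number of turns) and pass@k, can be illustrated with a short sketch. This is not the authors' code. The move encoding (a string over U/D/L/R) and the function names `path_difficulty` and `pass_at_k` are hypothetical, and the pass@k formula is the commonly used unbiased estimator 1 - C(n-c, k)/C(n, k), which may differ from how the paper computes it.

```python
from math import comb


def path_difficulty(moves: str) -> tuple[int, int]:
    """Return (path length, number of turns) for a move string.

    Assumption: a path is encoded as a string over 'U', 'D', 'L', 'R'.
    This encoding is illustrative and is not Ariadne's actual format.
    """
    length = len(moves)
    # A turn is any step whose direction differs from the previous step.
    turns = sum(1 for prev, cur in zip(moves, moves[1:]) if prev != cur)
    return length, turns


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled answers, c of which are correct.

    Uses the standard unbiased estimator 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    print(path_difficulty("RRDDR"))
    # With zero correct samples, the estimate stays at 0 for every budget k.
    print([pass_at_k(64, 0, k) for k in (1, 8, 64)])
```

Under this estimator, a problem with no correct samples scores 0 at every k, which is the situation the abstract describes for the base policy: a larger sampling budget cannot help if no sampled path ever solves the maze.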

Code & Models

Model (Hugging Face): KOKKKOKK/Ariadne
