TL;DR
This paper investigates how Reinforcement Learning with Verifiable Rewards (RLVR) can extend the spatial reasoning capabilities of Vision-Language Models, demonstrating improved performance on synthetic and real-world navigation tasks.
Contribution
The study introduces Ariadne, a controlled maze navigation framework, showing RLVR enhances reasoning boundaries of VLMs beyond pre-training limitations.
Findings
RLVR extends spatial reasoning boundaries in VLMs.
Base policies fail on complex synthetic mazes, but RLVR-trained models succeed.
Out-of-domain performance improves, indicating genuine reasoning capability expansion.
Abstract
Recent studies posit that Reinforcement Learning with Verifiable Rewards (RLVR) primarily amplifies behaviors inherent to the pre-training distribution rather than inducing new capabilities, but these insights are predominantly limited to language-only domains, leaving the dynamics of visual-centric spatial reasoning under-explored. To examine the impact of RLVR on the capability boundaries of Vision-Language Models (VLMs), we introduce Ariadne, a controlled framework based on synthetic maze navigation where the reasoning difficulty is precisely regulated by path length and the number of turns. We demonstrate that applying RLVR extends the spatial reasoning boundary, achieving success on problems where the base policy VLM consistently fails despite increasing pass@k sampling budgets, indicating that the optimized policy successfully navigates search spaces that…
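The abstract names two difficulty controls (path length and number of turns) and evaluates the base policy under growing pass@k budgets. The sketch below shows one way these quantities could be computed. It is illustrative only: the function names `path_difficulty` and `pass_at_k`, and the representation of a path as a list of `(row, col)` grid cells, are assumptions and are not taken from the paper. The pass@k formula is the widely used unbiased estimator, 1 - C(n-c, k) / C(n, k); whether the paper uses this exact estimator is also an assumption.

```python
from math import comb


def path_difficulty(path):
    """Return (length, turns) for a path given as a list of (row, col) cells.

    Hypothetical helper: length counts moves between consecutive cells,
    and a turn is any change of direction between consecutive moves.
    """
    steps = [(r2 - r1, c2 - c1) for (r1, c1), (r2, c2) in zip(path, path[1:])]
    turns = sum(1 for a, b in zip(steps, steps[1:]) if a != b)
    return len(steps), turns


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# A path that goes right twice, down twice, then right once.
example_path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 3)]
length, turns = path_difficulty(example_path)
print(length, turns)

# A policy with zero correct samples stays at 0 for any budget k <= n.
print(pass_at_k(n=100, c=0, k=10))
```

With `c = 0` the estimate stays at zero no matter how large `k` grows, which is the situation the abstract describes for the base policy on the hardest mazes.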
