PuzzleCraft: Exploration-Aware Curriculum Learning for Puzzle-Based RLVR in VLMs
Ahmadreza Jeddi, Hakki Can Karaimer, Hue Nguyen, Zhongling Wang, Ke Zhao, Javad Rajabi, Ran Zhang, Raghav Goyal, Konstantinos G. Derpanis, Babak Taati, Radek Grzeszczuk

TL;DR
PuzzleCraft introduces an exploration-aware curriculum for puzzle-based RLVR in vision-language models. It replaces external supervision with lightweight puzzle environments and adds a new metric, RAC, for reasoning-answer consistency, with the aim of improving both consistency and robustness.
Contribution
The paper presents PuzzleCraft, a supervision-free framework that scales puzzle-based RLVR with a curriculum that accounts for both difficulty and exploration, and introduces RAC, a metric for reasoning-answer consistency.
Findings
Improves RAC and downstream performance on vision benchmarks.
Enhances robustness and reasoning consistency in VLMs.
Achieves consistent gains on Qwen2.5-VL and Qwen3-VL models.
Abstract
RL post-training with verifiable rewards (RLVR) has become a practical route to eliciting chain-of-thought reasoning in vision-language models (VLMs), but scaling it in the visual domain remains challenging due to costly or noisy supervision and reliance on external verifiers. Puzzle-based RLVR is a promising alternative, yet existing approaches often treat puzzle rewards as flat or sparse, which weakens group-relative learning signal. Existing curriculum strategies are overly restrictive: they rely mainly on reward statistics and do not account for exploration in the solution space, which can lead to collapsed rollout dynamics. Further, RL post-training can induce reasoning-answer inconsistency as training progresses. To address these shortcomings, we present PuzzleCraft, a supervision-free framework that scales vision-centric RLVR using a set of lightweight puzzle environments with…
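The abstract's point that flat rewards weaken the group-relative learning signal can be illustrated with a minimal sketch. It assumes a GRPO-style normalization, where each rollout's reward is standardized against the other rollouts for the same prompt; the abstract does not specify the estimator, and the function name `group_relative_advantages` and the `eps` parameter are hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against its own rollout group.

    Assumption: GRPO-style normalization; this is an illustration,
    not the estimator used in PuzzleCraft.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Flat group: every rollout gets the same reward, so no rollout
# is preferred over another and every advantage is zero.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

# Mixed group: rewards differ, so rollouts above the group mean
# get positive advantages and those below get negative ones.
mixed = group_relative_advantages([0.0, 1.0, 0.0, 1.0])
```

Under this assumption, a puzzle whose rollouts all succeed or all fail contributes no gradient signal, which is one reason a curriculum would want to track more than a single flat reward per puzzle.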
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Ethics and Social Impacts of AI
