Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

Zifu Wang; Junyi Zhu; Bo Tang; Zhiyu Li; Feiyu Xiong; Jiaqian Yu; Matthew B. Blaschko

arXiv:2505.23590·cs.CV·October 14, 2025

TL;DR

This study investigates rule-based visual reinforcement learning using jigsaw puzzles as a structured framework, revealing how multimodal models learn, generalize, and reason in complex visual tasks, with implications for multimodal AI development.

Contribution

The paper provides a comprehensive analysis of rule-based visual RL using jigsaw puzzles, with key insights into how models learn, reason, and generalize.

Findings

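01. With fine-tuning, MLLMs improve from near-random guessing on the simplest jigsaw puzzles to near-perfect accuracy, and generalize to complex, unseen configurations.

02. Training on jigsaw puzzles can induce generalization to other visual tasks.

03. RL generalizes better than supervised fine-tuning.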

Abstract

The application of rule-based reinforcement learning (RL) to multimodal large language models (MLLMs) introduces unique challenges and potential deviations from findings in text-only domains, particularly for perception-heavy tasks. This paper provides a comprehensive study of rule-based visual RL, using jigsaw puzzles as a structured experimental framework. Jigsaw puzzles offer inherent ground truth, adjustable difficulty, and demand complex decision-making, making them ideal for this study. Our research reveals several key findings: Firstly, we find that MLLMs, initially performing close to random guessing on the simplest jigsaw puzzles, achieve near-perfect accuracy and generalize to complex, unseen configurations through fine-tuning. Secondly, training on jigsaw puzzles can induce generalization to other visual tasks, with effectiveness tied to specific task…
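To make "inherent ground truth" and "adjustable difficulty" concrete, here is a minimal Python sketch of how a jigsaw instance and a rule-based reward could be set up. It is an illustration only, not the authors' implementation: the function names (`make_jigsaw`, `solve_order`, `rule_based_reward`), the answer encoding, and the exact-match 0/1 reward are all assumptions, since the excerpt above does not specify the paper's puzzle format or reward design.

```python
import random


def make_jigsaw(rows, cols, seed=None):
    """Shuffle the patch indices of a rows x cols grid (hypothetical helper).

    perm[i] is the original index of the patch displayed at position i.
    Difficulty is adjusted simply by changing rows and cols.
    """
    rng = random.Random(seed)
    perm = list(range(rows * cols))
    rng.shuffle(perm)
    return perm


def solve_order(perm):
    """Ground truth: for each original slot, the displayed position of its patch."""
    answer = [0] * len(perm)
    for shown_pos, orig_idx in enumerate(perm):
        answer[orig_idx] = shown_pos
    return answer


def rule_based_reward(predicted, perm):
    """Assumed rule: reward 1.0 for an exact match with the ground truth, else 0.0."""
    return 1.0 if list(predicted) == solve_order(perm) else 0.0


# Usage: a 2x2 puzzle; the shuffle itself defines the correct answer.
perm = make_jigsaw(2, 2, seed=0)
truth = solve_order(perm)
reward_correct = rule_based_reward(truth, perm)
reward_wrong = rule_based_reward(truth[::-1], perm)
```

Because the ground truth is derived from the shuffle, no human labels are needed, and the reward can be checked by a fixed rule rather than a learned reward model.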

Code & Models

Repositories

zifuwanggg/jigsaw-r1
PyTorch · Official

Taxonomy

Topics: Image Processing and 3D Reconstruction · Handwritten Text Recognition Techniques · Human Pose and Action Recognition

Methods: Shrink and Fine-Tune · Jigsaw