SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models

Jiahao Xie; Alessio Tonioni; Nathalie Rauschmayr; Federico Tombari; Bernt Schiele

arXiv:2604.20705 · cs.CV · April 23, 2026

TL;DR

SSL-R1 introduces a self-supervised reinforcement learning framework that generates verifiable visual rewards from images, enhancing multimodal large language models' reasoning without human annotations.

Contribution

SSL-R1 reformulates self-supervised visual tasks as verifiable visual puzzles whose automatically checkable answers serve as rewards for RL post-training, improving MLLMs' visual understanding and reasoning.
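
The following sketch shows one way a "verifiable visual puzzle" can work. It uses rotation prediction, a classic self-supervised pretext task. The source does not say whether SSL-R1 uses this exact task, prompt, or answer format. The function names (`make_rotation_puzzle`, `rotation_reward`), the toy integer grid standing in for an image, and the "last integer in the reply" parsing rule are all hypothetical. The key property is that the answer key comes from the transformation applied to the image, so the reward can be checked automatically without a human label.

```python
# Hypothetical sketch of a rotation-prediction puzzle with a verifiable reward;
# not the SSL-R1 implementation.
import random
import re

Grid = list[list[int]]  # toy grayscale "image": rows of pixel values


def rotate90(img: Grid) -> Grid:
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def make_rotation_puzzle(img: Grid, rng: random.Random) -> tuple[Grid, int]:
    """Rotate the image by a random multiple of 90 degrees.

    The answer key (the angle) is produced by the transformation itself,
    so no human annotation is needed.
    """
    angle = rng.choice([0, 90, 180, 270])
    out = img
    for _ in range(angle // 90):
        out = rotate90(out)
    return out, angle


def rotation_reward(response: str, angle: int) -> float:
    """Binary reward: 1.0 if the last integer in the reply equals the angle.

    The parsing rule is an assumption about the model's answer format.
    """
    numbers = re.findall(r"\d+", response)
    if not numbers:
        return 0.0  # no parsable answer
    return 1.0 if int(numbers[-1]) == angle else 0.0
```

In an RL loop, the rotated grid (rendered as an image) would go into the prompt, and `rotation_reward` would score each sampled reply against the stored angle.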

Findings

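1. Training on self-supervised visual puzzles improves performance on multimodal understanding and reasoning benchmarks.
2. The framework requires neither human annotations nor external model supervision.
3. Deriving rewards directly from images makes RL post-training for multimodal models more scalable.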

Abstract

Reinforcement learning (RL) with verifiable rewards (RLVR) has demonstrated great potential for enhancing the reasoning abilities of multimodal large language models (MLLMs). However, reliance on language-centric priors and expensive manual annotations hinders MLLMs' intrinsic visual understanding and limits scalable reward design. In this work, we introduce SSL-R1, a generic self-supervised RL framework that derives verifiable rewards directly from images. To this end, we revisit self-supervised learning (SSL) in visual domains and reformulate widely used SSL tasks into a set of verifiable visual puzzles for RL post-training, requiring neither human nor external model supervision. Training MLLMs on these tasks substantially improves their performance on multimodal understanding and reasoning benchmarks, highlighting the potential of leveraging vision-centric self-supervised tasks for…
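
This second hypothetical sketch shows how a jigsaw-style puzzle, another widely used SSL pretext task, can become a verifiable reward. The abstract does not say which SSL tasks SSL-R1 uses. Here the image is cut into k x k tiles and shuffled, and the shuffle itself serves as the answer key. The names `split_tiles`, `make_jigsaw_puzzle`, and `jigsaw_reward`, the toy grid image, and the partial-credit reward (fraction of correctly placed tiles) are illustrative assumptions, not the paper's method.

```python
# Hypothetical jigsaw puzzle with a verifiable partial-credit reward;
# not the SSL-R1 implementation.
import random
import re

Grid = list[list[int]]  # toy square grayscale "image"


def split_tiles(img: Grid, k: int) -> list[Grid]:
    """Cut a square image into k*k tiles, listed in row-major order."""
    n = len(img)
    if n % k != 0:
        raise ValueError("image side must be divisible by k")
    s = n // k  # tile side length
    return [
        [row[c * s:(c + 1) * s] for row in img[r * s:(r + 1) * s]]
        for r in range(k)
        for c in range(k)
    ]


def make_jigsaw_puzzle(img: Grid, k: int, rng: random.Random) -> tuple[list[Grid], list[int]]:
    """Shuffle the tiles.

    perm[i] is the original index of the tile shown at position i,
    so perm is the answer key, obtained without any annotation.
    """
    tiles = split_tiles(img, k)
    perm = list(range(k * k))
    rng.shuffle(perm)
    shuffled = [tiles[i] for i in perm]
    return shuffled, perm


def jigsaw_reward(response: str, perm: list[int]) -> float:
    """Fraction of positions whose predicted original index is correct."""
    pred = [int(x) for x in re.findall(r"\d+", response)]
    if len(pred) != len(perm):
        return 0.0  # malformed answer earns no reward
    return sum(p == t for p, t in zip(pred, perm)) / len(perm)
```

A binary exact-match reward would be the stricter alternative. Partial credit gives a denser signal on larger grids but also rewards answers that are only partly right.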

Code & Models

Repositories

Jiahao000/SSL-R1 (GitHub)