Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning

Zuyao You; Zuxuan Wu

arXiv:2506.22624 · cs.CV · July 1, 2025

TL;DR

Seg-R1 demonstrates that reinforcement learning can significantly improve large multimodal models' pixel-level understanding and generalization in segmentation tasks without complex modifications or extensive supervision.

Contribution

This paper brings Group Relative Policy Optimization (GRPO) into the segmentation domain and shows that purely RL-based training can reach strong segmentation performance.

Findings

01. Achieves 0.873 S-measure on COD10K without complex modifications.

02. Demonstrates strong zero-shot generalization to referring and reasoning segmentation.

03. Outperforms fully supervised models on certain open-world segmentation benchmarks.

Abstract

We present Seg-R1, a preliminary exploration of using reinforcement learning (RL) to enhance the pixel-level understanding and reasoning capabilities of large multimodal models (LMMs). Starting with foreground segmentation tasks, specifically camouflaged object detection (COD) and salient object detection (SOD), our approach enables the LMM to generate point and bounding box prompts in a next-token fashion, which are then used to guide SAM2 in producing segmentation masks. We introduce Group Relative Policy Optimization (GRPO) into the segmentation domain, equipping the LMM with pixel-level comprehension through a carefully designed training strategy. Notably, Seg-R1 achieves remarkable performance with purely RL-based training, reaching 0.873 S-measure on COD10K without complex model modification. Moreover, we find that pure RL training demonstrates strong open-world generalization.…
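The abstract names GRPO but does not spell out its mechanics. As a rough illustration, the sketch below shows the group-relative advantage step that gives GRPO its name: several responses are sampled for the same input, each is scored, and each score is normalized against the mean and standard deviation of its own group. The function name, the `eps` term, the use of the population standard deviation, and the example reward values are assumptions for illustration. This is not the authors' code, and the reward Seg-R1 actually uses is not specified here.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its own sampled group.

    Assumption: population standard deviation plus a small eps for stability;
    real implementations may differ in these details.
    """
    n = len(rewards)
    if n == 0:
        return []
    mean = sum(rewards) / n
    variance = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for the same image and query,
# e.g. some mask-quality score in [0, 1] (placeholder values, not from the paper).
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Responses that score above their group's average receive a positive advantage and are reinforced, while below-average ones are discouraged, so no separate learned value model is needed to provide a baseline.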

Code & Models

Datasets

geshang/FCoT
Dataset, 22 downloads
