Seg-R1: Segmentation Can Be Surprisingly Simple with Reinforcement Learning
Zuyao You, Zuxuan Wu

TL;DR
Seg-R1 demonstrates that reinforcement learning can significantly improve large multimodal models' pixel-level understanding and generalization in segmentation tasks without complex modifications or extensive supervision.
Contribution
This paper brings Group Relative Policy Optimization (GRPO), an existing RL algorithm, into the segmentation domain and shows that purely RL-based training can reach strong segmentation performance.
Findings
Reaches an S-measure of 0.873 on COD10K with purely RL-based training and no complex model modification.
Demonstrates strong zero-shot generalization to referring and reasoning segmentation.
Outperforms fully supervised models on certain open-world segmentation benchmarks.
Abstract
We present Seg-R1, a preliminary exploration of using reinforcement learning (RL) to enhance the pixel-level understanding and reasoning capabilities of large multimodal models (LMMs). Starting with foreground segmentation tasks, specifically camouflaged object detection (COD) and salient object detection (SOD), our approach enables the LMM to generate point and bounding-box prompts via next-token prediction, which are then used to guide SAM2 in producing segmentation masks. We introduce Group Relative Policy Optimization (GRPO) into the segmentation domain, equipping the LMM with pixel-level comprehension through a carefully designed training strategy. Notably, with purely RL-based training, Seg-R1 achieves an S-measure of 0.873 on COD10K without complex model modification. Moreover, we find that pure RL training demonstrates strong open-world generalization.…
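The abstract names GRPO but, as excerpted here, does not give the reward or the update rule. As a rough illustration of the group-relative step only, the sketch below scores a group of candidate masks sampled for one image and normalizes the rewards within that group. Using mask IoU as the reward, the population standard deviation, the `eps` term, the convention for two empty masks, and the helper names `mask_iou` and `group_relative_advantages` are all assumptions made for illustration, not details taken from Seg-R1.

```python
from statistics import mean, pstdev
from typing import List, Sequence

# A binary mask as rows of 0/1 values (assumed representation).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Normalize each reward by the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy example: one ground-truth mask and four sampled candidate masks.
target = [[1, 1], [0, 0]]
candidates = [
    [[1, 1], [0, 0]],  # exact match
    [[1, 0], [0, 0]],  # misses one foreground pixel
    [[1, 1], [1, 0]],  # one false-positive pixel
    [[0, 0], [1, 1]],  # no overlap
]
rewards = [mask_iou(c, target) for c in candidates]
advantages = group_relative_advantages(rewards)
```

Candidates that score above the group mean receive positive advantages and the rest receive negative ones, so no separate value network is needed to provide a baseline. In a pipeline like the one the abstract describes, each candidate mask would come from SAM2 after being prompted with the points and box emitted by the LMM; that step is omitted here.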
