LENS: Learning to Segment Anything with Unified Reinforced Reasoning

Lianghui Zhu; Bin Ouyang; Yuxuan Zhang; Tianheng Cheng; Rui Hu; Haocheng Shen; Longjin Ran; Xiaoxin Chen; Li Yu; Wenyu Liu; Xinggang Wang

arXiv:2508.14153 · cs.CV · November 19, 2025

TL;DR

LENS introduces a reinforcement learning framework that jointly optimizes reasoning and segmentation, significantly improving generalization and accuracy in text-prompted image segmentation tasks.

Contribution

LENS is an end-to-end RL-based approach that incorporates chain-of-thought reasoning into segmentation, improving performance over existing supervised fine-tuning methods.

Findings

01 Achieves an average cIoU of 81.2% on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks

02 Outperforms previous methods such as GLaMM by up to 5.6%

03 Demonstrates the effectiveness of RL-driven reasoning in segmentation
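For readers unfamiliar with the metric, the sketch below shows how cIoU is commonly computed in referring-segmentation work: intersections and unions are accumulated over the whole evaluation set before dividing, so large objects weigh more than in a per-sample mean IoU. This is an illustration of that common convention, not the paper's evaluation code; the function names and the nested-list mask representation are assumptions made for the example.

```python
from typing import Sequence, Tuple

# Assumption for this sketch: a binary mask is a sequence of rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count the intersection and union pixels of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    return inter, union


def cumulative_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """cIoU: total intersection over total union, accumulated across samples."""
    total_inter = 0
    total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
    # Guard against an evaluation set in which every mask is empty.
    return total_inter / total_union if total_union else 0.0


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    gts = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
    print(cumulative_iou(preds, gts))
```

In the usage example, the two samples contribute intersections of 1 and 1 and unions of 2 and 1, so the cumulative score is 2/3, whereas the per-sample mean IoU would be 0.75.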

Abstract

Text-prompted image segmentation enables fine-grained visual understanding and is critical for applications such as human-computer interaction and robotics. However, existing supervised fine-tuning methods typically ignore explicit chain-of-thought (CoT) reasoning at test time, which limits their ability to generalize to unseen prompts and domains. To address this issue, we introduce LENS, a scalable reinforcement-learning framework that jointly optimizes the reasoning process and segmentation in an end-to-end manner. We propose unified reinforcement-learning rewards that span sentence-, box-, and segment-level cues, encouraging the model to generate informative CoT rationales while refining mask quality. Using a publicly available 3-billion-parameter vision-language model, i.e., Qwen2.5-VL-3B-Instruct, LENS achieves an average cIoU of 81.2% on the RefCOCO, RefCOCO+, and RefCOCOg…
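To make the idea of rewards spanning sentence-, box-, and segment-level cues concrete, here is a minimal sketch of one way such signals could be combined into a single scalar. The excerpt does not give the reward definitions, so everything here is an assumption for illustration: the `<think>` tag check standing in for the sentence-level cue, the use of box IoU and mask IoU for the other two levels, the equal default weights, and all function names.

```python
import re
from typing import Tuple

# Assumption: boxes are (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sentence_reward(response: str) -> float:
    """Hypothetical sentence-level cue: 1.0 if a non-empty <think> block exists."""
    has_rationale = re.search(r"<think>\s*\S.*?</think>", response, re.DOTALL)
    return 1.0 if has_rationale else 0.0


def unified_reward(
    response: str,
    pred_box: Box,
    gt_box: Box,
    mask_iou: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of sentence-, box-, and segment-level signals (illustrative)."""
    w_sentence, w_box, w_mask = weights
    return (
        w_sentence * sentence_reward(response)
        + w_box * box_iou(pred_box, gt_box)
        + w_mask * mask_iou
    )


if __name__ == "__main__":
    reward = unified_reward(
        "<think>The dog on the left is the target.</think>",
        pred_box=(0.0, 0.0, 2.0, 2.0),
        gt_box=(0.0, 0.0, 2.0, 2.0),
        mask_iou=0.5,
    )
    print(reward)
```

A plain weighted sum is only one possible design; it keeps each level's contribution easy to inspect, but the actual formulation in LENS may differ in both the individual terms and how they are combined.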

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Semantic Web and Ontologies · AI-based Problem Solving and Planning