Reasoning Segmentation for Images and Videos: A Survey

Yiqing Shen; Chenjia Li; Fei Xiong; Jeong-O Jeong; Tianpeng Wang; Michael Latman; Mathias Unberath

arXiv:2505.18816·cs.CV·May 27, 2025

TL;DR

This survey reviews the emerging field of Reasoning Segmentation, which delineates objects in images and videos from implicit text queries whose interpretation requires reasoning, and covers methods, datasets, evaluation metrics, applications, and open challenges.

Contribution

The survey presents the first comprehensive overview of Reasoning Segmentation methods, datasets, evaluation metrics, and applications, and identifies current research gaps and future directions.

Findings

1. Reviewed 26 state-of-the-art RS methods

2. Analyzed 29 datasets and benchmarks

3. Identified key research gaps and future opportunities

Abstract

Reasoning Segmentation (RS) aims to delineate objects based on implicit text queries, the interpretation of which requires reasoning and knowledge integration. Unlike the traditional formulation of segmentation problems that relies on fixed semantic categories or explicit prompting, RS bridges the gap between visual perception and human-like reasoning capabilities, facilitating more intuitive human-AI interaction through natural language. Our work presents the first comprehensive survey of RS for image and video processing, examining 26 state-of-the-art methods together with a review of the corresponding evaluation metrics, as well as 29 datasets and benchmarks. We also explore existing applications of RS across diverse domains and identify their potential extensions. Finally, we identify current research gaps and highlight promising future directions.
