CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction
Shiu-hong Kao, Chak Ho Huang, Huaiqian Liu, Yu-Wing Tai, Chi-Keung Tang

TL;DR
CoT-Seg introduces a reasoning-based, training-free segmentation framework that uses chain-of-thought reasoning and self-correction, significantly improving robustness in complex and ambiguous cases by leveraging pre-trained multi-modal large language models.
Contribution
The paper presents CoT-Seg, a novel framework combining chain-of-thought reasoning and self-correction for segmentation without fine-tuning, enhancing performance on challenging cases.
Findings
Outperforms existing methods on complex segmentation tasks
Effectively handles ambiguous and out-of-domain images
Introduces the ReasonSeg-Hard dataset for challenging cases
Abstract
Existing work on reasoning segmentation often falls short in complex cases, particularly when handling complicated queries and out-of-domain images. Inspired by chain-of-thought reasoning, where harder problems require more thinking steps and time, this paper explores a system that can think step by step, look up information if needed, generate results, evaluate its own results, and refine them, in the same way humans approach harder questions. We introduce CoT-Seg, a training-free framework that rethinks reasoning segmentation by combining chain-of-thought reasoning with self-correction. Instead of fine-tuning, CoT-Seg leverages the inherent reasoning ability of pre-trained MLLMs (GPT-4o) to decompose queries into meta-instructions, extract fine-grained semantics from images, and identify target objects even under implicit or complex prompts. Moreover, CoT-Seg…
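The loop the abstract describes (decompose the query, segment, self-evaluate, refine) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name cot_segment, the Verdict type, the max_rounds cap, and the three callables are hypothetical, and the toy stubs stand in for the MLLM and the segmentation model so the control flow can be run without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Mask = Set[Tuple[int, int]]  # toy mask: a set of pixel coordinates


@dataclass
class Verdict:
    ok: bool            # did the self-evaluation accept the mask?
    feedback: str = ""  # corrective instruction when it did not


def cot_segment(
    image: dict,
    query: str,
    decompose: Callable[[dict, str], List[str]],
    segment: Callable[[dict, List[str]], Mask],
    evaluate: Callable[[dict, str, Mask], Verdict],
    max_rounds: int = 3,  # assumed cap on refinement rounds
):
    """Hypothetical reason -> segment -> self-evaluate -> refine loop."""
    # Reasoning step: turn the query into meta-instructions.
    instructions = decompose(image, query)
    mask = segment(image, instructions)
    for _ in range(max_rounds):
        verdict = evaluate(image, query, mask)
        if verdict.ok:
            break
        # Self-correction: append the feedback and segment again.
        instructions = instructions + [verdict.feedback]
        mask = segment(image, instructions)
    return mask, instructions


# Toy stand-ins (assumptions for illustration only). The "image" maps
# object labels to pixel sets; a real system would call an MLLM and a
# promptable segmenter instead.
toy_image = {"cup": {(0, 0)}, "kettle": {(1, 1)}}


def toy_decompose(image, query):
    return ["cup"]  # deliberately wrong first guess


def toy_segment(image, instructions):
    return image[instructions[-1]]  # follow the latest instruction


def toy_evaluate(image, query, mask):
    if mask == image["kettle"]:
        return Verdict(ok=True)
    return Verdict(ok=False, feedback="kettle")


mask, steps = cot_segment(
    toy_image, "the thing used to boil water",
    toy_decompose, toy_segment, toy_evaluate,
)
```

In this toy run the first mask is rejected, the evaluator's feedback is appended to the instruction list, and the second mask is accepted. The toy evaluator consults a known answer, which a real evaluator could not do; it is there only to make the refinement path visible.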
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Graph Neural Networks
