ArgusCogito: Chain-of-Thought for Cross-Modal Synergy and Omnidirectional Reasoning in Camouflaged Object Segmentation

Jianwen Tan; Huiyao Zhang; Rui Xiong; Han Zhou; Hongfei Wang; Ye Li

arXiv:2508.18050·cs.CV·August 26, 2025

TL;DR

ArgusCogito introduces a novel chain-of-thought framework leveraging cross-modal synergy and omnidirectional reasoning in vision-language models to significantly improve camouflaged object segmentation accuracy and robustness.

Contribution

It presents a zero-shot, cognitively-inspired three-stage reasoning framework that enhances holistic understanding and precise segmentation in challenging COS tasks.

Findings

01. Achieves state-of-the-art results on four COS benchmarks.

02. Demonstrates superior generalization and robustness across diverse datasets.

03. Validates effectiveness in medical image segmentation tasks.

Abstract

Camouflaged Object Segmentation (COS) poses a significant challenge due to the intrinsic high similarity between targets and backgrounds, demanding models capable of profound holistic understanding beyond superficial cues. Prevailing methods, often limited by shallow feature representation, inadequate reasoning mechanisms, and weak cross-modal integration, struggle to achieve this depth of cognition, resulting in prevalent issues like incomplete target separation and imprecise segmentation. Inspired by the perceptual strategy of the Hundred-eyed Giant (emphasizing holistic observation, omnidirectional focus, and intensive scrutiny), we introduce ArgusCogito, a novel zero-shot, chain-of-thought framework underpinned by cross-modal synergy and omnidirectional reasoning within Vision-Language Models (VLMs). ArgusCogito orchestrates three cognitively-inspired stages: (1) Conjecture: Constructs…
