Open-Vocabulary Camouflaged Object Segmentation with Cascaded Vision Language Models

Kai Zhao; Wubang Yuan; Zheng Wang; Guanyi Li; Xiaoqiang Zhu; Deng-ping Fan; Dan Zeng

arXiv:2506.19300 · cs.CV · March 10, 2026

TL;DR

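This paper introduces a VLM-guided cascaded framework for open-vocabulary camouflaged object segmentation (OVCOS). It uses rich VLM semantics and explicit prompts to improve both localization and classification, addressing two weaknesses of prior two-stage methods: the domain gap between full-image VLM training and cropped-region inference, and imprecise segmentation of subtle object boundaries.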

Contribution

The paper proposes a novel VLM-guided cascaded approach that integrates SAM with VLM-derived features for improved camouflaged object segmentation and classification.
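One generic way to turn VLM-derived localization cues into "explicit prompts" for SAM is to threshold a coarse foreground heatmap into a box and pick its peak as a positive point. The sketch below shows only that generic step; it assumes a 2-D heatmap in [0, 1] is already available, and `heatmap_to_prompts` and the 0.5 threshold are hypothetical choices for illustration, not the authors' method or code.

```python
import numpy as np


def heatmap_to_prompts(heatmap: np.ndarray, threshold: float = 0.5):
    """Turn a coarse foreground heatmap (H, W) with values in [0, 1] into
    SAM-style prompts. Hypothetical helper for illustration only.

    Returns (box, points, labels), or (None, None, None) if no pixel
    passes the threshold.
    """
    if heatmap.ndim != 2:
        raise ValueError("heatmap must be a 2-D (H, W) array")
    mask = heatmap >= threshold
    if not mask.any():
        return None, None, None
    ys, xs = np.nonzero(mask)
    # Tight box around the thresholded region, in XYXY pixel order.
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    # One positive point at the heatmap peak, stored as (x, y).
    peak_y, peak_x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    points = np.array([[peak_x, peak_y]])
    labels = np.array([1])  # 1 marks a foreground point
    return box, points, labels


if __name__ == "__main__":
    heat = np.zeros((8, 8))
    heat[2:5, 3:7] = 0.6  # coarse region a VLM might highlight
    heat[3, 4] = 0.9      # most confident pixel
    box, points, labels = heatmap_to_prompts(heat)
    print(box.tolist(), points.tolist(), labels.tolist())  # [3, 2, 6, 4] [[4, 3]] [1]
```

The returned arrays follow the layout that `SamPredictor.predict` in the `segment_anything` package accepts for `box` (length-4, XYXY), `point_coords` (N×2, (X, Y) pixels) and `point_labels` (N,). How the paper actually derives and combines its prompts is not stated in this summary.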

Findings

01. Significantly better segmentation accuracy on OVCOS benchmarks.
02. Enhanced classification performance with full-image context.
03. Effective use of VLM semantics for both segmentation and classification.

Abstract

Open-Vocabulary Camouflaged Object Segmentation (OVCOS) seeks to segment and classify camouflaged objects from arbitrary categories, presenting unique challenges due to visual ambiguity and unseen categories. Recent approaches typically adopt a two-stage paradigm: first segmenting objects, then classifying the segmented regions using Vision Language Models (VLMs). However, these methods (1) suffer from a domain gap caused by the mismatch between VLMs' full-image training and cropped-region inference, and (2) depend on generic segmentation models optimized for well-delineated objects, making them less effective for camouflaged objects. Without explicit guidance, generic segmentation models often overlook subtle boundaries, leading to imprecise segmentation. In this paper, we introduce a novel VLM-guided cascaded framework to address these issues in OVCOS. For segmentation, we leverage the…
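The two-stage baseline described in the abstract can be sketched in two small pieces: crop the image to a predicted mask, then score the crop against a text prompt for each candidate category. This is a minimal sketch, assuming image and text embeddings have already been produced by some CLIP-like VLM encoder; `crop_to_mask`, `zero_shot_classify`, and the 0.01 temperature (a logit scale of 100) are illustrative assumptions, not code from the authors' repository. It models only the baseline paradigm the paper criticizes, not the proposed cascaded framework.

```python
import numpy as np


def crop_to_mask(image: np.ndarray, mask: np.ndarray, pad: int = 0) -> np.ndarray:
    """Bounding-box crop of a predicted mask: the kind of input a
    segment-then-classify baseline hands to the VLM. Hypothetical helper."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty")
    h, w = mask.shape
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, h - 1)
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, w - 1)
    return image[y0 : y1 + 1, x0 : x1 + 1]


def zero_shot_classify(image_emb: np.ndarray, text_embs: np.ndarray,
                       temperature: float = 0.01) -> np.ndarray:
    """CLIP-style open-vocabulary classification: cosine similarity between
    one image embedding (D,) and K category-prompt embeddings (K, D),
    converted to probabilities with a temperature-scaled softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (txt @ img) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


if __name__ == "__main__":
    image = np.arange(36).reshape(6, 6)
    mask = np.zeros((6, 6), dtype=bool)
    mask[1:3, 2:5] = True
    print(crop_to_mask(image, mask).shape)         # (2, 3)
    print(crop_to_mask(image, mask, pad=1).shape)  # (4, 5)
    text_embs = np.eye(3)  # stand-ins for 3 category-prompt embeddings
    probs = zero_shot_classify(np.array([0.1, 1.0, 0.0]), text_embs)
    print(int(probs.argmax()))                     # 1
```

The output of `crop_to_mask` is the cropped-region input that the abstract identifies as a source of domain gap: the VLM's image encoder was trained on full images, not on tight crops around a single region.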

Code & Models

Repositories

intcomp/camouflaged-vlm
PyTorch · Official

Taxonomy

Methods: Segment Anything Model · ADaptive gradient method with the OPTimal convergence rate