TL;DR
This paper introduces a VLM-guided cascaded framework for open-vocabulary camouflaged object segmentation that improves localization and classification accuracy by leveraging rich semantics and explicit prompts, addressing domain gaps and boundary issues.
Contribution
The paper proposes a novel VLM-guided cascaded approach that integrates SAM with VLM-derived features for improved camouflaged object segmentation and classification.
Findings
Significantly better segmentation accuracy on OVCOS benchmarks.
Enhanced classification performance with full-image context.
Effective use of VLM semantics for both segmentation and classification.
Abstract
Open-Vocabulary Camouflaged Object Segmentation (OVCOS) seeks to segment and classify camouflaged objects from arbitrary categories, presenting unique challenges due to visual ambiguity and unseen categories. Recent approaches typically adopt a two-stage paradigm: first segmenting objects, then classifying the segmented regions using Vision Language Models (VLMs). However, these methods (1) suffer from a domain gap caused by the mismatch between VLMs' full-image training and cropped-region inference, and (2) depend on generic segmentation models optimized for well-delineated objects, making them less effective for camouflaged objects. Without explicit guidance, generic segmentation models often overlook subtle boundaries, leading to imprecise segmentation. In this paper, we introduce a novel VLM-guided cascaded framework to address these issues in OVCOS. For segmentation, we leverage the…
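The domain gap described in the abstract can be made concrete with a small sketch. It contrasts the two kinds of input a classifier might receive: a tight crop around the predicted mask (the two-stage paradigm) versus the full image with the mask kept alongside it. The function names and the choice to append the mask as an extra channel are illustrative assumptions, not the paper's implementation, and NumPy is assumed to be available.

```python
import numpy as np


def crop_to_mask(image, mask):
    """Two-stage style input: keep only the bounding box around the mask.

    Hypothetical helper for illustration; surrounding context is discarded.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask is empty")
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1]


def full_image_with_mask(image, mask):
    """Full-image style input: keep every pixel and append the mask as a
    fourth channel (an assumed encoding, chosen only for this sketch)."""
    mask_channel = mask[..., None].astype(image.dtype)
    return np.concatenate([image, mask_channel], axis=-1)


# Toy data: a 64x64 RGB image and a rectangular stand-in for a predicted mask.
image = np.zeros((64, 64, 3), dtype=np.float32)
mask = np.zeros((64, 64), dtype=bool)
mask[20:30, 10:25] = True

cropped = crop_to_mask(image, mask)
guided = full_image_with_mask(image, mask)

print(cropped.shape)  # (10, 15, 3): context outside the box is gone
print(guided.shape)   # (64, 64, 4): full context plus the mask
```

The cropped input no longer resembles the full images a VLM is typically trained on, which is the mismatch the abstract points to; the second form preserves the whole scene while still indicating the region of interest.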
Taxonomy
Methods: Segment Anything Model · ADaptive gradient method with the OPTimal convergence rate
