ZS-VCOS: Zero-Shot Video Camouflaged Object Segmentation By Optical Flow and Open Vocabulary Object Detection

Wenqi Guo; Mohamed Shehata; Shan Du

arXiv:2505.01431 · cs.CV · July 22, 2025

TL;DR

This paper introduces a zero-shot video camouflaged object segmentation method that leverages large pre-trained models and optical flow, significantly outperforming existing methods without training on task-specific data.

Contribution

The paper presents a zero-shot segmentation pipeline that combines SAM-2 for promptable segmentation, Owl-v2 for open-vocabulary object detection, and temporal cues from optical flow, and it reports state-of-the-art results on multiple datasets.
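To make the role of temporal cues concrete, here is a minimal sketch of one way a dense optical-flow field could be turned into a box prompt for a promptable segmenter. It is an illustration under stated assumptions, not the authors' pipeline: the function name `motion_box`, the relative threshold `rel_thresh`, and the nested-list flow layout are all hypothetical, and the sketch ignores camera motion, which real footage would need to compensate for.

```python
import math
from typing import List, Optional, Tuple

# Assumed layout: flow[y][x] = (dx, dy), the per-pixel displacement in pixels.
Flow = List[List[Tuple[float, float]]]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def motion_box(flow: Flow, rel_thresh: float = 0.5) -> Optional[Box]:
    """Bound the pixels whose flow magnitude is at least rel_thresh * peak.

    Returns None when the frame has no motion at all.
    """
    mags = [[math.hypot(dx, dy) for dx, dy in row] for row in flow]
    peak = max((m for row in mags for m in row), default=0.0)
    if peak == 0.0:
        return None

    cutoff = rel_thresh * peak
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(mags):
        for x, m in enumerate(row):
            if m >= cutoff:
                xs.append(x)
                ys.append(y)
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x5 frame: a 2x2 patch moves right by 2 px, everything else is static.
still = (0.0, 0.0)
moving = (2.0, 0.0)
toy_flow: Flow = [[still] * 5 for _ in range(4)]
for y in (1, 2):
    for x in (2, 3):
        toy_flow[y][x] = moving

box = motion_box(toy_flow)
```

In a full system, a box like this would be one candidate prompt among several, for example alongside boxes from an open-vocabulary detector; how the paper actually fuses these cues is not described on this page.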

Findings

01 Outperforms existing zero-shot methods, reaching an F-measure of 0.628

02 Surpasses supervised methods with the same F-measure of 0.628

03 Raises the success rate on the MoCA-Filter dataset from 0.628 to 0.697
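For readers unfamiliar with the metric, the sketch below computes the plain F-measure between a predicted binary mask and a ground-truth mask. This page does not say which variant the paper reports, and camouflaged-object benchmarks may use a weighted form, so treat this as the textbook definition only. The function name `f_measure` and the parameter `beta_sq` are illustrative choices.

```python
from typing import List

Mask = List[List[int]]  # 1 = object pixel, 0 = background


def f_measure(pred: Mask, gt: Mask, beta_sq: float = 1.0) -> float:
    """F = (1 + beta^2) * P * R / (beta^2 * P + R) over all pixels."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1  # predicted object, truly object
            elif p and not g:
                fp += 1  # predicted object, truly background
            elif g and not p:
                fn += 1  # missed object pixel

    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
gt_mask = [[1, 0, 0],
           [0, 1, 1]]

# 2 true positives, 1 false positive, 1 false negative: P = R = 2/3.
score = f_measure(pred_mask, gt_mask)
```

With `beta_sq = 1.0` precision and recall are weighted equally; smaller values of `beta_sq` weight precision more heavily.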

Abstract

Camouflaged object segmentation presents unique challenges compared to traditional segmentation tasks, primarily due to the high similarity in patterns and colors between camouflaged objects and their backgrounds. Effective solutions to this problem have significant implications in critical areas such as pest control, defect detection, and lesion segmentation in medical imaging. Prior research has predominantly emphasized supervised or unsupervised pre-training methods, leaving zero-shot approaches significantly underdeveloped. Existing zero-shot techniques commonly use the Segment Anything Model (SAM) in automatic mode or rely on vision-language models to generate cues for segmentation; however, their performance remains unsatisfactory because camouflaged objects so closely resemble their backgrounds. This work studies how to avoid training by integrating large pre-trained models…

Code & Models

Repositories

weathon/vcos
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Social Robot Interaction and HRI

Methods: Segment Anything Model