A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection

Tsui Qin Mok; Shuyong Gao; Haozhe Xing; Miaoyang He; Yan Wang; Wenqiang Zhang

arXiv:2501.06038·cs.CV·January 13, 2025

TL;DR

This paper presents a novel weakly-supervised camouflaged object detection framework using point-guided text prompts, achieving significant improvements over existing methods and introducing new datasets for the task.

Contribution

It introduces a holistically point-guided text framework with three phases, novel modules for mask correction and selection, and new datasets for weakly-supervised camouflaged object detection.

Findings

01. Outperforms state-of-the-art methods on four benchmarks.

02. Surpasses some fully-supervised camouflaged object detection methods.

03. Demonstrates the effectiveness of point-guided text supervision.

Abstract

Weakly-Supervised Camouflaged Object Detection (WSCOD) has gained popularity for its promise to train models with weak labels to segment objects that visually blend into their surroundings. Recently, some methods using sparsely annotated supervision have shown promising results with scribble annotations in WSCOD, while point-text supervision remains underexplored. Hence, this paper introduces a novel holistically point-guided text framework for WSCOD, decomposed into three phases: segment, choose, train. Specifically, we propose Point-guided Candidate Generation (PCG), where the point's foreground serves as a correction for the text path to explicitly correct and recover the lost detection of the object during the mask generation process (SEGMENT). We also introduce a Qualified Candidate Discriminator (QCD) to choose the optimal mask from a given text prompt using CLIP (CHOOSE), and employ the…
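
The CHOOSE phase can be pictured as a ranking problem: score each candidate mask against the text prompt and keep the best one. The abstract does not say how QCD computes its score, so the following is only a minimal sketch under stated assumptions. It assumes each candidate mask has already been turned into an image embedding and the prompt into a text embedding (for example by a CLIP-style encoder, not shown here), and that cosine similarity is the ranking criterion. The names `cosine_similarity` and `choose_mask` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def choose_mask(
    text_embedding: Sequence[float],
    candidate_embeddings: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Return the index of the best-matching candidate and all scores.

    Assumption: each candidate embedding describes the image region kept
    by one candidate mask. This is an illustrative stand-in, not the
    paper's QCD module.
    """
    if not candidate_embeddings:
        raise ValueError("at least one candidate embedding is required")
    scores = [cosine_similarity(text_embedding, c) for c in candidate_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy embeddings standing in for encoder outputs (made up for illustration).
text_vec = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the prompt
    [0.9, 0.1, 0.0],  # closely aligned with the prompt
    [0.5, 0.5, 0.5],  # partially aligned
]
best_index, scores = choose_mask(text_vec, candidates)
print(best_index, [round(s, 3) for s in scores])
```

In this toy run the second candidate is selected because its embedding points most nearly in the same direction as the prompt embedding. A real pipeline would replace the hand-written vectors with encoder outputs and might add a quality threshold before accepting any mask.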

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Image Enhancement Techniques

Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Vision Transformer · Multi-Head Attention