A Holistically Point-guided Text Framework for Weakly-Supervised Camouflaged Object Detection
Tsui Qin Mok, Shuyong Gao, Haozhe Xing, Miaoyang He, Yan Wang, Wenqiang Zhang

TL;DR
This paper presents a novel weakly-supervised camouflaged object detection framework using point-guided text prompts, achieving significant improvements over existing methods and introducing new datasets for the task.
Contribution
It introduces a holistically point-guided text framework with three phases, novel modules for mask correction and selection, and new datasets for weakly-supervised camouflaged object detection.
Findings
Outperforms state-of-the-art methods on four benchmarks.
Surpasses some fully-supervised camouflaged object detection methods.
Demonstrates effectiveness of point-guided text supervision.
Abstract
Weakly-Supervised Camouflaged Object Detection (WSCOD) has gained popularity for its promise to train models with weak labels to segment objects that visually blend into their surroundings. Recently, some methods using sparsely-annotated supervision have shown promising results through scribbling in WSCOD, while point-text supervision remains underexplored. Hence, this paper introduces a novel holistically point-guided text framework for WSCOD by decomposing the task into three phases: segment, choose, train. Specifically, we propose Point-guided Candidate Generation (PCG), where the point's foreground serves as a correction for the text path to explicitly correct and rejuvenate the lost detection object during the mask generation process (SEGMENT). We also introduce a Qualified Candidate Discriminator (QCD) to choose the optimal mask from a given text prompt using CLIP (CHOOSE), and employ the…
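The CHOOSE phase described above can be illustrated with a minimal sketch. It is not the authors' QCD module: the abstract only states that CLIP is used to pick the best mask for a text prompt. The sketch assumes that each candidate mask has already been turned into an image embedding and the prompt into a text embedding (for example by a CLIP-style encoder, not called here), and it ranks candidates by cosine similarity. The names cosine_similarity and choose_mask, and the toy vectors, are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def choose_mask(candidate_embeddings, text_embedding):
    """Return (best_index, scores) for the candidate closest to the text prompt.

    Assumption: embeddings are precomputed by some image/text encoder;
    the real selection criterion in the paper may differ.
    """
    scores = [cosine_similarity(e, text_embedding) for e in candidate_embeddings]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy stand-ins for embeddings of three candidate masks and one text prompt.
text = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # unrelated to the prompt
    [0.9, 0.1, 0.0],  # closely aligned with the prompt
    [0.5, 0.5, 0.0],  # partially aligned
]
best, scores = choose_mask(candidates, text)
print(best, [round(s, 3) for s in scores])
```

In this toy setup the second candidate is selected because its embedding points in nearly the same direction as the prompt's; in a real pipeline the selected mask would then serve as a pseudo-label for the TRAIN phase.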
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Image Enhancement Techniques
Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Vision Transformer · Multi-Head Attention
