Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention
Yu Wen, Shuyong Gao, Shuping Zhang, Miao Huang, Lili Tao, Han Yang, Haozhe Xing, Lihe Zhang, Boxue Hou

TL;DR
This paper introduces RFMNet, a novel model that enhances referring camouflaged object detection by fusing multi-stage salient image features with camouflaged features using local-overlapped window cross-attention, achieving state-of-the-art results.
Contribution
The paper proposes RFMNet with multi-stage feature fusion, local-overlapped window cross-attention, and a referring feature aggregation module for improved camouflaged object detection.
Findings
Achieves state-of-the-art performance on the Ref-COD benchmark.
Demonstrates the effectiveness of local-overlapped window cross-attention.
Shows significant improvement over previous methods.
Abstract
Referring camouflaged object detection (Ref-COD) aims to identify hidden objects by incorporating reference information such as images and text descriptions. Previous research has transformed reference images with salient objects into one-dimensional prompts, yielding significant results. We explore ways to enhance performance through multi-context fusion of rich salient image features and camouflaged object features. Therefore, we propose RFMNet, which utilizes features from multiple encoding stages of the reference salient images and performs interactive fusion with the camouflage features at the corresponding encoding stages. Given that the features in salient object images contain abundant object-related detail information, performing feature fusion within local areas is more beneficial for detecting camouflaged objects. Therefore, we propose an Overlapped Windows Cross-attention…
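The abstract is truncated before it defines the Overlapped Windows Cross-attention, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes queries come from non-overlapping windows of the camouflage feature map, while keys and values come from an enlarged window of the reference (salient) feature map at the same location, so that neighbouring key/value windows overlap. The function name, the `window` and `overlap` parameters, and the absence of learned projections and multiple heads are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def overlapped_window_cross_attention(cam, ref, window=4, overlap=2):
    """Hypothetical sketch: local cross-attention with overlapping key/value windows.

    cam: (H, W, C) camouflage features, used as queries.
    ref: (H, W, C) reference (salient) features, used as keys and values.
    Each non-overlapping `window` x `window` query block attends to a
    (window + 2 * overlap)-sized block of `ref` centred on the same location.
    """
    H, W, C = cam.shape
    assert ref.shape == cam.shape, "sketch assumes matching feature-map shapes"
    assert H % window == 0 and W % window == 0, "H and W must be multiples of window"

    # Zero-pad the reference so border windows can also be enlarged,
    # and track which padded positions hold real features.
    ref_pad = np.pad(ref, ((overlap, overlap), (overlap, overlap), (0, 0)))
    valid = np.pad(np.ones((H, W), dtype=bool), overlap)

    big = window + 2 * overlap
    out = np.empty_like(cam, dtype=float)
    for i in range(0, H, window):
        for j in range(0, W, window):
            q = cam[i:i + window, j:j + window].reshape(-1, C)
            kv = ref_pad[i:i + big, j:j + big].reshape(-1, C)
            mask = valid[i:i + big, j:j + big].reshape(-1)

            # Scaled dot-product scores; padded positions are masked out.
            scores = q @ kv.T / np.sqrt(C)
            scores = np.where(mask[None, :], scores, -np.inf)
            attn = softmax(scores, axis=-1)

            out[i:i + window, j:j + window] = (attn @ kv).reshape(window, window, C)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cam = rng.normal(size=(8, 8, 16))
    ref = rng.normal(size=(8, 8, 16))
    fused = overlapped_window_cross_attention(cam, ref)
    print(fused.shape)
```

In a multi-stage setting like the one the abstract describes, a block of this kind would presumably be applied once per encoder stage to the reference and camouflage features of matching resolution; how RFMNet actually combines the stages is not stated in the text available here.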
Taxonomy
Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Face Recognition and Perception
