Referring Camouflaged Object Detection With Multi-Context Overlapped Windows Cross-Attention

Yu Wen; Shuyong Gao; Shuping Zhang; Miao Huang; Lili Tao; Han Yang; Haozhe Xing; Lihe Zhang; Boxue Hou

arXiv:2511.13249 · cs.CV · November 18, 2025

Open Access

TL;DR

This paper introduces RFMNet, a novel model that enhances referring camouflaged object detection by fusing multi-stage salient image features with camouflaged features using local-overlapped window cross-attention, achieving state-of-the-art results.

Contribution

The paper proposes RFMNet with multi-stage feature fusion, local-overlapped window cross-attention, and a referring feature aggregation module for improved camouflaged object detection.

Findings

01. Achieves state-of-the-art performance on the Ref-COD benchmark.

02. Demonstrates the effectiveness of local-overlapped window cross-attention.

03. Shows significant improvement over previous methods.

Abstract

Referring camouflaged object detection (Ref-COD) aims to identify hidden objects by incorporating reference information such as images and text descriptions. Previous research has transformed reference images with salient objects into one-dimensional prompts, yielding significant results. We explore ways to enhance performance through multi-context fusion of rich salient image features and camouflaged object features. Therefore, we propose RFMNet, which utilizes features from multiple encoding stages of the reference salient images and performs interactive fusion with the camouflage features at the corresponding encoding stages. Given that the features in salient object images contain abundant object-related detail information, performing feature fusion within local areas is more beneficial for detecting camouflaged objects. Therefore, we propose an Overlapped Windows Cross-attention…
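The abstract is truncated here, so the exact design of the Overlapped Windows Cross-attention is not visible on this page. The following is only a minimal NumPy sketch of the general idea of cross-attention restricted to local, overlapping windows, not the authors' implementation. It assumes that queries come from non-overlapping tiles of the camouflage feature map and that keys and values come from the co-located tile of the reference salient feature map, enlarged by a margin so neighbouring windows overlap. The function name, the `window` and `overlap` parameters, the zero padding, and the absence of learned projections are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def overlapped_window_cross_attention(cam, ref, window=4, overlap=2):
    """Hypothetical sketch of local cross-attention with overlapping key/value windows.

    cam, ref: float arrays of shape (H, W, C) with identical shapes,
              H and W divisible by `window`.
    Queries:  non-overlapping window x window tiles of `cam`.
    Keys/values: the co-located tile of `ref`, enlarged by `overlap`
              positions on every side (zero-padded at the borders).
    No learned Q/K/V projections are applied (a simplification).
    """
    H, W, C = cam.shape
    assert ref.shape == cam.shape
    assert H % window == 0 and W % window == 0

    # Zero-pad the reference map so enlarged windows stay in bounds.
    ref_pad = np.pad(ref, ((overlap, overlap), (overlap, overlap), (0, 0)))
    k = window + 2 * overlap  # side length of the enlarged key/value window
    out = np.zeros_like(cam)

    for i in range(0, H, window):
        for j in range(0, W, window):
            q = cam[i:i + window, j:j + window].reshape(-1, C)  # (window^2, C)
            kv = ref_pad[i:i + k, j:j + k].reshape(-1, C)       # (k^2, C)
            # Scaled dot-product attention inside the local window only.
            attn = softmax(q @ kv.T / np.sqrt(C), axis=-1)
            out[i:i + window, j:j + window] = (attn @ kv).reshape(window, window, C)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cam_feat = rng.standard_normal((12, 12, 8))
    ref_feat = rng.standard_normal((12, 12, 8))
    fused = overlapped_window_cross_attention(cam_feat, ref_feat)
    print(fused.shape)
```

In a multi-stage setting like the one the abstract describes, a function of this kind would be applied once per encoder stage to the paired camouflage and reference features. How RFMNet actually combines the stages is not stated in the text available here.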

Taxonomy

Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Face Recognition and Perception