Depth Awakens: A Depth-perceptual Attention Fusion Network for RGB-D Camouflaged Object Detection

Xinran Liu; Lin Qi; Yuxuan Song; Qi Wen

arXiv:2405.05614·cs.CV·May 12, 2024

TL;DR

This paper introduces a novel RGB-D network that leverages depth information through a depth-perceptual attention fusion mechanism to improve camouflaged object detection, outperforming existing methods.

Contribution

The paper proposes a depth-perception attention fusion network with a trident-branch encoder and depth-weighted cross-attention for enhanced camouflaged object detection.

Findings

01 Significant performance improvement over state-of-the-art methods

02 Effective utilization of depth maps as auxiliary input

03 Demonstrated the importance of 3D spatial cues in camouflage detection

Abstract

Camouflaged object detection (COD) presents a persistent challenge in accurately identifying objects that seamlessly blend into their surroundings. However, most existing COD models overlook the fact that visual systems operate within a genuine 3D environment. The scene depth inherent in a single 2D image provides rich spatial clues that can assist in the detection of camouflaged objects. Therefore, we propose a novel depth-perception attention fusion network that leverages the depth map as an auxiliary input to enhance the network's ability to perceive 3D information, which is typically challenging for the human eye to discern from 2D images. The network uses a trident-branch encoder to extract chromatic and depth information and their communications. Recognizing that certain regions of a depth map may not effectively highlight the camouflaged object, we introduce a depth-weighted…
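The abstract is cut off before it defines the depth-weighted mechanism, so the following is only an illustrative sketch of one way depth-weighted cross-attention could work, not the authors' formulation. It assumes RGB tokens act as queries, depth tokens act as keys and values, and a hypothetical per-token reliability score `depth_weights` down-weights depth regions that do not highlight the object. The function name, the log-weight bias, and the residual addition are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def depth_weighted_cross_attention(rgb_tokens, depth_tokens, depth_weights):
    """Attend from RGB tokens (queries) to depth tokens (keys and values).

    depth_weights[j] in (0, 1] is a hypothetical reliability score for depth
    token j. Adding log(weight) to the logits scales each attention weight by
    that score before renormalisation, so unreliable depth regions contribute
    less to the fused feature.
    """
    dim = len(rgb_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in rgb_tokens:
        logits = [
            scale * sum(qi * ki for qi, ki in zip(q, k)) + math.log(w)
            for k, w in zip(depth_tokens, depth_weights)
        ]
        attn = softmax(logits)
        # Weighted sum of depth tokens, added residually to the RGB token.
        context = [
            sum(a * v[d] for a, v in zip(attn, depth_tokens))
            for d in range(dim)
        ]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused

# Toy inputs: two RGB tokens and two depth tokens, each of dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[1.0, 1.0], [-1.0, -1.0]]
fused_equal = depth_weighted_cross_attention(rgb, depth, [1.0, 1.0])
fused_skewed = depth_weighted_cross_attention(rgb, depth, [1.0, 1e-6])
```

With the second depth token's weight set close to zero, nearly all attention goes to the first depth token, so the fused features approach the RGB token plus that single depth token. A real model would use learned projections and predict the weights from the depth map itself; both are omitted here.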
