Modality-Agnostic Prompt Learning for Multi-Modal Camouflaged Object Detection

Hao Wang; Jiqing Zhang; Xin Yang; Baocai Yin; Lu Jiang; Zetian Mi; Huibing Wang

arXiv:2604.12380·cs.CV·April 15, 2026

TL;DR

This paper introduces a modality-agnostic prompt learning framework for camouflaged object detection. It generates unified multi-modal prompts for the Segment Anything Model (SAM), improving both accuracy and cross-modal generalization.

Contribution

The paper proposes a scalable prompt-based method that adapts SAM to arbitrary auxiliary modalities in a parameter-efficient way, improving camouflaged object detection performance.

Findings

01. Significant performance gains on RGB-Depth, RGB-Thermal, and RGB-Polarization datasets.

02. Effective generalization across multiple modalities with a unified prompt framework.

03. Improved boundary accuracy with a lightweight Mask Refine Module.

Abstract

Camouflaged Object Detection (COD) aims to segment objects that blend seamlessly into complex backgrounds, with growing interest in exploiting additional visual modalities to enhance robustness through complementary information. However, most existing approaches rely on modality-specific architectures or customized fusion strategies, which limits scalability and cross-modal generalization. To address this, we propose a novel framework that generates modality-agnostic multi-modal prompts for the Segment Anything Model (SAM), enabling parameter-efficient adaptation to arbitrary auxiliary modalities and significantly improving overall performance on COD tasks. Specifically, we model multi-modal learning through interactions between a data-driven content domain and a knowledge-driven prompt domain, distilling task-relevant cues into unified prompts for SAM decoding. We further…
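The abstract does not say how the content and prompt domains interact, so the following is only a rough illustration of the general idea, not the authors' architecture. It assumes plain single-head cross-attention with no learned projections: a small set of prompt tokens queries the concatenated RGB and auxiliary-modality tokens and absorbs a weighted summary of them. All names (`cross_attend`, `rgb_tokens`, `aux_tokens`, `prompt_tokens`) and all sizes are hypothetical, the features are random placeholders, and no SAM code is called.

```python
import math
import random

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(prompts, content):
    """Each prompt token (query) takes a softmax-weighted average of the
    content tokens and adds it to itself (residual update)."""
    dim = len(prompts[0])
    updated = []
    for q in prompts:
        # Scaled dot-product score between this prompt and every content token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in content]
        weights = softmax(scores)
        mixed = [sum(w * tok[d] for w, tok in zip(weights, content)) for d in range(dim)]
        updated.append([p + m for p, m in zip(q, mixed)])
    return updated

rng = random.Random(0)
dim = 8  # hypothetical feature width

# Placeholder features standing in for encoder outputs (assumption, not real data).
rgb_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
aux_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]  # depth, thermal, or polarization
prompt_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]

# The prompt side never needs to know which auxiliary modality supplied the tokens.
content = rgb_tokens + aux_tokens
unified_prompts = cross_attend(prompt_tokens, content)
print(len(unified_prompts), len(unified_prompts[0]))
```

The point of the sketch is that the number and width of the prompt tokens stay fixed whatever auxiliary modality supplies the content tokens, which is one way a single prompt interface could serve several modalities. In a real system the resulting tokens would be passed to a prompt-conditioned mask decoder such as SAM's, a step this sketch leaves out.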
