Modality Prompts for Arbitrary Modality Salient Object Detection
Nianchang Huang, Yang Yang, Qiang Zhang, Jungong Han, Jin Huang

TL;DR
This paper introduces a modality-adaptive Transformer with prompt learning and dynamic fusion strategies to improve salient object detection across diverse and varying modalities such as RGB, RGB-D, and RGB-D-T.
Contribution
It proposes a novel modality-adaptive Transformer (MAT) with a modality prompt learning approach and a hybrid fusion strategy for arbitrary modality salient object detection.
Findings
Effective handling of diverse modality discrepancies.
Adaptive feature extraction using learned modality prompts.
Improved fusion of multimodal features for salient object detection.
Abstract
This paper delves into the task of arbitrary modality salient object detection (AM SOD), which aims to detect salient objects from arbitrary modalities, e.g., RGB images, RGB-D images, and RGB-D-T images. A novel modality-adaptive Transformer (MAT) is proposed to address two fundamental challenges of AM SOD, i.e., more diverse modality discrepancies caused by the varying modality types that need to be processed, and the need for a dynamic fusion design caused by the uncertain number of modalities present in the inputs of the multimodal fusion strategy. Specifically, inspired by prompt learning's ability to align the distributions of pre-trained models with the characteristics of downstream tasks by learning a few prompts, MAT first presents a modality-adaptive feature extractor (MAFE) that tackles the diverse modality discrepancies by introducing a modality prompt for each modality. In the training stage, a new…
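The idea of one prompt per modality, feeding a pipeline that accepts any number of modalities, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names (MODALITY_PROMPTS, add_modality_prompt, shared_backbone, extract_and_fuse), the choice to prepend prompts as extra tokens, the mean-pooling placeholder for the Transformer, and the plain averaging used as a stand-in for fusion. The paper's actual MAFE and fusion strategy are not described in enough detail in the truncated abstract to reproduce here.

```python
import random
from typing import Dict, List

Vector = List[float]
Tokens = List[Vector]

DIM = 4          # token embedding size (illustrative value)
PROMPT_LEN = 2   # prompt tokens per modality (illustrative value)


def make_prompt(rng: random.Random) -> Tokens:
    """Stand-in for a learnable prompt; training would optimize these values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(DIM)] for _ in range(PROMPT_LEN)]


rng = random.Random(0)
# Hypothetical table: one prompt per modality, all routed through one backbone.
MODALITY_PROMPTS: Dict[str, Tokens] = {
    name: make_prompt(rng) for name in ("rgb", "depth", "thermal")
}


def add_modality_prompt(modality: str, tokens: Tokens) -> Tokens:
    """Prepend the modality's prompt tokens to its patch tokens (an assumption)."""
    return MODALITY_PROMPTS[modality] + tokens


def shared_backbone(tokens: Tokens) -> Vector:
    """Placeholder for a Transformer: mean-pools all tokens into one feature."""
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(DIM)]


def extract_and_fuse(inputs: Dict[str, Tokens]) -> Vector:
    """Extract one feature per available modality, then average them.

    Averaging is only a stand-in that accepts any number of modalities;
    it is NOT the fusion strategy proposed in the paper.
    """
    feats = [shared_backbone(add_modality_prompt(m, toks)) for m, toks in inputs.items()]
    return [sum(f[d] for f in feats) / len(feats) for d in range(DIM)]


# The same code path handles one modality or three.
patches = [[1.0] * DIM for _ in range(3)]
rgb_only = extract_and_fuse({"rgb": patches})
rgb_d_t = extract_and_fuse({"rgb": patches, "depth": patches, "thermal": patches})
```

The point of the sketch is only structural: the modality-specific part is confined to a small prompt table, and the fusion step is written so that its input count is not fixed in advance.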
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Dense Connections · Dropout · Label Smoothing · Residual Connection · Softmax · Attention Model · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings
