HRTransNet: HRFormer-Driven Two-Modality Salient Object Detection
Bin Tang, Zhengyi Liu, Yacheng Tan, and Qian He

TL;DR
HRTransNet introduces a novel two-modality salient object detection framework that effectively fuses primary and supplementary modalities using attention mechanisms and dual-direction fusion, significantly improving detection accuracy in multi-modal scenarios.
Contribution
The paper proposes HRTransNet, a model that integrates an auxiliary modality with the primary input using attention-based fusion and intra/inter-feature transformers for enhanced two-modality SOD.
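To make "attention-based fusion of an auxiliary modality" concrete, here is a minimal sketch under stated assumptions. It is not the authors' module. The function `inject_supplementary`, the weight matrix `w_channel`, and the two-stage gate (a channel gate to "select", then a spatial gate to "purify") are hypothetical choices. Global average pooling stands in for the paper's unspecified global context. The primary features (for example RGB) receive the gated supplementary features (for example depth) through a residual addition.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def inject_supplementary(primary: np.ndarray,
                         supplementary: np.ndarray,
                         w_channel: np.ndarray) -> np.ndarray:
    """Hypothetical attention-gated injection of a supplementary modality.

    primary, supplementary: feature maps of shape (C, H, W).
    w_channel: (C, 2C) weights of a toy channel-attention layer (assumed).
    """
    # Global context (assumption): average-pool both modalities into a (2C,) vector.
    context = np.concatenate([primary.mean(axis=(1, 2)),
                              supplementary.mean(axis=(1, 2))])
    # "Select": per-channel gates in (0, 1) decide which supplementary channels pass.
    channel_gate = sigmoid(w_channel @ context)
    selected = supplementary * channel_gate[:, None, None]
    # "Purify": a per-pixel gate from the channel mean of the selected features;
    # locations with a low mean response get gates close to 0.
    spatial_gate = sigmoid(selected.mean(axis=0))
    purified = selected * spatial_gate[None, :, :]
    # Inject into the primary stream through a residual addition.
    return primary + purified


rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 8, 8))      # primary modality features
depth = rng.standard_normal((4, 8, 8))    # supplementary modality features
w = 0.1 * rng.standard_normal((4, 8))     # hypothetical learned weights
fused = inject_supplementary(rgb, depth, w)
print(fused.shape)  # (4, 8, 8)
```

The residual form means that an all-zero supplementary input leaves the primary features unchanged. The auxiliary stream can therefore only add information, and it never replaces the primary representation.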
Findings
Achieves significant improvements in RGB-D, RGB-T, and light field SOD tasks.
Effectively fuses modalities at input and output levels for detailed object representation.
Demonstrates superior performance over existing methods.
Abstract
The High-Resolution Transformer (HRFormer) can maintain high-resolution representations and share global receptive fields. This makes it well suited to salient object detection (SOD), where the input and output have the same resolution. However, two critical problems must be solved for two-modality SOD: fusing the two modalities, and fusing the HRFormer outputs. To address the first problem, a supplementary modality is injected into the primary modality, using global optimization and an attention mechanism to select and purify it at the input level. To solve the second problem, a dual-direction short connection fusion module optimizes the output features of HRFormer, enhancing the detailed representation of objects at the output level. The proposed model, named HRTransNet, first introduces an auxiliary stream for feature extraction…
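HRFormer keeps parallel branches at several resolutions, so its output is a set of multi-scale feature maps. The sketch below illustrates one plausible reading of "dual-direction short connection fusion". A top-down pass carries coarse context into finer scales, a bottom-up pass carries fine detail into coarser scales, and both are merged at input resolution. This is a hypothetical sketch, not the paper's module. The names `dual_direction_fusion`, `upsample` and `downsample` are illustrative. Equal channel counts across branches are assumed. Nearest-neighbour upsampling and average pooling stand in for learned layers.

```python
import numpy as np


def upsample(x, factor):
    # Nearest-neighbour upsampling of a (C, H, W) map.
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def downsample(x, factor):
    # Average pooling with a square window of size `factor`.
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def dual_direction_fusion(feats):
    """Hypothetical dual-direction short-connection fusion.

    feats: list of (C, H_i, W_i) maps ordered fine to coarse, each half the
    resolution of the previous one; equal channel counts are assumed.
    """
    n = len(feats)
    # Top-down pass: coarse context flows into finer scales.
    top_down = [None] * n
    top_down[-1] = feats[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = feats[i] + upsample(top_down[i + 1], 2)
    # Bottom-up pass: fine detail flows into coarser scales.
    bottom_up = [None] * n
    bottom_up[0] = feats[0]
    for i in range(1, n):
        bottom_up[i] = feats[i] + downsample(bottom_up[i - 1], 2)
    # Merge both directions at the finest (input) resolution.
    return top_down[0] + upsample(bottom_up[-1], 2 ** (n - 1))


feats = [np.ones((2, 8 // 2 ** i, 8 // 2 ** i)) for i in range(3)]  # 8x8, 4x4, 2x2
out = dual_direction_fusion(feats)
print(out.shape, out[0, 0, 0])  # (2, 8, 8) 6.0
```

With constant inputs, each pass accumulates all three scales (1 + 1 + 1 = 3), and the merge adds the two passes, giving 6 everywhere. This makes it easy to check that every branch contributes to the full-resolution output in both directions.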
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Infrared Target Detection Methodologies
Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Linear Layer · Dropout · Softmax · Residual Connection · Label Smoothing
