HRTransNet: HRFormer-Driven Two-Modality Salient Object Detection

Bin Tang; Zhengyi Liu; Yacheng Tan; and Qian He

arXiv:2301.03036 · cs.CV · January 25, 2023

TL;DR

HRTransNet introduces a novel two-modality salient object detection framework that effectively fuses primary and supplementary modalities using attention mechanisms and dual-direction fusion, significantly improving detection accuracy in multi-modal scenarios.

Contribution

The paper proposes HRTransNet, a model that integrates an auxiliary modality with the primary input using attention-based fusion and intra/inter-feature transformers to improve two-modality SOD.
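To make "attention-based fusion" of an auxiliary modality more concrete, here is a minimal NumPy sketch of one common pattern: pool both modalities globally, compute a per-channel gate, and add the gated supplementary features to the primary stream. This is an illustrative assumption, not the authors' module; the function `inject_supplementary`, the two-layer gate, and the weights `w1`/`w2` are hypothetical. The official PyTorch code in the liuzywen/hrtransnet repository is the authoritative reference.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def inject_supplementary(primary, supplementary, w1, w2):
    """Hypothetical channel-attention injection of a supplementary modality.

    primary, supplementary: feature maps of shape (C, H, W), e.g. RGB and depth.
    w1: (C_r, 2C) and w2: (C, C_r) are weights of a small two-layer gate (assumed).
    Returns the primary features plus gated supplementary features.
    """
    # Global context: average-pool each modality over its spatial dimensions.
    context = np.concatenate(
        [primary.mean(axis=(1, 2)), supplementary.mean(axis=(1, 2))]
    )  # shape (2C,)
    # Two-layer gate (ReLU, then sigmoid) yields one weight in (0, 1) per channel.
    gate = sigmoid(w2 @ np.maximum(w1 @ context, 0.0))  # shape (C,)
    # "Select" informative supplementary channels and damp noisy ones.
    selected = supplementary * gate[:, None, None]
    # Residual injection into the primary stream.
    return primary + selected
```

Because the gate only rescales the supplementary channels, a useless auxiliary input (for example an all-zero depth map) leaves the primary features unchanged, which is one reason this residual form is a popular default.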

Findings

01

Achieves significant improvements in RGB-D, RGB-T, and light field SOD tasks.

02

Effectively fuses modalities at input and output levels for detailed object representation.

03

Demonstrates superior performance over existing methods.

Abstract

The High-Resolution Transformer (HRFormer) can maintain high-resolution representations and share global receptive fields. This makes it well suited to salient object detection (SOD), in which the input and output have the same resolution. However, two critical problems must be solved for two-modality SOD: how to fuse the two modalities, and how to fuse the outputs of HRFormer. To address the first problem, a supplementary modality is injected into the primary modality by using global optimization and an attention mechanism to select and purify the modality at the input level. To solve the second problem, a dual-direction short connection fusion module is used to optimize the output features of HRFormer, thereby enhancing the detailed representation of objects at the output level. The proposed model, named HRTransNet, first introduces an auxiliary stream for feature extraction…
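The output-level idea can be sketched as two passes over HRFormer's multi-resolution outputs: a deep-to-shallow pass and a shallow-to-deep pass, each adding "short connections" between neighbouring levels, with both directions merged at full resolution. The NumPy sketch below is a hypothetical illustration under stated assumptions (all levels share the channel count, each level halves the spatial size, fusion is plain addition, nearest-neighbour upsampling and average-pool downsampling). The names `dual_direction_fusion`, `upsample2`, `downsample2`, and `resize_to` are invented here and do not come from the paper or its repository.

```python
import numpy as np


def upsample2(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) map.
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2(x):
    # 2x2 average pooling of a (C, H, W) map (H and W assumed even).
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def resize_to(x, height):
    # Repeated 2x upsampling until the map reaches the target height.
    while x.shape[1] < height:
        x = upsample2(x)
    return x


def dual_direction_fusion(feats):
    """Hypothetical dual-direction short-connection fusion.

    feats: list of (C, H_i, W_i) maps ordered from highest to lowest
    resolution, each level half the spatial size of the previous one.
    """
    n = len(feats)
    # Deep-to-shallow: each level adds the upsampled, already-fused deeper level.
    top_down = [None] * n
    top_down[-1] = feats[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = feats[i] + upsample2(top_down[i + 1])
    # Shallow-to-deep: each level adds the downsampled, already-fused shallower level.
    bottom_up = [None] * n
    bottom_up[0] = feats[0]
    for i in range(1, n):
        bottom_up[i] = feats[i] + downsample2(bottom_up[i - 1])
    # Merge both directions per level, bring all levels to full size, and average.
    fused_levels = [t + b for t, b in zip(top_down, bottom_up)]
    height = feats[0].shape[1]
    return sum(resize_to(f, height) for f in fused_levels) / n
```

Running both directions lets coarse, semantically strong levels guide the fine ones while fine detail also propagates upward; a real module would typically replace the plain additions with learned convolutions.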

Code & Models

Repositories

liuzywen/hrtransnet
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Infrared Target Detection Methodologies

Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Linear Layer · Dropout · Softmax · Residual Connection · Label Smoothing