SwinNet: Swin Transformer drives edge-aware RGB-D and RGB-T salient object detection

Zhengyi Liu; Yacheng Tan; Qian He; Yun Xiao

arXiv:2204.05585·cs.CV·April 13, 2022

TL;DR

SwinNet leverages Swin Transformer and edge-guided fusion to enhance salient object detection in RGB-D and RGB-T images, outperforming existing models by effectively capturing hierarchical features and boundary details.

Contribution

The paper introduces a novel cross-modality fusion model using Swin Transformer and edge guidance for improved RGB-D and RGB-T salient object detection.

Findings

01. Outperforms state-of-the-art models on multiple datasets.
02. Effectively captures hierarchical and boundary features.
03. Enhances cross-modality feature fusion.

Abstract

Convolutional neural networks (CNNs) are good at extracting contextual features within certain receptive fields, while transformers can model global long-range dependencies. By combining the advantages of transformers with the merits of CNNs, Swin Transformer shows strong feature representation ability. Based on it, we propose a cross-modality fusion model, SwinNet, for RGB-D and RGB-T salient object detection. It is driven by Swin Transformer to extract hierarchical features, boosted by an attention mechanism to bridge the gap between the two modalities, and guided by edge information to sharpen the contour of the salient object. Specifically, a two-stream Swin Transformer encoder first extracts multi-modality features, and then a spatial alignment and channel re-calibration module is presented to optimize intra-level cross-modality features. To clarify the fuzzy boundary, edge-guided decoder…
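To make the "spatial alignment and channel re-calibration" idea more concrete, the following is a minimal sketch in plain Python of a generic attention-style fusion of two same-level feature maps. It is an illustration under stated assumptions, not the authors' module: the function names (`spatial_gate`, `channel_gate`, `fuse`), the choice of max and average pooling, and the final element-wise sum are all hypothetical, and the learned convolution and MLP layers that a real network would use are omitted. Features are nested lists shaped [channels][height][width].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_gate(rgb, aux):
    """Shared spatial map: sigmoid of the channel-wise max of rgb * aux.

    Assumption: positions where both modalities respond strongly are
    treated as aligned. A real module would apply a learned conv here.
    """
    channels, height, width = len(rgb), len(rgb[0]), len(rgb[0][0])
    return [
        [
            sigmoid(max(rgb[c][y][x] * aux[c][y][x] for c in range(channels)))
            for x in range(width)
        ]
        for y in range(height)
    ]


def channel_gate(feat):
    """One weight per channel: sigmoid of that channel's global average.

    Assumption: stands in for a learned channel-attention MLP.
    """
    return [
        sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
        for ch in feat
    ]


def apply_gates(feat, spatial, channel):
    """Scale every value by its position's spatial weight and its channel weight."""
    return [
        [
            [value * spatial[y][x] * channel[c] for x, value in enumerate(row)]
            for y, row in enumerate(ch)
        ]
        for c, ch in enumerate(feat)
    ]


def fuse(rgb, aux):
    """Gate both modalities, then sum them element-wise (hypothetical choice)."""
    spatial = spatial_gate(rgb, aux)
    # Cross re-calibration: each modality's channels are weighted by the other's.
    rgb_out = apply_gates(rgb, spatial, channel_gate(aux))
    aux_out = apply_gates(aux, spatial, channel_gate(rgb))
    return [
        [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(rgb_out, aux_out)
    ]
```

Calling `fuse(rgb, aux)` on two feature maps of identical shape returns a fused map of the same shape. In the paper's setting the second input would be the depth or thermal stream at the same encoder level.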

Code & Models

Repositories

liuzywen/swinnet (PyTorch, official)

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Stochastic Depth · Softmax · Label Smoothing