SelfReformer: Self-Refined Network with Transformer for Salient Object Detection

Yi Ke Yun; Weisi Lin

arXiv:2205.11283 · cs.CV · July 19, 2022 · 36 cites

TL;DR

SelfReformer is a Transformer-based network that improves salient object detection by explicitly learning global context and refining local details, achieving state-of-the-art results on multiple benchmarks.

Contribution

The paper introduces SelfReformer, a network that combines Transformer-based global context learning with Pixel Shuffle-based upsampling and a two-stage refinement module to improve SOD performance.

Findings

01. Achieves state-of-the-art performance on five benchmark datasets.

02. Captures long-range dependencies effectively with a Transformer architecture.

03. Refines local details through a two-stage context refinement module.

Abstract

Global and local context both contribute significantly to the integrity of predictions in Salient Object Detection (SOD). Unfortunately, existing methods still struggle to generate complete predictions with fine details. Conventional approaches have two major problems. First, for global context, high-level CNN-based encoder features cannot effectively capture long-range dependencies, resulting in incomplete predictions. Second, downsampling the ground truth to fit the size of the predictions introduces inaccuracy, because ground-truth details are lost during interpolation or pooling. In this work, we therefore develop a Transformer-based network and frame a supervised task for one branch to learn global context information explicitly. In addition, we adopt Pixel Shuffle from Super-Resolution (SR) to reshape the predictions back to the size of the ground truth instead of the reverse. Thus…
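
The Pixel Shuffle step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's decoder: it is only the generic channel-to-space rearrangement used in sub-pixel upsampling, written in plain Python on nested lists so it runs without dependencies. The function name `pixel_shuffle`, the toy input, and the channel ordering (chosen to match the convention documented for `torch.nn.PixelShuffle`) are assumptions made for illustration.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Illustrative helper, not code from the SelfReformer repository.
    Output pixel (c, y*r + i, z*r + j) is read from input channel
    c*r*r + i*r + j at position (y, z).
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])

    # Allocate the upsampled output: fewer channels, larger spatial size.
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]

    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out


# Toy input: 4 channels of size 1x2, upscale factor 2.
low_res = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
high_res = pixel_shuffle(low_res, 2)
print(high_res)  # [[[1, 3, 2, 4], [5, 7, 6, 8]]]
```

The rearrangement only moves values: every entry of the low-resolution, many-channel tensor appears exactly once in the output. A prediction upsampled this way can be compared against the full-size ground truth without interpolating or pooling the ground truth.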

Code & Models

Repositories

BarCodeReader/SelfReformer (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image Fusion Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection