SelfReformer: Self-Refined Network with Transformer for Salient Object Detection
Yi Ke Yun, Weisi Lin

TL;DR
SelfReformer is a Transformer-based network that improves salient object detection by explicitly learning global context and refining local details, achieving state-of-the-art results on multiple benchmarks.
Contribution
The paper introduces SelfReformer, a network for salient object detection (SOD) that combines Transformer-based global context learning, Pixel Shuffle-based upsampling, and a two-stage refinement module.
Findings
Achieves state-of-the-art performance on five benchmark datasets.
Captures long-range dependencies with a Transformer architecture.
Refines local details through a two-stage context refinement module.
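To illustrate why a Transformer can capture long-range dependencies, here is a minimal sketch of single-head scaled dot-product self-attention in NumPy. It is not the SelfReformer implementation: the function name `self_attention`, the token count, the feature width, and the random projection weights are all assumptions chosen for illustration. The point is only that every token receives a nonzero weight from every other token, regardless of spatial distance.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only).

    x: (n_tokens, dim) array of token features, e.g. flattened image patches.
    w_q, w_k, w_v: (dim, dim) projection matrices (hypothetical, random here).
    Returns the attended features and the (n_tokens, n_tokens) weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
n_tokens, dim = 16, 8  # assumed: a 4x4 grid of patches with 8 features each
x = rng.standard_normal((n_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
# Each row of `weights` spans all 16 tokens, so the first patch can draw on
# the last patch in a single layer.
```

A convolution with a small kernel would need many stacked layers before the first and last patch could influence each other; here the interaction happens in one step.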
Abstract
The global and local contexts significantly contribute to the integrity of predictions in Salient Object Detection (SOD). Unfortunately, existing methods still struggle to generate complete predictions with fine details. There are two major problems in conventional approaches: first, for global context, high-level CNN-based encoder features cannot effectively catch long-range dependencies, resulting in incomplete predictions. Second, downsampling the ground truth to fit the size of predictions will introduce inaccuracy as the ground truth details are lost during interpolation or pooling. Thus, in this work, we developed a Transformer-based network and framed a supervised task for a branch to learn the global context information explicitly. Besides, we adopt Pixel Shuffle from Super-Resolution (SR) to reshape the predictions back to the size of ground truth instead of the reverse. Thus…
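The Pixel Shuffle step mentioned in the abstract can be sketched as follows. This is a minimal NumPy version under stated assumptions, not the authors' code: the helper name `pixel_shuffle`, the upscale factor `r = 2`, and the toy tensor shapes are hypothetical, and the channel ordering follows the common sub-pixel convolution convention. It rearranges a low-resolution prediction with `C * r * r` channels into a `C`-channel map that is `r` times larger in each spatial dimension, so the prediction can be compared against the full-size ground truth rather than a downsampled copy.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange (C * r * r, H, W) into (C, H * r, W * r).

    Output pixel (c, h * r + i, w * r + j) is taken from input channel
    c * r * r + i * r + j at position (h, w).
    """
    c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)    # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)  # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


# Assumed toy input: 4 channels on a 2x2 grid, upscaled by r = 2.
pred = np.arange(16).reshape(4, 2, 2)
full = pixel_shuffle(pred, 2)
print(full.shape)  # (1, 4, 4)
```

No values are interpolated or discarded: the operation only moves existing values from the channel axis into the spatial axes.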
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image Fusion Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Dropout · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection
