Generative Transformer for Accurate and Reliable Salient Object Detection
Yuxin Mao, Jing Zhang, Zhexiong Wan, Yuchao Dai, Aixuan Li, Yunqiu Lv, Xinyu Tian, Deng-Ping Fan, and Nick Barnes

TL;DR
This paper applies transformers to salient object detection and reports improved accuracy over CNN-based frameworks. To make predictions more reliable, it addresses the over-confidence issue with iGAN, a novel uncertainty estimation method that uses MCMC to infer input-dependent latent variables.
Contribution
The paper introduces a transformer-based framework for salient object detection and proposes iGAN, a novel MCMC-based uncertainty estimation model, to improve the reliability of its predictions.
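To make the idea of MCMC over an input-dependent latent variable concrete, here is a minimal sketch. It is not the paper's implementation. It assumes Langevin dynamics as the gradient-based MCMC sampler and a hypothetical scalar toy model, y = x + z + noise with prior z ~ N(0, 1), chosen only because its posterior has a closed form to compare against. The function name and all parameters are illustrative.

```python
import random


def langevin_sample_latent(x, y, sigma=0.5, step=0.3,
                           n_steps=10000, burn_in=1000, seed=0):
    """Sample a scalar latent z from p(z | x, y) with unadjusted Langevin dynamics.

    Toy model (an assumption, not the paper's network):
      y = g(x, z) + noise, g(x, z) = x + z, noise ~ N(0, sigma^2), z ~ N(0, 1).
    """
    rng = random.Random(seed)
    z = rng.gauss(0.0, 1.0)  # start the chain from the prior
    samples = []
    for t in range(n_steps):
        # Gradient of log p(z | x, y): likelihood term plus prior term.
        grad = (y - (x + z)) / sigma**2 - z
        # Langevin update: half-step along the gradient plus Gaussian noise.
        z = z + 0.5 * step**2 * grad + step * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            samples.append(z)
    return samples


samples = langevin_sample_latent(x=1.0, y=2.0)
posterior_mean = sum(samples) / len(samples)
# Closed-form posterior mean for this Gaussian toy model: (y - x) / (1 + sigma^2).
analytic_mean = (2.0 - 1.0) / (1.0 + 0.5**2)
print(f"estimated {posterior_mean:.3f}, analytic {analytic_mean:.3f}")
```

Because the latent is sampled conditioned on the input x (and, at training time, the target y), different inputs yield different latent distributions. A fixed N(0, 1) prior cannot do this. In a real model the gradient would come from back-propagation through the generator instead of the hand-written formula above.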
Findings
Transformers outperform CNNs in salient object detection.
iGAN effectively estimates predictive uncertainty.
Transformer + iGAN achieves accurate and reliable detection.
Abstract
Transformer, which originates from machine translation, is particularly powerful at modeling long-range dependencies. Currently, the transformer is making revolutionary progress in various vision tasks, leading to significant performance improvements compared with the convolutional neural network (CNN) based frameworks. In this paper, we conduct extensive research on exploiting the contributions of transformers for accurate and reliable salient object detection. For the former, we apply transformer to a deterministic model, and explain that the effective structure modeling and global context modeling abilities lead to its superior performance compared with the CNN based frameworks. For the latter, we observe that both CNN and transformer based frameworks suffer greatly from the over-confidence issue, where the models tend to generate wrong predictions with high confidence. To estimate…
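The over-confidence issue described in the abstract can be illustrated with a small check that compares mean prediction confidence against accuracy. This is a sketch under stated assumptions, not a metric taken from the paper: the function name, the 0.5 threshold, and the toy per-pixel probabilities and labels are all hypothetical.

```python
def overconfidence_gap(probs, labels, threshold=0.5):
    """Return (mean confidence, accuracy, gap) for binary per-pixel predictions.

    probs  : predicted saliency probabilities in [0, 1]
    labels : ground-truth labels, 0 (background) or 1 (salient)
    """
    confidences = []
    correct = []
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        # Confidence is the probability assigned to the predicted class.
        confidences.append(p if pred == 1 else 1.0 - p)
        correct.append(1.0 if pred == y else 0.0)
    mean_conf = sum(confidences) / len(confidences)
    accuracy = sum(correct) / len(correct)
    return mean_conf, accuracy, mean_conf - accuracy


# Hypothetical toy data: two confidently wrong pixels out of six.
probs = [0.95, 0.9, 0.05, 0.98, 0.1, 0.92]
labels = [1, 0, 0, 1, 1, 1]
mean_conf, accuracy, gap = overconfidence_gap(probs, labels)
print(f"confidence {mean_conf:.3f}, accuracy {accuracy:.3f}, gap {gap:.3f}")
```

A positive gap means the model is, on average, more confident than it is correct, which is the behaviour the abstract calls over-confidence.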
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Softmax · Dense Connections · Vision Transformer
