Generative Transformer for Accurate and Reliable Salient Object Detection

Yuxin Mao, Jing Zhang, Zhexiong Wan, Yuchao Dai, Aixuan Li, Yunqiu Lv, Xinyu Tian, Deng-Ping Fan, and Nick Barnes

arXiv:2104.10127·cs.CV·January 2, 2023·25 cites

TL;DR

This paper applies transformers to salient object detection. A deterministic transformer model improves accuracy over CNN-based frameworks, and iGAN, a proposed uncertainty estimation method that uses MCMC to obtain input-dependent latent variables, addresses over-confident predictions to improve reliability.

Contribution

The paper introduces a transformer-based framework for salient object detection and proposes iGAN, a novel uncertainty estimation model using MCMC, to improve reliability of predictions.

Findings

01 Transformers outperform CNNs in salient object detection.

02 iGAN effectively estimates predictive uncertainty.

03 Transformer + iGAN achieves accurate and reliable detection.

Abstract

Transformer, which originates from machine translation, is particularly powerful at modeling long-range dependencies. Currently, the transformer is making revolutionary progress in various vision tasks, leading to significant performance improvements compared with the convolutional neural network (CNN) based frameworks. In this paper, we conduct extensive research on exploiting the contributions of transformers for accurate and reliable salient object detection. For the former, we apply transformer to a deterministic model, and explain that the effective structure modeling and global context modeling abilities lead to its superior performance compared with the CNN based frameworks. For the latter, we observe that both CNN and transformer based frameworks suffer greatly from the over-confidence issue, where the models tend to generate wrong predictions with high confidence. To estimate…
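The over-confidence issue mentioned in the abstract (wrong predictions made with high confidence) can be made concrete with a calibration measure. The following is a minimal sketch of expected calibration error (ECE) for per-pixel binary saliency predictions. ECE is a standard calibration metric chosen here for illustration; the source does not state which measure the paper reports. The function name `expected_calibration_error`, the 0.5 decision threshold, and the default of 10 bins are all assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error for binary per-pixel predictions.

    probs  : flat list of predicted foreground probabilities in [0, 1]
    labels : flat list of ground-truth labels (0 or 1)
    n_bins : number of equal-width confidence bins (assumed default)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("empty input")

    # Per-bin accumulators: pixel count, summed confidence, summed correctness.
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct_sums = [0.0] * n_bins

    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0          # assumed 0.5 threshold
        conf = p if pred == 1 else 1.0 - p   # confidence in the predicted class
        # Map confidence to a bin; clamp conf == 1.0 into the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        correct_sums[b] += 1.0 if pred == y else 0.0

    total = len(probs)
    ece = 0.0
    for n, c, a in zip(counts, conf_sums, correct_sums):
        if n:
            # Weighted gap between mean accuracy and mean confidence in this bin.
            ece += (n / total) * abs(a / n - c / n)
    return ece


# Four pixels predicted as salient with 0.99 confidence, only two correct.
overconfident = expected_calibration_error([0.99, 0.99, 0.99, 0.99], [1, 1, 0, 0])
# Two pixels predicted with full confidence, both correct.
calibrated = expected_calibration_error([1.0, 0.0], [1, 0])
```

In the first call the model is 99% confident but only 50% accurate, so the gap is large (about 0.49); in the second call confidence and accuracy agree and the error is zero. A large gap of this kind is what "over-confidence" refers to.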

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Softmax · Dense Connections · Vision Transformer