Semantic Segmentation Enhanced Transformer Model for Human Attention Prediction

Shuo Zhang

arXiv:2301.11022 · cs.CV · January 27, 2023

TL;DR

This paper introduces a Transformer-based model with semantic segmentation for saliency prediction, capturing global image features and human visual perception cues, leading to improved attention prediction accuracy.

Contribution

The paper proposes a multi-task Transformer model with a novel attention module that integrates semantic segmentation into saliency prediction, which improves the learning of global features.
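
A minimal sketch of how a multi-task objective of this kind could be written. It assumes a KL-divergence saliency loss and a per-pixel cross-entropy segmentation loss, combined with a hypothetical weight `seg_weight`. The page does not state the paper's actual loss terms or weighting, so every function name and the weight value are illustrative assumptions, not the author's implementation.

```python
import math
from typing import List, Sequence

EPS = 1e-12

def normalize(saliency: Sequence[float]) -> List[float]:
    """Rescale a flattened, non-negative saliency map so it sums to 1."""
    total = sum(saliency) + EPS
    return [v / total for v in saliency]

def saliency_kl(pred: Sequence[float], target: Sequence[float]) -> float:
    """KL(target || pred) between normalized maps (assumed saliency loss)."""
    p, q = normalize(pred), normalize(target)
    return sum(qi * math.log(EPS + qi / (pi + EPS)) for pi, qi in zip(p, q))

def segmentation_ce(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean per-pixel softmax cross-entropy; logits[i] holds class scores for pixel i."""
    total = 0.0
    for scores, label in zip(logits, labels):
        m = max(scores)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[label]
    return total / len(labels)

def multitask_loss(sal_pred, sal_gt, seg_logits, seg_labels, seg_weight: float = 0.5) -> float:
    """Hypothetical combined objective: saliency loss + weighted segmentation loss."""
    # Assumption: seg_weight is an illustrative value, not taken from the paper.
    return saliency_kl(sal_pred, sal_gt) + seg_weight * segmentation_ce(seg_logits, seg_labels)

# A perfect saliency prediction contributes ~0, so the total is dominated by the
# segmentation term (uniform 2-class logits cost ln 2 per pixel).
loss = multitask_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [[0.0, 0.0], [0.0, 0.0]], [0, 1])
print(round(loss, 4))  # prints 0.3466
```

Sharing one backbone between the two losses is what lets the segmentation signal shape the features used for saliency; the relative weight controls how strongly it does so.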

Findings

1. Achieves competitive performance with state-of-the-art methods.
2. Demonstrates the effectiveness of multi-task learning with semantic segmentation.
3. Validates the approach through experiments on human gaze prediction.

Abstract

Saliency prediction aims to predict the distribution of human visual attention over a given RGB image. Most recent state-of-the-art methods rely on deep image feature representations from traditional CNNs. However, traditional convolutions cannot capture the global features of an image well because of their small kernel size. Moreover, high-level factors that correlate closely with human visual perception, e.g., objects, color, and light, are not considered. Motivated by these observations, we propose a Transformer-based method with semantic segmentation as an additional learning objective. The Transformer can capture more global cues of the image. In addition, simultaneously learning object segmentation simulates human visual perception, which we verify through our investigation of human gaze control in cognitive science. We build an extra decoder for the subtask and the multiple…
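
The abstract's point about kernel size can be made concrete. A stack of n stride-1, undilated k×k convolutions sees only 1 + n(k − 1) pixels per axis, whereas a single self-attention layer lets every token attend to every other token. The sketch below uses one head and identity query/key/value projections, a deliberate simplification; the paper's actual backbone and projection layers are not described on this page.

```python
import math
from typing import List, Tuple

def conv_receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Per-axis receptive field of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens: List[List[float]]) -> Tuple[List[List[float]], List[List[float]]]:
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): queries, keys and values are the tokens
    themselves, i.e. identity projections instead of learned weight matrices.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        w = softmax(scores)  # one weight for every token in the whole sequence
        weights.append(w)
        outputs.append([sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)])
    return outputs, weights

# Five stacked 3x3 convolutions still see only an 11x11 window...
print(conv_receptive_field(5))  # prints 11
# ...whereas one attention layer mixes every token with every other token.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
_, attn = self_attention(patches)
print(all(w > 0 for row in attn for w in row))  # prints True
```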

Taxonomy

Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Infrared Target Detection Methodologies

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Adam · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Absolute Position Encodings · Dropout · Label Smoothing
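
The Methods list includes absolute position encodings from Attention Is All You Need. The sinusoidal form defined there is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The page does not say whether this paper uses sinusoidal or learned encodings, so the sketch below is background on the listed technique, not the paper's configuration.

```python
import math
from typing import List

def sinusoidal_position_encoding(num_positions: int, d_model: int) -> List[List[float]]:
    """Absolute sinusoidal position encodings (Vaswani et al., 2017).

    Row `pos` interleaves sin and cos at geometrically spaced frequencies:
    PE[pos][2i] = sin(pos / 10000**(2i/d_model)), PE[pos][2i+1] = cos(same angle).
    Assumption: this variant is shown for illustration; the paper may differ.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for even in range(0, d_model, 2):  # `even` plays the role of 2i
            angle = pos / (10000 ** (even / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table

# The encodings are added to token (e.g. image-patch) embeddings of the same width.
pe = sinusoidal_position_encoding(num_positions=4, d_model=8)
print(pe[0])  # prints [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```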