Semantic Segmentation Enhanced Transformer Model for Human Attention Prediction
Shuo Zhang

TL;DR
This paper introduces a Transformer-based model with semantic segmentation for saliency prediction, capturing global image features and human visual perception cues, leading to improved attention prediction accuracy.
Contribution
It proposes a multi-task Transformer model with a novel attention module that integrates semantic segmentation into saliency prediction, enhancing global feature learning.
Findings
Achieves performance competitive with state-of-the-art methods.
Demonstrates the effectiveness of multi-task learning with semantic segmentation.
Validates the approach through experiments on human gaze prediction.
Abstract
Saliency prediction aims to predict the attention distribution of human eyes given an RGB image. Most recent state-of-the-art methods are based on deep image feature representations from traditional CNNs. However, traditional convolution cannot capture the global features of an image well because of its small kernel size. In addition, high-level factors that correlate closely with human visual perception, e.g., objects, color, and light, are not considered. Motivated by these limitations, we propose a Transformer-based method with semantic segmentation as an additional learning objective. The Transformer captures more global cues of the image. In addition, simultaneously learning object segmentation simulates human visual perception, which we verify in our investigation of human gaze control in cognitive science. We build an extra decoder for the subtask and the multiple…
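The multi-task setup described in the abstract (a saliency objective plus a semantic segmentation objective) can be illustrated with a small loss sketch. This is not the paper's implementation: the choice of KL divergence for saliency, per-pixel cross-entropy for segmentation, and the names `saliency_kl`, `segmentation_ce`, `multitask_loss`, and `seg_weight` are all assumptions made for illustration, and the encoder and decoders that would produce the logits are omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a flat list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def saliency_kl(pred_logits, target_map, eps=1e-8):
    """KL(target || pred) with both flattened maps treated as distributions.

    Assumes target_map is non-negative with positive total mass.
    """
    pred = softmax(pred_logits)
    t_sum = sum(target_map)
    target = [t / t_sum for t in target_map]
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, pred) if t > 0)

def segmentation_ce(class_logits, labels):
    """Mean per-pixel cross-entropy; class_logits[i] holds class scores for pixel i."""
    losses = [-math.log(softmax(scores)[label])
              for scores, label in zip(class_logits, labels)]
    return sum(losses) / len(losses)

def multitask_loss(sal_logits, sal_target, seg_logits, seg_labels, seg_weight=0.5):
    """Weighted sum of both objectives; seg_weight is a hypothetical hyperparameter."""
    return (saliency_kl(sal_logits, sal_target)
            + seg_weight * segmentation_ce(seg_logits, seg_labels))

# Toy 4-pixel "image" with 2 segmentation classes (made-up values).
loss = multitask_loss(
    sal_logits=[0.0, 0.0, 0.0, 0.0],      # uniform predicted saliency
    sal_target=[1.0, 1.0, 1.0, 1.0],      # uniform ground-truth fixation density
    seg_logits=[[0.0, 0.0]] * 4,          # undecided between the two classes
    seg_labels=[0, 1, 0, 1],
)
print(loss)
```

In this toy case the saliency term is zero because prediction and target are both uniform, so the total reduces to `seg_weight` times the cross-entropy of a two-way coin flip. In a real model both terms would be computed on the outputs of two decoders sharing one encoder, so gradients from the segmentation task also shape the shared features.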
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Infrared Target Detection Methodologies
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Adam · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Absolute Position Encodings · Dropout · Label Smoothing
