TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction
Bahar Aydemir, Ludo Hoffstetter, Tong Zhang, Mathieu Salzmann, Sabine Süsstrunk

TL;DR
This paper introduces TempSAL, a novel deep saliency prediction model that uses information about how human gaze shifts over time during image viewing to produce more accurate saliency maps.
Contribution
TempSAL is the first model to explicitly learn and utilize temporal attention patterns for saliency prediction, and it outperforms existing models on the SALICON benchmark.
Findings
Outperforms state-of-the-art models on the SALICON benchmark
Effectively models temporal gaze shifts during image viewing
Provides publicly available code for reproducibility
Abstract
To complement object recognition features, deep saliency prediction algorithms typically rely on additional information, such as scene context, semantic relationships, gaze direction, and object dissimilarity. However, none of these models consider the temporal nature of gaze shifts during image observation. We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals by exploiting human temporal attention patterns. Our approach locally modulates the saliency predictions by combining the learned temporal maps. Our experiments show that our method outperforms the state-of-the-art models, including a multi-duration saliency model, on the SALICON benchmark. Our code will be publicly available on GitHub.
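The two ideas in the abstract, saliency maps for sequential time intervals and a local (per-pixel) combination of those maps, can be illustrated with a minimal sketch. This is not TempSAL's actual architecture. The function names `temporal_fixation_maps` and `combine_temporal_maps` are hypothetical, and the equal-length time slices and the per-pixel softmax over slices are assumptions chosen for illustration. In the real model the mixing scores would come from a trained network rather than being passed in by hand. Pure Python is used so the sketch has no dependencies.

```python
import math


def temporal_fixation_maps(fixations, height, width, num_intervals, duration):
    """Bin timestamped fixations into equal-length time slices (assumed scheme).

    fixations: iterable of (x, y, t) with integer pixel coordinates and t in seconds.
    Returns num_intervals maps of shape height x width; each non-empty map is
    normalized to sum to 1 (a slice with no fixations stays all zeros).
    """
    maps = [[[0.0] * width for _ in range(height)] for _ in range(num_intervals)]
    slice_len = duration / num_intervals
    for x, y, t in fixations:
        if not (0 <= t < duration and 0 <= x < width and 0 <= y < height):
            continue  # skip fixations outside the viewing window or the image
        k = min(int(t // slice_len), num_intervals - 1)  # guard float rounding
        maps[k][y][x] += 1.0
    for m in maps:
        total = sum(sum(row) for row in m)
        if total > 0:
            for row in m:
                for j in range(width):
                    row[j] /= total
    return maps


def combine_temporal_maps(maps, logits):
    """Locally modulated combination of T maps (hypothetical mixing rule).

    logits[k][i][j] scores map k at pixel (i, j); a softmax over k gives
    per-pixel weights, so each output pixel is a convex mix of the inputs.
    The result is renormalized to sum to 1.
    """
    num_maps, height, width = len(maps), len(maps[0]), len(maps[0][0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            scores = [logits[k][i][j] for k in range(num_maps)]
            top = max(scores)  # subtract the max for a numerically stable softmax
            exps = [math.exp(s - top) for s in scores]
            z = sum(exps)
            out[i][j] = sum(e / z * maps[k][i][j] for k, e in enumerate(exps))
    total = sum(sum(row) for row in out)
    if total > 0:
        out = [[v / total for v in row] for row in out]
    return out


# Usage: three fixations on a 2x2 image, split into two 1-second slices.
slices = temporal_fixation_maps([(0, 0, 0.5), (1, 1, 1.5), (1, 1, 1.7)],
                                height=2, width=2, num_intervals=2, duration=2.0)
equal_logits = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
print(combine_temporal_maps(slices, equal_logits))  # [[0.5, 0.0], [0.0, 0.5]]
```

A per-pixel softmax keeps the weights at each location positive and summing to 1, so every output pixel is a convex combination of the input maps. Changing the scores at one pixel only changes the mix there, which is what "locally modulates" suggests. A learned module could use any other spatially varying weighting.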
Taxonomy
Topics: Visual Attention and Saliency Detection
Methods: None
