Temporal Saliency Adaptation in Egocentric Videos
Panagiotis Linardos, Eva Mohedano, Monica Cherto, Cathal Gurrin, and Xavier Giro-i-Nieto

TL;DR
This paper extends image saliency prediction models to egocentric videos by incorporating temporal adaptation, demonstrating benefits in specific viewing conditions and providing new saliency datasets and tools.
Contribution
It introduces a method for adapting image saliency models to egocentric videos using convolutional and conv-LSTM layers, and provides a new dataset and saliency maps for egocentric video analysis.
Findings
Temporal adaptation improves saliency prediction when the viewer is not moving and observes the scene from a narrow field of view.
Adding conv-LSTM layers enhances the model's ability to capture temporal dynamics.
Saliency maps for the EPIC Kitchens dataset are publicly available.
Abstract
This work adapts a deep neural model for image saliency prediction to the temporal domain of egocentric video. We compute the saliency map for each video frame, first with an off-the-shelf model trained on static images, and second by adding convolutional or conv-LSTM layers trained on a dataset for video saliency prediction. We study each configuration on EgoMon, a new dataset made of seven egocentric videos recorded by three subjects in both free-viewing and task-driven setups. Our results indicate that the temporal adaptation is beneficial when the viewer is not moving and is observing the scene from a narrow field of view. Encouraged by this observation, we compute and publish the saliency maps for the EPIC Kitchens dataset, in which viewers are cooking. Source code and models are available at https://imatge-upc.github.io/saliency-2018-videosalgan/
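To make the idea of a conv-LSTM layer placed on top of per-frame saliency maps more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the layer's configuration, so this assumes the common conv-LSTM formulation without peephole connections, a single channel, 3x3 kernels, and untrained placeholder weights. The names `conv2d_same`, `conv_lstm_step`, `run_sequence`, and `make_params` are hypothetical, and the input frames stand in for saliency maps that an off-the-shelf image model would produce.

```python
import math

def conv2d_same(img, kernel):
    """Zero-padded 2D cross-correlation of a single-channel map (list of lists)."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gate(x, h_prev, wx, wh, b, act):
    """One gate: act(conv(x, wx) + conv(h_prev, wh) + b), applied per pixel."""
    cx, ch = conv2d_same(x, wx), conv2d_same(h_prev, wh)
    return [[act(cx[r][c] + ch[r][c] + b) for c in range(len(x[0]))]
            for r in range(len(x))]

def conv_lstm_step(x, h_prev, c_prev, params):
    """One conv-LSTM update (assumed variant: no peephole connections)."""
    i = gate(x, h_prev, *params["i"], sigmoid)    # input gate
    f = gate(x, h_prev, *params["f"], sigmoid)    # forget gate
    o = gate(x, h_prev, *params["o"], sigmoid)    # output gate
    g = gate(x, h_prev, *params["g"], math.tanh)  # candidate cell state
    rows, cols = len(x), len(x[0])
    c = [[f[r][k] * c_prev[r][k] + i[r][k] * g[r][k] for k in range(cols)]
         for r in range(rows)]
    h = [[o[r][k] * math.tanh(c[r][k]) for k in range(cols)]
         for r in range(rows)]
    return h, c

def run_sequence(frames, params):
    """Feed per-frame maps through the cell, carrying state across time."""
    rows, cols = len(frames[0]), len(frames[0][0])
    h = [[0.0] * cols for _ in range(rows)]
    c = [[0.0] * cols for _ in range(rows)]
    outputs = []
    for x in frames:
        h, c = conv_lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs

def make_params(value=0.1):
    """Placeholder (untrained) weights: the same 3x3 kernel for every gate."""
    k = [[value] * 3 for _ in range(3)]
    return {name: (k, k, 0.0) for name in "ifog"}

# Two identical 4x4 placeholder "static saliency maps" as a toy video.
frames = [[[0.5] * 4 for _ in range(4)] for _ in range(2)]
outputs = run_sequence(frames, make_params())
```

Because the hidden and cell states are carried from one frame to the next, the two identical input frames yield different outputs, which is the property that lets such a layer use temporal context. In a trained model the kernels would be learned from video saliency data rather than fixed by hand.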
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment
