Noise-Aware Video Saliency Prediction
Ekta Prashnani, Orazio Gallo, Joohwan Kim, Josef Spjut, Pradeep Sen, Iuri Frosio

TL;DR
This paper introduces a noise-aware training paradigm for video saliency prediction that effectively handles uncertain gaze data, especially with limited training samples, and presents a new dataset with rich temporal semantics.
Contribution
We propose a novel noise-aware training method for video saliency prediction that accounts for gaze data uncertainty and introduce a new dataset with multiple gaze attractors per frame.
Findings
Noise-aware training (NAT) improves saliency prediction accuracy when training data is limited
NAT is effective across different models, loss functions, and datasets
The new dataset contains rich temporal semantics and multiple gaze attractors
Abstract
We tackle the problem of predicting saliency maps for videos of dynamic scenes. We note that the accuracy of the maps reconstructed from the gaze data of a fixed number of observers varies with the frame, as it depends on the content of the scene. This issue is particularly pressing when a limited number of observers are available. In such cases, directly minimizing the discrepancy between the predicted and measured saliency maps, as traditional deep-learning methods do, results in overfitting to the noisy data. We propose a noise-aware training (NAT) paradigm that quantifies and accounts for the uncertainty arising from frame-specific gaze data inaccuracy. We show that NAT is especially advantageous when limited training data is available, with experiments across different models, loss functions, and datasets. We also introduce a video game-based saliency dataset, with rich temporal…
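The abstract's starting observation, that a map reconstructed from a fixed number of observers is more or less accurate depending on the frame content, can be illustrated with a toy simulation. The sketch below is not the paper's NAT method or its loss. It only samples a few fixations from a known "true" saliency map, reconstructs a map by Gaussian blurring, and measures the KL divergence to the true map. All function names, the grid size, the blur widths, and the observer count are assumptions chosen for illustration.

```python
import numpy as np

def make_true_map(centers, shape, sigma=4.0):
    """Hypothetical ground-truth saliency: a normalized sum of Gaussian blobs."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    m = np.zeros(shape)
    for r0, c0 in centers:
        m += np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * sigma ** 2))
    return m / m.sum()

def sample_fixations(true_map, n, rng):
    """Draw n fixation locations (row, col), one per simulated observer."""
    flat = rng.choice(true_map.size, size=n, p=true_map.ravel())
    return np.column_stack(np.unravel_index(flat, true_map.shape))

def reconstruct_map(fixations, shape, sigma=3.0):
    """Blur the fixation counts with a separable Gaussian and normalize to sum 1."""
    counts = np.zeros(shape)
    for r, c in fixations:
        counts[r, c] += 1.0
    x = np.arange(-int(3 * sigma), int(3 * sigma) + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    blur = lambda v: np.convolve(v, k, mode="same")
    blurred = np.apply_along_axis(blur, 1, counts)   # along each row
    blurred = np.apply_along_axis(blur, 0, blurred)  # along each column
    return blurred / blurred.sum()

def kl_div(p, q, eps=1e-8):
    """KL divergence between two normalized maps, with eps for numerical safety."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mean_reconstruction_error(true_map, n_observers, trials, rng):
    """Average KL(true || reconstructed) over repeated draws of observers."""
    errs = [
        kl_div(true_map,
               reconstruct_map(sample_fixations(true_map, n_observers, rng),
                               true_map.shape))
        for _ in range(trials)
    ]
    return float(np.mean(errs))

rng = np.random.default_rng(0)
shape = (48, 64)
single = make_true_map([(24, 32)], shape)                      # one gaze attractor
multi = make_true_map([(10, 12), (24, 50), (38, 20)], shape)   # three attractors

err_single = mean_reconstruction_error(single, n_observers=5, trials=50, rng=rng)
err_multi = mean_reconstruction_error(multi, n_observers=5, trials=50, rng=rng)
print(f"single attractor: {err_single:.3f}  multiple attractors: {err_multi:.3f}")
```

Comparing the two printed errors for the same number of simulated observers gives a rough sense of how the reliability of a measured map can change with scene content, which is the frame-specific uncertainty that NAT is described as accounting for.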
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
