TL;DR
This paper introduces a hierarchical 3D fully convolutional model for video saliency prediction that combines domain adaptation and domain-specific learning to improve accuracy and generalization across datasets.
Contribution
It proposes a novel hierarchical learning framework for video saliency prediction, extended with a domain adaptation technique (gradient reversal at multiple scales) and domain-specific modules (priors, smoothing and batch normalization).
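The core idea of the hierarchical framework is that intermediate "conspicuity maps", computed from features at different abstraction levels, are supervised alongside the final saliency map. The following is a minimal sketch of such a combined loss under stated assumptions. Every map is taken to be already upsampled to the ground-truth resolution and flattened into a Python list. Each map is penalized with the KL-divergence loss common in the saliency literature, which is an assumed choice and not confirmed by this page. The function names and the optional per-level `weights` are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Optional, Sequence


def normalize(saliency_map: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Turn a flattened, non-negative saliency map into a probability distribution."""
    total = sum(saliency_map) + eps
    return [v / total for v in saliency_map]


def kl_divergence(pred: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """KL(target || pred), a common saliency loss (an assumption, not taken from the paper)."""
    p = normalize(pred)
    q = normalize(target)
    return sum(qi * math.log(eps + qi / (pi + eps)) for pi, qi in zip(p, q))


def hierarchical_loss(
    conspicuity_maps: Sequence[Sequence[float]],
    final_map: Sequence[float],
    target: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Loss on the final prediction plus a (hypothetically weighted) loss on each conspicuity map."""
    if weights is None:
        weights = [1.0] * len(conspicuity_maps)
    loss = kl_divergence(final_map, target)
    for w, cmap in zip(weights, conspicuity_maps):
        # Each intermediate map is supervised directly with the same ground truth.
        loss += w * kl_divergence(cmap, target)
    return loss


target = [0.0, 1.0, 3.0, 0.0]
coarse = [1.0, 1.0, 1.0, 1.0]  # a blurry, low-level conspicuity map
fine = [0.0, 1.0, 2.5, 0.5]    # a sharper, higher-level conspicuity map
final = [0.0, 1.0, 3.0, 0.0]
print(hierarchical_loss([coarse, fine], final, target))
```

Because every level receives its own gradient signal, coarse levels are pushed toward useful saliency cues even when the final map is already accurate. How the levels are actually weighted in the paper is not stated here.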
Findings
Achieves state-of-the-art accuracy on supervised saliency prediction.
Outperforms existing models on three of five metrics on DHF1K.
Enables unsupervised domain adaptation with competitive performance.
Abstract
In this work, we propose a 3D fully convolutional architecture for video saliency prediction that employs hierarchical supervision on intermediate maps (referred to as conspicuity maps) generated from features extracted at different abstraction levels. We extend the base hierarchical learning mechanism with two techniques, one for domain adaptation and one for domain-specific learning. For the former, we encourage the model to learn general hierarchical features in an unsupervised manner, using gradient reversal at multiple scales, to improve generalization to datasets for which no annotations are provided during training. For domain specialization, we employ domain-specific operations (namely, priors, smoothing and batch normalization) that specialize the learned features to individual datasets in order to maximize performance. The results of our experiments show that the proposed model…
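Two of the mechanisms named in the abstract can be sketched in plain Python under explicit assumptions. Features are flat lists of floats for a single channel, and backpropagation is written out by hand rather than through an autodiff framework. The class names `GradientReversal` and `DomainSpecificBatchNorm`, the strength `lambd`, and the domain name `"target"` are hypothetical. A gradient reversal layer passes features through unchanged in the forward pass. In the backward pass it negates (and scales) the gradient coming from a domain classifier, which pushes the feature extractor toward features that do not reveal which dataset a clip came from. Domain-specific batch normalization keeps separate statistics and affine parameters for each dataset.

```python
import math
from typing import Dict, Iterable, List, Sequence


class GradientReversal:
    """Identity in the forward pass; scales the gradient by -lambd in the backward pass."""

    def __init__(self, lambd: float = 1.0) -> None:
        self.lambd = lambd  # hypothetical strength of the reversed gradient

    def forward(self, features: Sequence[float]) -> List[float]:
        # Features reach the (hypothetical) domain classifier unchanged.
        return list(features)

    def backward(self, grad_output: Sequence[float]) -> List[float]:
        # The domain classifier's gradient is negated before it reaches the
        # feature extractor, so the extractor learns to confuse the classifier.
        return [-self.lambd * g for g in grad_output]


class DomainSpecificBatchNorm:
    """Single-channel batch norm with separate statistics and affine parameters per domain."""

    def __init__(self, domains: Iterable[str], momentum: float = 0.1, eps: float = 1e-5) -> None:
        domains = list(domains)
        self.momentum = momentum
        self.eps = eps
        self.running_mean: Dict[str, float] = {d: 0.0 for d in domains}
        self.running_var: Dict[str, float] = {d: 1.0 for d in domains}
        self.gamma: Dict[str, float] = {d: 1.0 for d in domains}  # per-domain scale
        self.beta: Dict[str, float] = {d: 0.0 for d in domains}   # per-domain shift

    def __call__(self, batch: Sequence[float], domain: str, training: bool = True) -> List[float]:
        if training:
            # Normalize with batch statistics; update only this domain's running stats.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            self.running_mean[domain] = (1 - m) * self.running_mean[domain] + m * mean
            self.running_var[domain] = (1 - m) * self.running_var[domain] + m * var
        else:
            # At inference, use the statistics accumulated for this domain only.
            mean, var = self.running_mean[domain], self.running_var[domain]
        scale = self.gamma[domain] / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta[domain] for x in batch]


grl = GradientReversal(lambd=0.5)
print(grl.forward([1.0, 2.0]), grl.backward([2.0, -4.0]))

bn = DomainSpecificBatchNorm(["DHF1K", "target"])
print(bn([1.0, 2.0, 3.0], domain="DHF1K"))
```

"Gradient reversal at multiple scales" would plausibly correspond to one reversal layer, each followed by its own domain classifier, per abstraction level, for example `[GradientReversal(0.5) for _ in range(4)]`. The number of levels and the value of λ here are illustrative, not values taken from the paper.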
