TL;DR
This paper introduces a novel learning-based model for predicting visual saliency in stereoscopic 3D videos by integrating low-level features, high-level cues, and depth information, validated through eye-tracking experiments.
Contribution
It presents a new stereoscopic saliency prediction model, designed specifically for 3D video content, that combines multiple features through random forest fusion.
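This page does not say how the random forest fusion is configured. The sketch below is only a hedged illustration. It assumes that each segmented region is summarized by a vector of feature scores (for example, the low-level, high-level, and depth cues listed in the abstract) and that a random forest regressor maps that vector to a saliency value learned from eye-tracking fixations. The class name `RandomForestFusion`, its parameters, and the toy data are hypothetical. The code is a from-scratch, standard-library random forest that uses bootstrap sampling plus a random feature subset at each split. It is not the authors' implementation.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared errors around the mean (the regression impurity)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def _build_tree(X, y, depth, max_depth, n_feats, min_leaf, rng):
    """Grow one regression tree: leaves are numbers, internal nodes are tuples."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)
    best = None
    # Random-subspace step: only a random subset of features is tried at this node.
    for f in rng.sample(range(len(X[0])), n_feats):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between observed values
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        _build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_feats, min_leaf, rng),
        _build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_feats, min_leaf, rng),
    )


def _predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForestFusion:
    """Hypothetical fusion step: bagged regression trees mapping region features to saliency."""

    def __init__(self, n_trees=25, max_depth=4, n_feats=None, min_leaf=1, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.n_feats = n_feats
        self.min_leaf = min_leaf
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        k = self.n_feats or max(1, len(X[0]) // 3)  # features tried per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(
                _build_tree([X[i] for i in idx], [y[i] for i in idx],
                            0, self.max_depth, k, self.min_leaf, self.rng)
            )
        return self

    def predict(self, x):
        """Fused saliency score: the average of all trees' predictions."""
        return mean(_predict_tree(t, x) for t in self.trees)


if __name__ == "__main__":
    # Toy data (hypothetical): [depth_contrast, noise] per region; target = fixation share.
    X = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
    y = [1.0 if row[0] > 0.5 else 0.0 for row in X]
    model = RandomForestFusion(n_trees=25, max_depth=3, n_feats=2, seed=1).fit(X, y)
    print(model.predict([0.95, 0.5]), model.predict([0.05, 0.5]))
```

Averaging many decorrelated trees is what makes a random forest a fusion step here: no single hand-tuned weighting of the cues is required, because the ensemble learns how each cue contributes to fixation density from the training data.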
Findings
Achieves performance competitive with state-of-the-art saliency models
Incorporates depth and high-level cues for improved accuracy
Validated with eye-tracking data from 24 subjects
Abstract
Over the past decade, many computational saliency prediction models have been proposed for 2D images and videos. Considering that the human visual system has evolved in a natural 3D environment, it is only natural to want to design visual attention models for 3D content. Existing monocular saliency models are not able to accurately predict the attentive regions when applied to 3D image/video content, as they do not incorporate depth information. This paper explores stereoscopic video saliency prediction by exploiting both low-level attributes such as brightness, color, texture, orientation, motion, and depth, as well as high-level cues such as face, person, vehicle, animal, text, and horizon. Our model starts with a rough segmentation and quantifies several intuitive observations such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise,…
