Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization
Mu Li, Kanglong Fan, Kede Ma

TL;DR
This paper introduces a scanpath prediction method for panoramic videos that minimizes the expected code length of quantized scanpaths under a discrete conditional probability model, aiming to improve accuracy and realism without designating a single "ground-truth" scanpath to imitate.
Contribution
It proposes a training criterion inspired by lossy data compression, together with a discrete probability model conditioned on two inputs: a viewport sequence (the visual input) and relative historical scanpaths projected onto the respective viewports (the path input).
Findings
Outperforms existing methods in quantitative accuracy
Generates more perceptually realistic scanpaths
Improves generalization to unseen datasets
Abstract
Predicting human scanpaths when exploring panoramic videos is a challenging task due to the spherical geometry and the multimodality of the input, and the inherent uncertainty and diversity of the output. Most previous methods fail to give a complete treatment of these characteristics, and thus are prone to errors. In this paper, we present a simple new criterion for scanpath prediction based on principles from lossy data compression. This criterion suggests minimizing the expected code length of quantized scanpaths in a training set, which corresponds to fitting a discrete conditional probability model via maximum likelihood. Specifically, the probability model is conditioned on two modalities: a viewport sequence as the deformation-reduced visual input and a set of relative historical scanpaths projected onto respective viewports as the aligned path input. The probability model is…
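The link the abstract draws between code length and maximum likelihood can be illustrated with a small sketch. Under an ideal entropy coder, a symbol with model probability p costs -log2(p) bits, so the average code length of a quantized scanpath equals its average negative log-likelihood, and minimizing one is the same as maximizing the other. The function names, the uniform quantizer, the bin range, and the toy numbers below are illustrative assumptions, not the authors' implementation or their actual probability model.

```python
import numpy as np

def quantize(values, lo, step, num_bins):
    """Map real values to integer bin indices in [0, num_bins - 1].

    Hypothetical uniform quantizer: bin k covers [lo + k*step, lo + (k+1)*step).
    """
    idx = np.floor((np.asarray(values, dtype=float) - lo) / step).astype(int)
    return np.clip(idx, 0, num_bins - 1)

def expected_code_length_bits(probs, symbols):
    """Average ideal code length, in bits per symbol, of `symbols` under `probs`.

    probs:   shape (T, K); row t is the model's distribution over K bins at step t.
    symbols: shape (T,); the observed bin index at each step.
    This equals the mean negative log2-likelihood of the observed symbols.
    """
    probs = np.asarray(probs, dtype=float)
    symbols = np.asarray(symbols, dtype=int)
    picked = probs[np.arange(len(symbols)), symbols]
    return float(-np.mean(np.log2(picked)))

# Toy 1-D "scanpath": relative displacements (assumed units: degrees) in [-2, 2).
displacements = [-1.5, 0.3, 0.7, 1.2]
symbols = quantize(displacements, lo=-2.0, step=1.0, num_bins=4)

# A model that knows nothing: uniform over the 4 bins, costing 2 bits per symbol.
uniform = np.full((4, 4), 0.25)
uniform_bits = expected_code_length_bits(uniform, symbols)

# A model that puts most of its mass on the observed bin at every step.
peaked = np.full((4, 4), 0.1)
peaked[np.arange(4), symbols] = 0.7
peaked_bits = expected_code_length_bits(peaked, symbols)

print(symbols, uniform_bits, peaked_bits)
```

A model that assigns higher probability to the observed bins yields a shorter code, which is why fitting the model by maximum likelihood on a training set also minimizes the expected code length of its scanpaths.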
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
