Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization

Mu Li; Kanglong Fan; Kede Ma

arXiv:2305.02536 · cs.CV · August 18, 2025 · 2 cites

Open Access

TL;DR

This paper introduces a novel scanpath prediction method for panoramic videos that minimizes expected code length using a probabilistic model, improving accuracy and realism without relying on ground-truth scanpaths.

Contribution

It proposes a training criterion inspired by lossy data compression, together with a discrete probability model conditioned on a viewport sequence and on relative historical scanpaths, to improve the accuracy and realism of predicted scanpaths.

Findings

01. Outperforms existing methods in quantitative accuracy
02. Generates more perceptually realistic scanpaths
03. Improves generalization to unseen datasets

Abstract

Predicting human scanpaths when exploring panoramic videos is a challenging task due to the spherical geometry and the multimodality of the input, and the inherent uncertainty and diversity of the output. Most previous methods fail to give a complete treatment of these characteristics, and thus are prone to errors. In this paper, we present a simple new criterion for scanpath prediction based on principles from lossy data compression. This criterion suggests minimizing the expected code length of quantized scanpaths in a training set, which corresponds to fitting a discrete conditional probability model via maximum likelihood. Specifically, the probability model is conditioned on two modalities: a viewport sequence as the deformation-reduced visual input and a set of relative historical scanpaths projected onto respective viewports as the aligned path input. The probability model is…
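The link the abstract draws between expected code length and maximum likelihood can be illustrated with a toy example. The sketch below is not the authors' model: it assumes a 1-D sequence of gaze displacements, a uniform quantizer with a hypothetical step of 0.2, and a context made only of the previous quantized symbol (the paper conditions on viewport sequences and relative historical scanpaths instead). It shows that the average ideal code length, -log2 p(symbol | context), equals the negative log-likelihood per symbol, so the maximum-likelihood model is also the one that minimizes the code length on the training data.

```python
import math
from collections import Counter, defaultdict

def quantize(x, step=0.2):
    # Uniform scalar quantizer (illustrative assumption): value -> integer bin index.
    return int(math.floor(x / step))

def fit_ml(contexts, symbols):
    # The maximum-likelihood estimate of p(symbol | context) for a discrete
    # table model is the empirical conditional frequency.
    counts = defaultdict(Counter)
    for c, s in zip(contexts, symbols):
        counts[c][s] += 1
    return {
        c: {s: n / sum(cnt.values()) for s, n in cnt.items()}
        for c, cnt in counts.items()
    }

def mean_code_length(model, contexts, symbols):
    # Ideal code length of one symbol is -log2 p(symbol | context) bits.
    # Averaged over the data, this is the negative log-likelihood per symbol.
    total = sum(-math.log2(model[c][s]) for c, s in zip(contexts, symbols))
    return total / len(symbols)

# Toy 1-D "scanpath": made-up displacements between consecutive gaze positions.
displacements = [0.05, 0.10, 0.30, 0.25, 0.05, 0.10, 0.30, 0.25, 0.05, 0.10]
q = [quantize(d) for d in displacements]

# Hypothetical context: the previous quantized symbol only.
contexts, symbols = q[:-1], q[1:]

model = fit_ml(contexts, symbols)
fitted_bits = mean_code_length(model, contexts, symbols)

# Baseline: a uniform model over the observed alphabet, for comparison.
alphabet = sorted(set(q))
uniform = {c: {s: 1 / len(alphabet) for s in alphabet} for c in set(contexts)}
uniform_bits = mean_code_length(uniform, contexts, symbols)

print(f"fitted model:  {fitted_bits:.3f} bits/symbol")
print(f"uniform model: {uniform_bits:.3f} bits/symbol")
```

On the training data the fitted table can never need more bits per symbol than the uniform baseline, which is the sense in which code length minimization and maximum likelihood coincide. A learned model would replace the lookup table with a parametric distribution, but the objective keeps the same form.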

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques