TAPVid-360: Tracking Any Point in 360 from Narrow Field of View Video
Finlay G.C. Hudson, James A.D. Gardner, William A.P. Smith

TL;DR
TAPVid-360 introduces a new task and dataset for tracking scene points outside the field of view of narrow FOV videos, using 360 videos as supervision, to enable panoramic scene understanding.
Contribution
The paper proposes TAPVid-360, a novel task and dataset for predicting 3D directions to scene points outside the field of view (FOV), and adapts existing models to this new challenge.
Findings
The authors' baseline outperforms existing methods on the new benchmark
360 videos provide effective supervision for allocentric scene understanding
The approach enables tracking points beyond the visible field of view
Abstract
Humans excel at constructing panoramic mental models of their surroundings, maintaining object permanence and inferring scene structure beyond visible regions. In contrast, current artificial vision systems struggle with persistent, panoramic understanding, often processing scenes egocentrically on a frame-by-frame basis. This limitation is pronounced in the Track Any Point (TAP) task, where existing methods fail to track 2D points outside the field of view. To address this, we introduce TAPVid-360, a novel task that requires predicting the 3D direction to queried scene points across a video sequence, even when far outside the narrow field of view of the observed video. This task fosters learning allocentric scene representations without needing dynamic 4D ground truth scene models for training. Instead, we exploit 360 videos as a source of supervision, resampling them into narrow…
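The abstract defines the target as a 3D direction to each queried point, which stays well defined even when the point leaves the narrow field of view. The sketch below illustrates that geometry under assumed conventions: equirectangular 360 frames, a y-up/z-forward world frame, and a pinhole virtual camera that only rotates in yaw. The helper names (`equirect_to_direction`, `yaw_rotation`, `to_camera_frame`, `in_fov`) are hypothetical, and this is not the authors' resampling pipeline.

```python
import numpy as np


def equirect_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit 3D direction.

    Assumed convention: u spans longitude [-pi, pi) left to right,
    v spans latitude [pi/2, -pi/2] top to bottom; x right, y up, z forward.
    """
    lon = (u / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v / height) * np.pi
    return np.array([
        np.cos(lat) * np.sin(lon),  # x: right
        np.sin(lat),                # y: up
        np.cos(lat) * np.cos(lon),  # z: forward
    ])


def yaw_rotation(theta):
    """Camera-to-world rotation about the y axis; positive theta turns toward +x."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def to_camera_frame(d_world, r_cam):
    """Express a world direction in the virtual camera's frame (inverse rotation)."""
    return r_cam.T @ d_world


def in_fov(d_cam, fov_deg, aspect=1.0):
    """True if a camera-frame direction projects inside a pinhole image
    with the given horizontal FOV (degrees) and width/height aspect ratio."""
    if d_cam[2] <= 0.0:  # behind the camera: never visible
        return False
    tan_h = np.tan(np.radians(fov_deg) / 2.0)
    tan_v = tan_h / aspect
    x, y = d_cam[0] / d_cam[2], d_cam[1] / d_cam[2]
    return bool(abs(x) <= tan_h and abs(y) <= tan_v)
```

For a point at the centre of a 2048×1024 panorama, `equirect_to_direction(1024, 512, 2048, 1024)` gives the forward direction; it is visible to a 90° camera with yaw 0 but not to one turned by π, yet its camera-frame direction is still defined in both cases. That direction, rather than a 2D pixel, is the kind of target the task asks a model to predict.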
Taxonomy
Topics: Advanced Vision and Imaging · Face recognition and analysis · Human Pose and Action Recognition
