Seeing without Pixels: Perception from Camera Trajectories

Zihui Xue; Kristen Grauman; Dima Damen; Andrew Zisserman; Tengda Han

arXiv:2511.21681 · cs.CV · April 3, 2026

TL;DR

This paper shows that camera trajectories alone can encode a video's content well enough to support a range of perception tasks without any pixel data, using a new contrastive learning approach that aligns trajectories with language.

Contribution

It introduces CamFormer, a dedicated encoder that aligns camera trajectories with natural language, and uses it to show how much information trajectories carry for video understanding.
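The summary says only that CamFormer is trained with a contrastive framework; the exact objective is not given here. As a minimal sketch, assuming a CLIP-style symmetric InfoNCE loss (a common choice for aligning two modalities, not confirmed by the source), the code below scores a batch of paired trajectory and caption embeddings: pair i is the positive, and every other pairing in the batch acts as a negative. The function names and the 0.07 temperature are hypothetical, and the trajectory and text encoders that would produce the embeddings are not shown.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def logsumexp(xs: List[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def symmetric_info_nce(
    traj_emb: List[Vector], text_emb: List[Vector], temperature: float = 0.07
) -> float:
    """Hypothetical CLIP-style loss: row i of each batch forms the positive pair."""
    if len(traj_emb) != len(text_emb):
        raise ValueError("trajectory and text batches must have the same size")
    t = [l2_normalize(v) for v in traj_emb]
    s = [l2_normalize(v) for v in text_emb]
    n = len(t)
    # logits[i][j] = cosine(trajectory_i, text_j) / temperature
    logits = [[dot(t[i], s[j]) / temperature for j in range(n)] for i in range(n)]
    # Trajectory-to-text direction: cross-entropy over each row, target column i.
    loss_t2s = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Text-to-trajectory direction: cross-entropy over each column, target row i.
    loss_s2t = sum(
        logsumexp([logits[j][i] for j in range(n)]) - logits[i][i] for i in range(n)
    ) / n
    return 0.5 * (loss_t2s + loss_s2t)


if __name__ == "__main__":
    traj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    matched = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    shuffled = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
    print(symmetric_info_nce(traj, matched))   # near zero: pairs already aligned
    print(symmetric_info_nce(traj, shuffled))  # large: positives are mismatched
```

Averaging both directions keeps the objective symmetric, so neither modality is treated only as the query side; minimizing it pulls each trajectory toward its own caption and away from the other captions in the batch.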

Findings

1. Camera trajectories are highly informative for recognizing video content.
2. CamFormer embeddings perform well across diverse downstream tasks.
3. The learned representations are robust across different camera pose estimation methods.

Abstract

Can one perceive a video's content without seeing its pixels, just from the camera trajectory, the path the camera carves through space? This paper is the first to systematically investigate this seemingly implausible question. To this end, we propose a contrastive learning framework to train CamFormer, a dedicated encoder that projects camera pose trajectories into a joint embedding space, aligning them with natural language. We find that, despite its apparent simplicity, the camera trajectory is a remarkably informative signal for uncovering video content. In other words, "how you move" can indeed provide valuable cues about "what you are doing" (egocentric) or "what you are observing" (exocentric). We demonstrate the versatility of our learned CamFormer embeddings on a diverse suite of downstream tasks, ranging from cross-modal alignment to classification and temporal analysis. Importantly, our…
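Once trajectories and text share an embedding space, classification can be posed as picking the text embedding closest to a trajectory embedding. The sketch below assumes that zero-shot setup with cosine similarity; the paper's own classification protocol may differ. The labels and 3-dimensional vectors are hypothetical stand-ins for real CamFormer and text-encoder outputs, and the function names are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_labels(traj_emb: Vector, label_embs: Dict[str, Vector]) -> List[str]:
    """Hypothetical helper: labels sorted from most to least similar."""
    return sorted(
        label_embs, key=lambda label: cosine(traj_emb, label_embs[label]), reverse=True
    )


def zero_shot_classify(traj_emb: Vector, label_embs: Dict[str, Vector]) -> str:
    """Hypothetical helper: the single best-matching label."""
    return rank_labels(traj_emb, label_embs)[0]


if __name__ == "__main__":
    # Toy text embeddings for illustrative activity prompts.
    labels = {
        "walking": [1.0, 0.1, 0.0],
        "cooking": [0.0, 1.0, 0.2],
        "cycling": [0.2, 0.0, 1.0],
    }
    trajectory = [0.9, 0.2, 0.1]  # toy trajectory embedding
    print(zero_shot_classify(trajectory, labels))  # → walking
    print(rank_labels(trajectory, labels))  # → ['walking', 'cycling', 'cooking']
```

The same ranking function also covers trajectory-to-text retrieval: instead of keeping only the top label, return the ordered list of candidate captions.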
