It's a Matter of Time: Three Lessons on Long-Term Motion for Perception

Willem Davison; Xinyue Hao; Laura Sevilla-Lara

arXiv:2602.14705 · cs.CV · February 17, 2026

Open Access

TL;DR

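This paper examines what long-term motion information contributes to perception. Using representations learned from point tracks, it reports that motion is often better than image representations at recognizing actions, objects, and materials, generalizes better in low-data and zero-shot settings, and offers a better accuracy-compute trade-off than standard video models.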

Contribution

The paper draws three lessons about long-term motion representations for perception tasks: what information they carry, how well they generalize, and how they trade accuracy against compute. It highlights their potential for future model design.

Findings

01

Long-term motion representations carry enough information to recognize actions, objects, materials, and spatial properties, often better than image representations.

02

Long-term motion representations generalize better than image representations in low-data and zero-shot settings.

03

Motion representations offer a better accuracy-GFLOPs trade-off than standard video models.
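One way to make an accuracy-GFLOPs comparison like the one in finding 03 concrete is to compute a Pareto front: the set of models that no other model beats on both cost and accuracy. The sketch below is only an illustration of that general idea, not the paper's evaluation procedure; the `Model` type and the `pareto_front` function are hypothetical names introduced here, and no numbers from the paper are used.

```python
from typing import NamedTuple


class Model(NamedTuple):
    # Hypothetical record type for illustration; not from the paper.
    name: str        # label for the model
    gflops: float    # compute cost per inference, in GFLOPs
    accuracy: float  # task accuracy, in percent


def pareto_front(models: list[Model]) -> list[Model]:
    """Return the models that are not dominated on (cost, accuracy).

    A model is dominated if another model is at least as cheap and
    at least as accurate, and strictly better on one of the two.
    """
    front: list[Model] = []
    best_acc = float("-inf")
    # Visit models from cheapest to most expensive; on equal cost,
    # visit the more accurate one first so it wins the tie.
    for m in sorted(models, key=lambda m: (m.gflops, -m.accuracy)):
        # Keep a model only if it is more accurate than every
        # cheaper (or equally cheap) model seen so far.
        if m.accuracy > best_acc:
            front.append(m)
            best_acc = m.accuracy
    return front
```

Under this view, one family of models has a better trade-off than another when its members make up the front across the range of compute budgets being compared.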

Abstract

Temporal information has long been considered to be essential for perception. While there is extensive research on the role of image information for perceptual tasks, the role of the temporal dimension remains less well understood: What can we learn about the world from long-term motion information? What properties does long-term motion information have for visual learning? We leverage recent success in point-track estimation, which offers an excellent opportunity to learn temporal representations and experiment on a variety of perceptual tasks. We draw 3 clear lessons: 1) Long-term motion representations contain information to understand actions, but also objects, materials, and spatial information, often even better than images. 2) Long-term motion representations generalize far better than image representations in low-data settings and in zero-shot tasks. 3) The very low…

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Face Recognition and Perception