AllTracker: Efficient Dense Point Tracking at High Resolution

Adam W. Harley; Yang You; Xinglong Sun; Yang Zheng; Nikhil Raghuraman; Yunqi Gu; Sheldon Liang; Wen-Hsuan Chu; Achal Dave; Pavel Tokmakov; Suya You; Rares Ambrus; Katerina Fragkiadaki; Leonidas J. Guibas

arXiv:2506.07310 · cs.CV · August 5, 2025

TL;DR

AllTracker is a dense, high-resolution point tracking model that estimates long-range correspondences by predicting the flow field between a query frame and every other frame of a video, combining techniques from optical flow and point tracking.

Contribution

We introduce AllTracker, a new architecture that achieves dense, high-resolution point tracking over many frames, trained jointly on optical flow and point tracking datasets.

Findings

01 State-of-the-art accuracy on high-resolution point tracking

02 Efficient model with 16 million parameters

03 Effective joint training on optical flow and point tracking datasets

Abstract

We introduce AllTracker: a model that estimates long-range point tracks by way of estimating the flow field between a query frame and every other frame of a video. Unlike existing point tracking methods, our approach delivers high-resolution and dense (all-pixel) correspondence fields, which can be visualized as flow maps. Unlike existing optical flow methods, our approach corresponds one frame to hundreds of subsequent frames, rather than just the next frame. We develop a new architecture for this task, blending techniques from existing work in optical flow and point tracking: the model performs iterative inference on low-resolution grids of correspondence estimates, propagating information spatially via 2D convolution layers, and propagating information temporally via pixel-aligned attention layers. The model is fast and parameter-efficient (16 million parameters), and delivers…
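The link between flow fields and point tracks described in the abstract can be illustrated with a small sketch. It assumes the model output is available as a NumPy array of shape (T, H, W, 2), where entry [t, y, x] holds the (dx, dy) displacement of query-frame pixel (x, y) in frame t. That layout, and the helper names flows_to_tracks and track_of_pixel, are hypothetical choices for illustration and are not taken from the paper or its released code.

```python
import numpy as np


def flows_to_tracks(flows: np.ndarray) -> np.ndarray:
    """Convert query-to-frame flow fields into dense point tracks.

    flows: array of shape (T, H, W, 2); flows[t, y, x] is the assumed
           (dx, dy) displacement of query pixel (x, y) in frame t.
    Returns an array of the same shape holding absolute (x, y) positions.
    """
    if flows.ndim != 4 or flows.shape[-1] != 2:
        raise ValueError("expected flows with shape (T, H, W, 2)")
    _, height, width, _ = flows.shape
    # Pixel coordinates of the query frame: xs varies along width, ys along height.
    xs, ys = np.meshgrid(
        np.arange(width, dtype=flows.dtype),
        np.arange(height, dtype=flows.dtype),
    )
    grid = np.stack([xs, ys], axis=-1)  # (H, W, 2)
    # Broadcasting adds the query grid to the flow of every frame.
    return grid[None] + flows


def track_of_pixel(tracks: np.ndarray, x: int, y: int) -> np.ndarray:
    """Return the (T, 2) trajectory of the query pixel at column x, row y."""
    return tracks[:, y, x, :]


# Synthetic example: every pixel moves by (t, 2t) pixels in frame t.
demo_flows = np.zeros((3, 4, 5, 2), dtype=np.float32)
for t in range(3):
    demo_flows[t, ..., 0] = t
    demo_flows[t, ..., 1] = 2 * t
demo_tracks = flows_to_tracks(demo_flows)
print(track_of_pixel(demo_tracks, x=2, y=1))
```

Under this assumed layout, a single array covers every pixel of the query frame, so extracting the trajectory of any one point is an indexing operation rather than a separate tracking query.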

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Human Pose and Action Recognition