Generative Point Tracking with Flow Matching

Mattie Tesfaldet; Adam W. Harley; Konstantinos G. Derpanis; Derek Nowrouzezahrai; Christopher Pal

arXiv:2510.20951·cs.CV·October 27, 2025

3 Reviews

TL;DR

Generative Point Tracker (GenPT) introduces a novel flow matching framework to model multi-modal point trajectories, improving accuracy in occluded and ambiguous scenarios by leveraging generative sampling and confidence-guided inference.

Contribution

The paper presents GenPT, a generative point tracker trained with flow matching to model multi-modal trajectories. It outperforms discriminative models, especially on occluded points.

Findings

1. State-of-the-art accuracy on the PointOdyssey, Dynamic Replica, and TAP-Vid benchmarks.
2. Effective capture of multi-modality in point trajectories.
3. Improved tracking of occluded points.

Abstract

Tracking a point through a video can be a challenging task due to uncertainty arising from visual obfuscations, such as appearance changes and occlusions. Although current state-of-the-art discriminative models excel in regressing long-term point trajectory estimates -- even through occlusions -- they are limited to regressing to a mean (or mode) in the presence of uncertainty, and fail to capture multi-modality. To overcome this limitation, we introduce Generative Point Tracker (GenPT), a generative framework for modelling multi-modal trajectories. GenPT is trained with a novel flow matching formulation that combines the iterative refinement of discriminative trackers, a window-dependent prior for cross-window consistency, and a variance schedule tuned specifically for point coordinates. We show how our model's generative capabilities can be leveraged to improve point trajectory…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 6·Confidence 4

Strengths

- GenPT can model and sample from multiple plausible trajectory candidates, particularly when tracking uncertainty is high due to occlusion. This translates directly to state-of-the-art tracking accuracy on occluded points.
- The model effectively transitions between probabilistic and quasi-deterministic behavior. While always generative, its prediction variance tightly contracts (becoming nearly deterministic) when the tracked point is clearly visible and uniquely identifiable.

Weaknesses

- There is a substantial and recurring performance gap between the Oracle scores (the model's maximum potential) and the Greedy scores (the model's actual performance when relying on its confidence). This fundamental disconnect means the model is poor at judging the quality of the trajectories it generates, limiting the real-world utility of its multi-modality.
- The advertised speed advantage (2x faster than CoTracker3) is strictly limited to generating a single sample. To achieve the demonstra
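
The Oracle versus Greedy gap raised in this review can be illustrated with a minimal sketch. It assumes that Oracle selection picks the sampled trajectory closest to ground truth and that Greedy selection picks the one with the highest model confidence; the paper's exact selection procedure is not given in this excerpt, and the function names, trajectories, and confidence values below are hypothetical.

```python
import math

def mean_error(track, gt):
    """Average Euclidean distance between two (x, y) trajectories."""
    return sum(math.dist(p, q) for p, q in zip(track, gt)) / len(gt)

def pick_greedy(samples, confidences):
    """Pick the sample the model itself scores highest (no labels needed)."""
    best = max(range(len(samples)), key=lambda i: confidences[i])
    return samples[best]

def pick_oracle(samples, gt):
    """Pick the sample closest to ground truth (an upper bound; needs labels)."""
    return min(samples, key=lambda s: mean_error(s, gt))

# Toy data: three hypothetical sampled trajectories for one tracked point.
gt = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],  # matches ground truth
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # drifts away
    [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)],  # drifts the other way
]
# Deliberately miscalibrated: the drifting sample gets the top confidence.
confidences = [0.3, 0.9, 0.5]

greedy_err = mean_error(pick_greedy(samples, confidences), gt)
oracle_err = mean_error(pick_oracle(samples, gt), gt)
gap = greedy_err - oracle_err  # nonzero whenever confidence misranks samples
```

In this toy setup the gap is entirely due to the confidence ranking: a good sample exists, but the selection rule does not find it.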

Reviewer 02·Rating 4·Confidence 4

Strengths

1. This paper introduces the first generative point tracker trained using a modified flow-matching objective for trajectories, extending generative modeling concepts to the task of point tracking.
2. The authors design three key modules: iterative refinement, window-dependent prior, and variance schedule. These components are well-motivated and thoroughly ablated.

Weaknesses

1. Point tracking is inherently a deterministic problem, so a multi-modal approach may not be well-suited for this task.
2. The improvements of this model mainly target occluded points. However, the objective function used in models such as CoTracker3 or other similar approaches is typically $L = \text{Huber\_loss}(\text{predicted point}, \text{ground-truth point}) \times \text{is\_visible\_gt}(\text{this point})$. In other words, these models are not explicitly designed to predict occluded points.
3. The greedy search strategy requires running
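
The visibility-masked objective written out in this review can be sketched as follows. This is a minimal illustration of the formula as the reviewer states it, not the actual CoTracker3 training code; applying the Huber penalty per coordinate, summing over points, and the `delta` threshold are all assumptions made for the example.

```python
def huber(residual, delta=1.0):
    """Huber penalty: quadratic near zero, linear beyond delta."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def masked_huber_loss(pred, gt, visible, delta=1.0):
    """Sum of per-point Huber losses, multiplied by ground-truth visibility.

    Assumption: the penalty is applied to x and y separately and summed.
    """
    total = 0.0
    for (px, py), (gx, gy), v in zip(pred, gt, visible):
        per_point = huber(px - gx, delta) + huber(py - gy, delta)
        total += per_point * (1.0 if v else 0.0)  # occluded points add nothing
    return total

# Toy data: the second point is occluded and badly mispredicted.
pred = [(0.5, 0.0), (10.0, 10.0)]
gt = [(0.0, 0.0), (0.0, 0.0)]

loss_masked = masked_huber_loss(pred, gt, visible=[True, False])
loss_unmasked = masked_huber_loss(pred, gt, visible=[True, True])
```

With the mask applied, the large error on the occluded point does not contribute to the loss, which is the behavior the reviewer points to when saying such models are not explicitly trained to predict occluded points.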

Reviewer 03·Rating 2·Confidence 5

Strengths

- The paper tackles a genuine limitation of current discriminative point trackers, their inability to represent uncertainty and multimodal hypotheses in ambiguous or occluded regions.
- The authors provide comprehensive comparisons across several datasets

Weaknesses

### Lack of generative insight

Although the paper positions itself as a generative reformulation of tracking, the actual mechanism remains deterministic iterative optimization under Gaussian perturbation, not a generative process.

- In generative models (diffusion or rectified flow), the model learns to map **pure noise --> data samples**, learning meaningful dynamics along a linear trajectory in data space.
- In GenPT, the model learns **query + noise --> correspondence**, where the starting po
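
For context, the noise-to-data mapping this review attributes to rectified flow can be sketched in its textbook linear-path form. This is a generic illustration, not GenPT's formulation: the ideal constant velocity stands in for a trained network, and all names and values are hypothetical.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity field along the linear path."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

random.seed(0)
x1 = [3.0, -2.0]                            # a "data" sample, e.g. one 2D point
x0 = [random.gauss(0.0, 1.0) for _ in x1]   # pure Gaussian noise

# Stand-in for a trained network: the ideal velocity for this (x0, x1) pair.
v_star = target_velocity(x0, x1)
x_end = euler_sample(x0, lambda x, t: v_star, steps=10)
```

Because the ideal velocity is constant along a straight path, Euler integration lands on the data sample up to floating-point error. A learned field only approximates this behavior.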
