DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video

Narek Tumanyan; Assaf Singer; Shai Bagon; Tali Dekel

arXiv:2403.14548 · cs.CV · July 12, 2024 · 3 citations

TL;DR

DINO-Tracker introduces a novel self-supervised framework that combines test-time training with pre-trained DINO-ViT features for long-term dense point tracking in videos, achieving state-of-the-art results.
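The core idea of a tracker that reads point positions directly off DINO-ViT features can be illustrated with a minimal sketch of generic correlation-based matching: compare a query point's descriptor against every location of a target frame's feature map, then take a soft-argmax over the resulting similarity map. This is an illustration under stated assumptions, not DINO-Tracker's own tracker: the helper names (`l2_normalize`, `track_point`), the `temperature` parameter, and the random stand-in features are hypothetical, and real features would come from a DINO-ViT backbone rather than NumPy.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to unit length; eps guards against division by zero."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def track_point(query_feat: np.ndarray, target_feats: np.ndarray,
                temperature: float = 0.02) -> tuple[float, float]:
    """Hypothetical helper: locate one query descriptor in a target frame.

    query_feat:   (C,) feature of the query point in the source frame.
    target_feats: (H, W, C) feature map of the target frame.
    Returns a sub-pixel (row, col) estimate via soft-argmax over cosine similarity.
    """
    q = l2_normalize(query_feat)
    t = l2_normalize(target_feats)
    sim = t @ q                      # (H, W) correlation (cosine-similarity) map
    logits = sim / temperature
    logits = logits - logits.max()   # subtract the max before exp for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()         # softmax over all H*W locations
    H, W = sim.shape
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Soft-argmax: probability-weighted mean of the pixel coordinates.
    return float((weights * rows).sum()), float((weights * cols).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame_feats = rng.normal(size=(16, 16, 128))              # stand-in feature map
    query = frame_feats[5, 7] + 0.05 * rng.normal(size=128)   # noisy copy of location (5, 7)
    print(track_point(query, frame_feats))
```

A low `temperature` makes the soft-argmax behave almost like a hard argmax while still returning sub-pixel coordinates; a higher one averages over more candidate locations, trading precision for smoothness.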

Contribution

The paper proposes a new end-to-end self-supervised tracking method that refines DINO features at test time to improve long-term tracking in a single video.

Findings

01

Outperforms existing self-supervised tracking methods.

02

Competitive with supervised trackers on benchmarks.

03

Excels in long-term occlusion scenarios.

Abstract

We present DINO-Tracker, a new framework for long-term dense tracking in video. The pillar of our approach is combining test-time training on a single video with the powerful localized semantic features learned by a pre-trained DINO-ViT model. Specifically, our framework simultaneously adapts DINO's features to fit the motion observations of the test video, while training a tracker that directly leverages the refined features. The entire framework is trained end-to-end using a combination of self-supervised losses and regularization that allows us to retain and benefit from DINO's semantic prior. Extensive evaluation demonstrates that our method achieves state-of-the-art results on known benchmarks. DINO-Tracker significantly outperforms self-supervised methods and is competitive with state-of-the-art supervised trackers, while outperforming them in challenging cases of tracking…
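To make the refinement idea in the abstract more concrete, here is a minimal NumPy sketch of test-time feature adaptation under strong simplifying assumptions: frozen features of two frames are given as `(N, C)` arrays, the motion observations are a fixed list of known correspondences, the refinement is a free additive residual per location, and an L2 penalty on the residuals stands in for the regularization that preserves DINO's semantic prior. The paper adapts the features end-to-end together with a tracker, which this toy does not reproduce; `refine_features`, `lam`, `lr`, and `steps` are hypothetical names chosen for illustration.

```python
import numpy as np


def refine_features(feats_a: np.ndarray, feats_b: np.ndarray, pairs: np.ndarray,
                    lam: float = 0.1, lr: float = 0.5, steps: int = 200):
    """Hypothetical test-time refinement of two frames' frozen features.

    feats_a, feats_b: (N, C) float arrays of frozen per-location features.
    pairs: (P, 2) int array; row k states that location pairs[k, 0] in frame a
           corresponds to location pairs[k, 1] in frame b (assumed motion cues).
    lam:   weight of the L2 penalty that keeps residuals small (prior term).
    lr:    gradient-descent step size; reduce it when lam is large, or the
           plain gradient descent below can diverge.
    Returns (refined_a, refined_b, loss_history).
    """
    res_a = np.zeros_like(feats_a)   # additive residuals; zero means "frozen features"
    res_b = np.zeros_like(feats_b)
    ia, ib = pairs[:, 0], pairs[:, 1]
    num_pairs = len(pairs)
    history = []
    for _ in range(steps):
        # Mismatch between refined features at each corresponding pair: (P, C).
        diff = (feats_a[ia] + res_a[ia]) - (feats_b[ib] + res_b[ib])
        match_loss = (diff ** 2).sum() / num_pairs
        prior_loss = lam * ((res_a ** 2).sum() + (res_b ** 2).sum())
        history.append(match_loss + prior_loss)
        # Analytic gradients of match_loss + prior_loss w.r.t. the residuals.
        grad_a = 2 * lam * res_a
        grad_b = 2 * lam * res_b
        np.add.at(grad_a, ia, 2 * diff / num_pairs)   # add.at accumulates repeated indices
        np.add.at(grad_b, ib, -2 * diff / num_pairs)
        res_a -= lr * grad_a
        res_b -= lr * grad_b
    return feats_a + res_a, feats_b + res_b, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fa, fb = rng.normal(size=(50, 16)), rng.normal(size=(50, 16))
    pairs = np.stack([np.arange(20), rng.permutation(50)[:20]], axis=1)
    ra, rb, hist = refine_features(fa, fb, pairs)
    print(f"loss: {hist[0]:.2f} -> {hist[-1]:.2f}")
```

With a larger `lam` the residuals shrink toward zero, so the refined features stay closer to the frozen ones; a smaller `lam` fits the correspondences more tightly. This is only a toy analogue of the balance the abstract describes between fitting the test video and retaining DINO's prior, reduced to a single scalar.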

Taxonomy

Topics: Advanced Vision and Imaging · CCD and CMOS Imaging Sensors · Image Processing Techniques and Applications