Can Visual Foundation Models Achieve Long-term Point Tracking?

Görkay Aydemir; Weidi Xie; Fatma Güney

arXiv:2408.13575·cs.CV·August 27, 2024

TL;DR

This paper evaluates the ability of large-scale visual foundation models to perform long-term point tracking, revealing their potential and limitations in complex environments without extensive training.

Contribution

It systematically assesses the geometric awareness of foundation models like Stable Diffusion and DINOv2 for long-term correspondence tasks, including zero-shot and fine-tuning scenarios.

Findings

01

Features from Stable Diffusion and DINOv2 show superior geometric correspondence in zero-shot settings.

02

In adaptation settings, DINOv2 achieves performance comparable to supervised models.

03

Foundation models show promise as initialization for correspondence learning.

Abstract

Large-scale vision foundation models have demonstrated remarkable success across various tasks, underscoring their robust generalization capabilities. While their proficiency in two-view correspondence has been explored, their effectiveness in long-term correspondence within complex environments remains unexplored. To address this, we evaluate the geometric awareness of visual foundation models in the context of point tracking: (i) in zero-shot settings, without any training; (ii) by probing with low-capacity layers; (iii) by fine-tuning with Low Rank Adaptation (LoRA). Our findings indicate that features from Stable Diffusion and DINOv2 exhibit superior geometric correspondence abilities in zero-shot settings. Furthermore, DINOv2 achieves performance comparable to supervised models in adaptation settings, demonstrating its potential as a strong initialization for correspondence…
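As a rough illustration of what zero-shot point tracking with frozen features can look like, the sketch below matches a query point's descriptor against every location in each frame by cosine similarity and takes the best match. This is a minimal sketch under stated assumptions, not the paper's evaluation protocol: the function name `track_point_zero_shot`, the `(T, H, W, C)` feature layout, and the plain argmax matching rule are all hypothetical choices made for the example, and the feature maps are assumed to have been extracted beforehand from some frozen backbone.

```python
import numpy as np


def track_point_zero_shot(feats, query_xy):
    """Track one point through a video by nearest-neighbour feature matching.

    feats:    array of shape (T, H, W, C), one feature map per frame
              (assumed to come from a frozen backbone; hypothetical layout).
    query_xy: (x, y) location of the query point on the feature grid of frame 0.
    Returns an integer array of shape (T, 2) with the matched (x, y) per frame.
    """
    T, H, W, C = feats.shape

    # L2-normalise descriptors so that a dot product equals cosine similarity.
    norms = np.linalg.norm(feats, axis=-1, keepdims=True)
    feats_n = feats / np.clip(norms, 1e-8, None)

    # Descriptor of the query point, taken from the first frame.
    x0, y0 = query_xy
    query = feats_n[0, y0, x0]  # shape (C,)

    # Cosine similarity between the query and every location in every frame.
    sim = feats_n.reshape(T, H * W, C) @ query  # shape (T, H*W)

    # Best-matching flat index per frame, converted back to grid coordinates.
    best = sim.argmax(axis=1)
    ys, xs = np.divmod(best, W)
    return np.stack([xs, ys], axis=1)


if __name__ == "__main__":
    # Synthetic stand-in for backbone features: random noise plus one
    # distinctive descriptor planted along a known path.
    rng = np.random.default_rng(0)
    demo_feats = rng.normal(size=(4, 8, 8, 16))
    descriptor = rng.normal(size=16)
    path = [(1, 2), (2, 2), (3, 3), (4, 5)]  # (x, y) per frame
    for t, (x, y) in enumerate(path):
        demo_feats[t, y, x] = descriptor
    print(track_point_zero_shot(demo_feats, path[0]))
```

Matching every frame against the first-frame descriptor keeps the sketch free of any learned component, which is the point of a zero-shot setting; it also means the match has no temporal smoothing and cannot signal occlusion, which a real tracker would need to handle.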

Taxonomy

Topics: Data Visualization and Analytics

Methods: Diffusion