Autogenic Language Embedding for Coherent Point Tracking
Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

TL;DR
This paper presents a point tracking method that uses language embeddings, generated from the visual features themselves, to improve point correspondence and coherence in long video sequences, outperforming visual-only approaches.
Contribution
Introduces autogenic language embedding for visual feature enhancement: a dedicated mapping network learns text embeddings from visual features without explicit text annotations, which improves long-term point tracking.
Findings
Significantly improves tracking accuracy on benchmark datasets.
Enhances visual feature consistency with minimal computational overhead.
Outperforms existing visual-only tracking methods.
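"Visual feature consistency" can be made concrete with a simple measure. The sketch below is a hypothetical metric, not one taken from the paper: the mean pairwise cosine similarity of a single tracked point's features across frames. The names `track_coherence`, `steady`, and `drifting` are illustrative, and the feature values are toy data.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def track_coherence(features):
    """Mean pairwise cosine similarity of one point's features across frames.

    Hypothetical metric: 1.0 means the feature never changes over time;
    lower values mean the appearance descriptor drifts along the track.
    """
    pairs = list(combinations(features, 2))
    if not pairs:
        return 1.0  # a single frame is trivially consistent with itself
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy 2-D features for one point over three frames.
steady = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05]]
drifting = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(round(track_coherence(steady), 3), round(track_coherence(drifting), 3))
```

A higher score for the steady track than for the drifting one is the kind of gap that semantic enhancement of frame-wise features would aim to widen.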
Abstract
Point tracking is a challenging task in computer vision, aiming to establish point-wise correspondence across long video sequences. Recent advancements have primarily focused on temporal modeling techniques to improve local feature similarity, often overlooking the valuable semantic consistency inherent in tracked points. In this paper, we introduce a novel approach leveraging language embeddings to enhance the coherence of frame-wise visual features related to the same object. Our proposed method, termed autogenic language embedding for visual feature enhancement, strengthens point correspondence in long-term sequences. Unlike existing visual-language schemes, our approach learns text embeddings from visual features through a dedicated mapping network, enabling seamless adaptation to various tracking tasks without explicit text annotations. Additionally, we introduce a consistency…
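The core idea in the abstract is that the text embedding is produced from visual features by a mapping network instead of coming from annotations, and is then used to make one object's frame-wise features more coherent. The sketch below illustrates that idea under stated assumptions; it is not the authors' architecture. `MappingNetwork` is a hypothetical two-layer MLP with random (untrained) weights. The projection matrix `P` and the blending rule `v' = (1 - alpha) * v + alpha * (P @ t)` are assumptions chosen for illustration. The truncated "consistency…" component of the abstract is not modeled.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class MappingNetwork:
    """Hypothetical two-layer MLP: visual feature -> text-like embedding.

    No text annotation is used; the embedding is generated from the visual
    feature itself. Weights are random here, not trained.
    """

    def __init__(self, vis_dim, hid_dim, txt_dim, seed=0):
        rng = random.Random(seed)
        self.W1 = rand_matrix(hid_dim, vis_dim, rng)
        self.W2 = rand_matrix(txt_dim, hid_dim, rng)

    def __call__(self, v):
        return matvec(self.W2, relu(matvec(self.W1, v)))


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def enhance(frame_feats, text_emb, P, alpha=0.5):
    """Blend each frame's feature with one shared projected text embedding.

    Assumed fusion rule: v' = (1 - alpha) * v + alpha * (P @ t).
    Because the second term is identical for every frame, the distance
    between any two frames' features shrinks by a factor of (1 - alpha).
    """
    anchor = matvec(P, text_emb)
    return [[(1 - alpha) * f + alpha * a for f, a in zip(feat, anchor)]
            for feat in frame_feats]


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# One tracked point with 4-D visual features in three frames (toy values).
feats = [[1.0, 0.2, 0.0, 0.5], [0.6, 0.9, 0.1, 0.4], [0.2, 0.3, 0.8, 0.7]]
mapper = MappingNetwork(vis_dim=4, hid_dim=8, txt_dim=3)
text_emb = mapper(mean_pool(feats))       # generated from visual features, not annotated
P = rand_matrix(4, 3, random.Random(1))   # hypothetical text-to-visual projection
enhanced = enhance(feats, text_emb, P, alpha=0.5)
print(dist(feats[0], feats[2]), dist(enhanced[0], enhanced[2]))
```

The blend is deliberately simple. A shared, object-level term pulls every frame's feature toward the same anchor, which is one way to read "enhancing the coherence of frame-wise visual features related to the same object". A trained model would learn `P` and the mapping weights, and could use a learned fusion instead of a fixed `alpha`.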
