TAPTRv2: Attention-based Position Update Improves Tracking Any Point
Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Feng Li, Tianhe Ren, Bohan Li, Lei Zhang

TL;DR
TAPTRv2 introduces an attention-based position update mechanism that enhances transformer-based tracking of any point, eliminating reliance on cost-volume and achieving state-of-the-art results.
Contribution
It proposes a novel attention-based position update (APU) operation, implemented with key-aware deformable attention, that improves tracking accuracy and efficiency over the original TAPTR.
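To make "key-aware deformable attention" concrete, here is a minimal NumPy sketch under simplifying assumptions: one point query, one attention head, one feature level, no learned projections, and sampling offsets passed in directly (a real model predicts them from the query). Features are bilinearly sampled at the deformable positions and serve as both keys and values. "Key-aware" is read here as computing the attention weights from query-key dot products over the sampled keys, in contrast to Deformable DETR-style attention, where the weights are predicted from the query alone by a linear layer. The function names are hypothetical and not taken from the TAPTRv2 code.

```python
import numpy as np


def bilinear_sample(feat, xy):
    """Bilinearly sample an (H, W, C) feature map at continuous (x, y) positions.

    xy is a (K, 2) array; positions are clamped to the map bounds.
    Returns a (K, C) array of interpolated features.
    """
    H, W, _ = feat.shape
    x = np.clip(xy[:, 0], 0, W - 1)
    y = np.clip(xy[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom


def key_aware_deformable_attention(query, position, offsets, feat):
    """Single-head, single-scale sketch (hypothetical simplification).

    query:    (C,) content feature of one point query
    position: (2,) current (x, y) estimate of the tracked point
    offsets:  (K, 2) sampling offsets (a real model predicts these from the query)
    feat:     (H, W, C) feature map; sampled features act as both keys and values
    """
    sample_xy = position[None, :] + offsets          # (K, 2) deformable sampling positions
    keys = bilinear_sample(feat, sample_xy)          # (K, C) features at those positions
    values = keys                                    # assumption: no separate key/value projections
    logits = keys @ query / np.sqrt(query.shape[0])  # key-aware: scores depend on the sampled keys
    logits = logits - logits.max()                   # shift for numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()  # (K,) softmax attention weights
    content = weights @ values                       # (C,) aggregated content feature
    return content, weights, sample_xy


# Usage with random data.
rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 32, 16))
query = rng.normal(size=16)
position = np.array([10.0, 12.0])
offsets = rng.normal(scale=2.0, size=(4, 2))
content, weights, sample_xy = key_aware_deformable_attention(query, position, offsets, feat)
```

The function returns the weights and sampling positions alongside the content feature because an attention-based position update needs exactly those two quantities.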
Findings
Surpasses TAPTR in performance on multiple datasets
Removes the need for cost-volume computation
Achieves state-of-the-art tracking accuracy
Abstract
In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection TRansformer (DETR) and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its reliance on cost-volume, which contaminates the point query's content feature and negatively impacts both visibility prediction and cost-volume computation. In TAPTRv2, we propose a novel attention-based position update (APU) operation and use key-aware deformable attention to realize it. For each query, this operation uses key-aware attention weights to combine its corresponding deformable sampling positions to predict a new query position. This design is based on the observation that local attention is…
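The position update described in the abstract (attention weights combining a query's deformable sampling positions into a new position) can be sketched as a weighted average. This is a minimal NumPy illustration, not the authors' implementation: the function name is hypothetical, the weights are supplied directly rather than produced by key-aware deformable attention, and details the excerpt does not state (for example, whether gradients are detached or whether content and position use separate weights) are left out.

```python
import numpy as np


def attention_based_position_update(sample_xy, weights):
    """Predict a new query position from its deformable sampling positions.

    sample_xy: (K, 2) absolute (x, y) sampling positions of one point query
    weights:   (K,) attention weights, assumed non-negative and summing to 1
    Returns the (2,) attention-weighted average of the sampling positions.
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ sample_xy


# Four sampling positions around a current estimate of (12, 14).
position = np.array([12.0, 14.0])
offsets = np.array([[-2.0, -2.0], [2.0, -2.0], [-2.0, 2.0], [2.0, 2.0]])
sample_xy = position + offsets

uniform = attention_based_position_update(sample_xy, [0.25, 0.25, 0.25, 0.25])  # stays at (12, 14)
peaked = attention_based_position_update(sample_xy, [0.7, 0.1, 0.1, 0.1])      # approximately (10.8, 12.8)
```

Because the weights sum to 1, the result equals `position + weights @ offsets`, so the operation acts as a residual update that moves the query toward the sampling positions with the highest attention, and the new position always lies inside the convex hull of the sampling positions.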
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Robotics and Sensor-Based Localization · Gaze Tracking and Assistive Technology
Methods: Softmax · Attention Is All You Need
