Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers
Bishoy Galoaa, Xiangyu Bai, Shayda Moezzi, Utsav Nandi, Sai Siddhartha Vivek Dhir Rangoju, Somaieh Amraee, Sarah Ostadabbas

TL;DR
LAPA is a transformer-based architecture for multi-camera point tracking that reasons jointly across views and time, improving accuracy and robustness in complex scenes with occlusions.
Contribution
The paper presents a novel end-to-end transformer model that integrates appearance and geometric cues for multi-camera point tracking, avoiding classical triangulation.
Findings
Achieves 37.5% APD on the TAPVid-3D-MC dataset.
Achieves 90.3% APD on the PointOdyssey-MC dataset.
Outperforms existing methods in challenging multi-camera tracking scenarios.
Abstract
This paper presents LAPA (Look Around and Pay Attention), a novel end-to-end transformer-based architecture for multi-camera point tracking that integrates appearance-based matching with geometric constraints. Traditional pipelines decouple detection, association, and tracking, leading to error propagation and temporal inconsistency in challenging scenarios. LAPA addresses these limitations by leveraging attention mechanisms to jointly reason across views and time, establishing soft correspondences through a cross-view attention mechanism enhanced with geometric priors. Instead of relying on classical triangulation, we construct 3D point representations via attention-weighted aggregation, inherently accommodating uncertainty and partial observations. Temporal consistency is further maintained through a transformer decoder that models long-range dependencies, preserving identities…
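The attention-weighted aggregation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `aggregate_views`, the additive `geom_bias` term standing in for a geometric prior, the `visible` mask, and all tensor shapes are assumptions made for illustration only. The sketch shows how softmax weights over per-view features can blend per-view 3D candidates into one estimate while giving zero weight to views where the point is not observed.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array (tolerates -inf entries)."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / np.sum(exp)


def aggregate_views(query, view_feats, view_points, geom_bias, visible):
    """Blend per-view 3D candidates with attention weights (illustrative only).

    query:       (D,)   feature of the tracked point (hypothetical)
    view_feats:  (V, D) per-view features for that point (hypothetical)
    view_points: (V, 3) per-view 3D candidates (hypothetical)
    geom_bias:   (V,)   additive score standing in for a geometric prior
    visible:     (V,)   boolean mask; at least one view must be visible
    """
    if not np.any(visible):
        raise ValueError("at least one view must be visible")
    dim = query.shape[0]
    # Scaled dot-product appearance score plus the assumed geometric bias.
    logits = view_feats @ query / np.sqrt(dim) + geom_bias
    # Unobserved views get -inf so their softmax weight is exactly zero.
    logits = np.where(visible, logits, -np.inf)
    weights = softmax(logits)
    # Soft correspondence: a weighted average instead of a hard triangulation.
    point = weights @ view_points
    return point, weights


if __name__ == "__main__":
    q = np.array([1.0, 0.0])
    feats = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
    pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
    mask = np.array([True, True, False])
    point, weights = aggregate_views(q, feats, pts, np.zeros(3), mask)
    print(point, weights)
```

Because the weights are a distribution rather than a hard assignment, an occluded or unreliable view lowers its own contribution without breaking the estimate, which is the sense in which such an aggregation can accommodate partial observations.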
Taxonomy
Topics: Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis · Face recognition and analysis
