Look Around and Pay Attention: Multi-camera Point Tracking Reimagined with Transformers

Bishoy Galoaa; Xiangyu Bai; Shayda Moezzi; Utsav Nandi; Sai Siddhartha Vivek Dhir Rangoju; Somaieh Amraee; Sarah Ostadabbas

arXiv:2512.04213 · cs.CV · December 5, 2025

TL;DR

LAPA introduces a transformer-based architecture for multi-camera point tracking that jointly reasons across views and time, improving accuracy and robustness in complex scenarios with occlusions.

Contribution

The paper presents a novel end-to-end transformer model that integrates appearance and geometric cues for multi-camera point tracking, avoiding classical triangulation.

Findings

01. Achieves 37.5% APD on the TAPVid-3D-MC dataset.

02. Achieves 90.3% APD on the PointOdyssey-MC dataset.

03. Outperforms existing methods in challenging multi-camera tracking scenarios.

Abstract

This paper presents LAPA (Look Around and Pay Attention), a novel end-to-end transformer-based architecture for multi-camera point tracking that integrates appearance-based matching with geometric constraints. Traditional pipelines decouple detection, association, and tracking, leading to error propagation and temporal inconsistency in challenging scenarios. LAPA addresses these limitations by leveraging attention mechanisms to jointly reason across views and time, establishing soft correspondences through a cross-view attention mechanism enhanced with geometric priors. Instead of relying on classical triangulation, we construct 3D point representations via attention-weighted aggregation, inherently accommodating uncertainty and partial observations. Temporal consistency is further maintained through a transformer decoder that models long-range dependencies, preserving identities…
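
The attention-weighted aggregation mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `aggregate_views`, the additive `geo_bias` term standing in for a geometric prior, the per-view 3D candidates, and the visibility mask are all assumptions made for illustration. It only shows how a softmax over views yields soft correspondences and a weighted 3D estimate that tolerates missing observations.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array; -inf entries get weight 0.
    x = x - np.max(x)
    e = np.exp(x)
    return e / np.sum(e)

def aggregate_views(query, keys, values, geo_bias, visible):
    """Hypothetical soft cross-view aggregation (illustrative only).

    query:    (d,)   feature of the tracked point
    keys:     (V, d) appearance features, one per camera view
    values:   (V, 3) per-view 3D candidates (an assumption of this sketch)
    geo_bias: (V,)   additive log-prior from geometry (assumed form)
    visible:  (V,)   boolean mask; False marks occluded or missing views
    """
    if not np.any(visible):
        raise ValueError("at least one view must be visible")
    d = query.shape[-1]
    # Scaled dot-product similarity plus the geometric prior.
    logits = keys @ query / np.sqrt(d) + geo_bias
    # Occluded views are excluded from the softmax.
    logits = np.where(visible, logits, -np.inf)
    weights = softmax(logits)
    # Weighted average of candidates instead of hard triangulation.
    return weights @ values, weights

# Toy example: three views, the third one occluded.
query = np.ones(4)
keys = np.zeros((3, 4))
values = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [100.0, 100.0, 100.0]])
geo_bias = np.zeros(3)
visible = np.array([True, True, False])
point, weights = aggregate_views(query, keys, values, geo_bias, visible)
```

In this toy case the two visible views receive equal weight and the occluded view contributes nothing, so the estimate is the midpoint of the first two candidates. A larger `geo_bias` for a view would shift the weights toward it.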

Taxonomy

Topics: Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis · Face recognition and analysis