LAMP: Localization Aware Multi-camera People Tracking in Metric 3D World
Nan Yang, Julian Straub, Fan Zhang, Richard Newcombe, Jakob Engel, Lingni Ma

TL;DR
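LAMP is a multi-camera 3D human tracking framework for egocentric headsets. It uses the known device motion and calibration in a two-step process: detected 2D body keypoints from all cameras are first converted into a unified 3D world reference frame, and a spatio-temporal transformer then fits 3D human motion to the resulting 3D ray cloud.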
Contribution
LAMP introduces a simple framework, built around an end-to-end-trained transformer, that disentangles observer motion from target motion early in the pipeline. This enables 3D human tracking from multi-view, calibrated and localized input in egocentric scenarios.
Findings
Achieves state-of-the-art results on monocular benchmarks.
Significantly outperforms baselines in egocentric multi-camera settings.
Effectively leverages multi-view, asynchronous camera data.
Abstract
Tracking 3D human motion from an egocentric multi-camera headset is challenged by severe egomotion, partial visibility or occlusion, and a lack of training data. Existing methods designed for monocular video often require static or slowly moving cameras and cannot efficiently leverage multi-view, calibrated and localized input. This makes them brittle and prone to failure on dynamic egocentric captures. We propose LAMP (Localization Aware Multi-camera People Tracking): a novel, simple framework that addresses this via early disentanglement of observer and target motion. LAMP introduces a two-step process. First, we leverage the known device 6 DoF motion and calibration to convert detected 2D body keypoints from all cameras over a temporal window into a unified 3D world reference frame. Second, an end-to-end-trained spatio-temporal transformer fits 3D human motion directly to this 3D ray cloud. This…
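The first step described in the abstract, turning 2D keypoints into rays in a shared world frame, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an undistorted pinhole camera model, 4x4 homogeneous transforms, and hypothetical names (`keypoints_to_world_rays`, `T_device_camera`, `T_world_device`) chosen only for this example.

```python
import numpy as np

def keypoints_to_world_rays(keypoints_px, K, T_device_camera, T_world_device):
    """Back-project 2D keypoints into rays expressed in the world frame.

    Assumptions (illustrative, not from the paper):
      keypoints_px    : (N, 2) pixel coordinates of detected body keypoints
      K               : (3, 3) pinhole intrinsics, no lens distortion
      T_device_camera : (4, 4) maps camera coordinates to device coordinates
      T_world_device  : (4, 4) device 6 DoF pose at the frame's timestamp
    Returns ray origins (N, 3) and unit ray directions (N, 3).
    """
    keypoints_px = np.asarray(keypoints_px, dtype=float)
    n = keypoints_px.shape[0]

    # Homogeneous pixels -> ray directions in the camera frame.
    pixels_h = np.hstack([keypoints_px, np.ones((n, 1))])
    dirs_cam = pixels_h @ np.linalg.inv(K).T

    # Chain calibration and device pose to get the camera-to-world transform.
    T_world_camera = T_world_device @ T_device_camera
    R = T_world_camera[:3, :3]
    t = T_world_camera[:3, 3]

    # Rotate directions into the world frame and normalize them.
    dirs_world = dirs_cam @ R.T
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)

    # Every ray from this frame starts at the camera center in the world.
    origins = np.tile(t, (n, 1))
    return origins, dirs_world
```

Under these assumptions, each camera frame would be processed with the device pose at its own timestamp, and the rays from all cameras in the temporal window would be concatenated to form the ray cloud that the second step consumes. Because the observer's motion is already absorbed into the ray origins and directions, the downstream model only needs to explain the motion of the person being tracked.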
