SpOT: Spatiotemporal Modeling for 3D Object Tracking
Colton Stearns, Davis Rempe, Jie Li, Rares Ambrus, Sergey Zakharov, Vitor Guizilini, Yanchao Yang, Leonidas J Guibas

TL;DR
This paper introduces SpOT, a spatiotemporal modeling framework for 3D object tracking that leverages long-term history and physical priors to improve accuracy, achieving state-of-the-art results on major benchmarks.
Contribution
It reformulates 3D tracking as a spatiotemporal problem using sequences of points and bounding boxes, enhancing location and motion estimates with learned refinement.
Findings
Achieves state-of-the-art performance on the Waymo benchmark.
Effectively encodes object permanence and temporal consistency.
Utilizes long-term historical data for improved tracking accuracy.
Abstract
3D multi-object tracking aims to uniquely and consistently identify all mobile entities through time. Despite the rich spatiotemporal information available in this setting, current 3D tracking methods primarily rely on abstracted information and limited history, e.g. single-frame object bounding boxes. In this work, we develop a holistic representation of traffic scenes that leverages both spatial and temporal information of the actors in the scene. Specifically, we reformulate tracking as a spatiotemporal problem by representing tracked objects as sequences of time-stamped points and bounding boxes over a long temporal history. At each timestamp, we improve the location and motion estimates of our tracked objects through learned refinement over the full sequence of object history. By considering time and space jointly, our representation naturally encodes fundamental physical priors…
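The object representation described in the abstract (a tracked object as a sequence of time-stamped points and bounding boxes over a long history) can be pictured with a small data structure. The following is only an illustrative sketch, not the authors' implementation: the names `Observation`, `Tracklet`, and `max_history`, the 7-value box layout, and the default history length are all assumptions made for this example, and the learned refinement step is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

Point = Tuple[float, float, float]  # x, y, z
# Assumed box layout: center x, y, z, length, width, height, yaw.
Box = Tuple[float, float, float, float, float, float, float]


@dataclass
class Observation:
    """One frame of evidence for an object (hypothetical container)."""
    timestamp: float
    box: Box
    points: List[Point]


@dataclass
class Tracklet:
    """A tracked object stored as a bounded history of observations."""
    track_id: int
    max_history: int = 40  # hypothetical history length, not from the paper
    history: Deque[Observation] = field(init=False)

    def __post_init__(self) -> None:
        # A deque with maxlen drops the oldest frame once the history is full.
        self.history = deque(maxlen=self.max_history)

    def add(self, obs: Observation) -> None:
        # Keep the sequence strictly ordered in time.
        if self.history and obs.timestamp <= self.history[-1].timestamp:
            raise ValueError("observations must arrive in increasing time order")
        self.history.append(obs)

    def stamped_points(self) -> List[Tuple[float, float, float, float]]:
        # Flatten the history into (x, y, z, t) tuples, oldest frame first.
        return [(x, y, z, o.timestamp) for o in self.history for (x, y, z) in o.points]

    def boxes(self) -> List[Tuple[float, Box]]:
        # The bounding-box sequence paired with its timestamps.
        return [(o.timestamp, o.box) for o in self.history]
```

A sequence model that refines location and motion estimates would consume the outputs of `stamped_points()` and `boxes()` together. The bounded deque is one simple way to keep a long but finite history per object.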
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Autonomous Vehicle Technology and Safety
