InterTrack: Interaction Transformer for 3D Multi-Object Tracking
John Willes, Cody Reading, Steven L. Waslander

TL;DR
InterTrack introduces an Interaction Transformer that improves 3D multi-object tracking by using global attention to build more discriminative features for data association. It achieves the highest overall AMOTA among CenterPoint detection-based methods on the nuScenes benchmark.
Contribution
The paper presents a novel Interaction Transformer architecture that improves 3D MOT by incorporating global contextual information through attention mechanisms.
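The contribution above centers on attention that lets every track and detection see the whole scene. The sketch below illustrates that idea with single-head scaled dot-product self-attention over the concatenated track and detection features, followed by a residual connection. It rests on several assumptions. The identity query/key/value projections, the 2-D toy features, and the function names `softmax` and `global_attention` are all hypothetical simplifications. The paper's Interaction Transformer uses learned multi-head layers whose exact configuration is not given here.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(features: List[Vector]) -> List[Vector]:
    """Single-head self-attention with identity Q/K/V projections
    (a hypothetical simplification) plus a residual connection.

    Every row attends to every other row, so each track or detection
    feature absorbs context from the whole scene.
    """
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in features:
        # Scaled dot-product score between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in features]
        weights = softmax(scores)
        # Weighted sum of the value vectors (values = inputs here).
        context = [sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)]
        # Residual connection preserves the original per-object feature.
        outputs.append([qi + ci for qi, ci in zip(q, context)])
    return outputs

tracks = [[1.0, 0.0], [0.0, 1.0]]       # hypothetical track features
detections = [[0.9, 0.1], [0.1, 0.8]]   # hypothetical detection features
mixed = global_attention(tracks + detections)
track_feats, det_feats = mixed[:len(tracks)], mixed[len(tracks):]
```

Splitting the output back into track rows and detection rows gives context-aware features for each side. Those features can then be scored pairwise for association.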
Findings
Significant performance improvements on the nuScenes 3D MOT benchmark.
Highest overall AMOTA among CenterPoint detection-based methods.
Particularly effective for small and clustered objects.
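AMOTA, the headline metric above, averages a recall-normalized MOTA (MOTAR) over a set of recall thresholds. The sketch below follows the nuScenes definition as we understand it:

MOTAR_r = max(0, 1 - (IDS_r + FP_r + FN_r - (1 - r)P) / (rP))

Here P is the number of ground-truth objects. The sketch takes precomputed per-threshold error counts as input. It omits devkit details such as how thresholds are chosen and minimum-recall filtering, so treat it as an illustration rather than a replacement for the official evaluation. The input counts are made up.

```python
from typing import Dict, Tuple

def motar(recall: float, ids: int, fp: int, fn: int, num_gt: int) -> float:
    # Recall-normalized MOTA at one recall threshold r. The (1 - r) * P term
    # discounts the false negatives that are unavoidable at recall r.
    value = 1.0 - (ids + fp + fn - (1.0 - recall) * num_gt) / (recall * num_gt)
    return max(0.0, value)

def amota(counts: Dict[float, Tuple[int, int, int]], num_gt: int) -> float:
    # Average MOTAR over the supplied recall thresholds.
    # counts maps recall r -> (ID switches, false positives, false negatives).
    scores = [motar(r, ids, fp, fn, num_gt) for r, (ids, fp, fn) in counts.items()]
    return sum(scores) / len(scores)

# Made-up counts for 100 ground-truth objects at two recall thresholds.
example = amota({0.5: (1, 4, 50), 1.0: (2, 3, 0)}, num_gt=100)
```

At r = 0.5 the 50 missed objects are exactly the expected (1 - r)P. Only the 5 ID switches and false positives are penalized, giving 0.9. At r = 1.0 the score is 0.95, so the example averages to about 0.925.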
Abstract
3D multi-object tracking (MOT) is a key problem for autonomous vehicles, required to perform well-informed motion planning in dynamic environments. Particularly for densely occupied scenes, associating existing tracks to new detections remains challenging as existing systems tend to omit critical contextual information. Our proposed solution, InterTrack, introduces the Interaction Transformer for 3D MOT to generate discriminative object representations for data association. We extract state and shape features for each track and detection, and efficiently aggregate global information via attention. We then perform a learned regression on each track/detection feature pair to estimate affinities, and use a robust two-stage data association and track management approach to produce the final tracks. We validate our approach on the nuScenes 3D MOT benchmark, where we observe significant…
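The abstract ends with a pairwise affinity estimate for every track/detection pair, followed by a two-stage association. The sketch below shows that flow under stated assumptions. The learned regressor is replaced by a hypothetical fixed scorer, `pair_affinity`, which applies a logistic to the feature dot product. Stage 1 greedily accepts pairs whose affinity clears a threshold. Stage 2 is a hypothetical geometric fallback that greedily matches leftovers by center distance within a gating radius. The abstract does not specify how InterTrack defines its two stages, so the greedy matching, thresholds, distance fallback, and all names here are illustrative assumptions rather than the authors' method.

```python
import math
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]

def pair_affinity(track_feat: Sequence[float], det_feat: Sequence[float]) -> float:
    # Hypothetical stand-in for the learned pairwise regressor: a logistic of the
    # feature dot product, so more similar features score closer to 1.
    sim = sum(a * b for a, b in zip(track_feat, det_feat))
    return 1.0 / (1.0 + math.exp(-sim))

def greedy_match(scores: Matrix, threshold: float,
                 free_tracks: Set[int], free_dets: Set[int]) -> List[Tuple[int, int]]:
    # Visit unmatched pairs from best to worst score. Accept a pair if its score
    # clears the threshold and neither side is taken yet. Matched indices are
    # removed from the free sets in place.
    candidates = sorted(((scores[t][d], t, d) for t in free_tracks for d in free_dets),
                        reverse=True)
    matches = []
    for score, t, d in candidates:
        if score < threshold:
            break
        if t in free_tracks and d in free_dets:
            matches.append((t, d))
            free_tracks.discard(t)
            free_dets.discard(d)
    return matches

def two_stage_associate(track_feats: Matrix, det_feats: Matrix,
                        track_centers: Matrix, det_centers: Matrix,
                        min_affinity: float = 0.7, max_dist: float = 2.0):
    free_tracks = set(range(len(track_feats)))
    free_dets = set(range(len(det_feats)))
    # Stage 1: match on pairwise affinities from the placeholder scorer.
    aff = [[pair_affinity(tf, df) for df in det_feats] for tf in track_feats]
    matches = greedy_match(aff, min_affinity, free_tracks, free_dets)
    # Stage 2 (hypothetical fallback): match leftovers by center distance,
    # using negative distance as the score so closer pairs rank higher.
    neg_dist = [[-math.dist(tc, dc) for dc in det_centers] for tc in track_centers]
    matches += greedy_match(neg_dist, -max_dist, free_tracks, free_dets)
    # Unmatched tracks could be aged out; unmatched detections could start tracks.
    return matches, sorted(free_tracks), sorted(free_dets)

result = two_stage_associate(
    track_feats=[[2.0, 0.0], [0.0, 0.1]],
    det_feats=[[2.0, 0.0], [0.0, 0.1], [0.0, 0.0]],
    track_centers=[[0.0, 0.0], [10.0, 0.0]],
    det_centers=[[0.2, 0.0], [10.5, 0.0], [50.0, 0.0]],
)
```

In this toy scene, track 0 is matched in stage 1 on its strong feature similarity. Track 1's affinity is too weak for stage 1, so it is recovered in stage 2 because its detection lies 0.5 m away. The distant third detection stays unmatched and would be a candidate for a new track. Because both stages share one greedy pass order, the second stage only adds value when it uses a different cue than the first, which is why the fallback here switches to geometry.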
Taxonomy
Topics: Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety · Fire Detection and Safety Systems
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Position-Wise Feed-Forward Layer · Residual Connection · Adam · Softmax · Dropout
