BOTT: Box Only Transformer Tracker for 3D Object Tracking

Lubing Zhou; Xiaoli Meng; Yiluan Guo; Jiong Yang

arXiv:2308.08753·cs.CV·August 21, 2023

TL;DR

This paper introduces BOTT, a transformer-based method for 3D object tracking that learns to link objects across frames using global box embeddings, reducing engineering complexity and achieving competitive results.

Contribution

BOTT is the first transformer-based approach to directly learn 3D object linking from box data, eliminating the need for handcrafted motion models.

Findings

01

Achieves 69.9 and 66.7 AMOTA on the nuScenes validation and test splits, respectively.

02

Attains 56.45 and 59.57 MOTA L2 on the Waymo Open Dataset validation and test splits, respectively.

03

Seamlessly supports online and offline tracking modes.

Abstract

Tracking 3D objects is an important task in autonomous driving. Classical Kalman filtering based methods are still the most popular solutions. However, these methods require handcrafted motion models and cannot benefit from growing amounts of data. In this paper, Box Only Transformer Tracker (BOTT) is proposed to learn to link 3D boxes of the same object across different frames, taking all the 3D boxes in a time window as input. Specifically, transformer self-attention is applied to exchange information between all the boxes and learn globally informative box embeddings. The similarity between these learned embeddings can then be used to link the boxes of the same object. BOTT supports both online and offline tracking modes seamlessly. Its simplicity significantly reduces the engineering effort required by traditional Kalman filtering based methods.…
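The idea in the abstract (self-attention over all boxes in a time window, then embedding similarity for linking) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 8-value box layout `[x, y, z, l, w, h, yaw, t]`, the single attention head, the random untrained weights, and the helper names `init_params`, `embed_boxes`, and `link_scores` are all assumptions made for illustration. It only shows how data could flow from boxes to pairwise link scores.

```python
import numpy as np


def init_params(box_dim, d_model, seed=0):
    """Random, untrained weights (hypothetical; a real tracker would learn these)."""
    rng = np.random.default_rng(seed)
    shapes = {
        "w_in": (box_dim, d_model),
        "w_q": (d_model, d_model),
        "w_k": (d_model, d_model),
        "w_v": (d_model, d_model),
    }
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all boxes in the window."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return x + weights @ v  # residual connection


def embed_boxes(boxes, params):
    """Project raw box vectors, mix them with attention, and L2-normalise."""
    x = boxes @ params["w_in"]
    x = self_attention(x, params["w_q"], params["w_k"], params["w_v"])
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)


def link_scores(emb, frame_ids):
    """Cosine similarity between embeddings; pairs in the same frame are masked out."""
    sim = emb @ emb.T
    same_frame = frame_ids[:, None] == frame_ids[None, :]
    sim[same_frame] = -np.inf  # a box cannot link to a box in its own frame
    return sim


# Two frames with two boxes each; columns are [x, y, z, l, w, h, yaw, t] (assumed layout).
boxes = np.array([
    [0.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.00, 0.0],
    [10.0, 5.0, 0.0, 4.0, 2.0, 1.5, 1.57, 0.0],
    [0.5, 0.0, 0.0, 4.0, 2.0, 1.5, 0.00, 1.0],
    [10.0, 5.5, 0.0, 4.0, 2.0, 1.5, 1.57, 1.0],
])
frame_ids = np.array([0, 0, 1, 1])

params = init_params(box_dim=8, d_model=16)
emb = embed_boxes(boxes, params)
sim = link_scores(emb, frame_ids)
print(sim.shape)
```

Because the weights here are random, the scores in `sim` carry no tracking meaning. In a trained model, high similarity between boxes from different frames would indicate the same object, and a matching step over `sim` would produce the tracks.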

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Absolute Position Encodings · Residual Connection