STT: Stateful Tracking with Transformers for Autonomous Driving

Longlong Jing; Ruichi Yu; Xu Chen; Zhengli Zhao; Shiwei Sheng; Colin Graber; Qi Chen; Qinru Li; Shangxuan Wu; Han Deng; Sangjin Lee; Chris Sweeney; Qiurui He; Wei-Chih Hung; Tong He; Xingyi Zhou; Farshid Moussavi; Zijian Guo; Yin Zhou; Mingxing Tan; Weilong Yang; Congcong Li

arXiv:2405.00236 · cs.RO · May 2, 2024

TL;DR

STT is a Transformer-based model for autonomous driving that jointly tracks objects and estimates their states. The paper also extends the standard tracking metrics so that they evaluate the combined performance of both tasks.

Contribution

The paper introduces STT, a novel Transformer-based framework that jointly performs object tracking and state estimation, with new metrics for comprehensive evaluation.

Findings

01. Achieves competitive real-time performance on the Waymo Open Dataset.

02. Joint optimization improves both tracking accuracy and state estimation.

03. Proposes new metrics, S-MOTA and MOTPS, for better assessment.

Abstract

Tracking objects in three-dimensional space is critical for autonomous driving. To ensure safety while driving, the tracker must be able to reliably track objects across frames and accurately estimate their states such as velocity and acceleration in the present. Existing works frequently focus on the association task while either neglecting the model performance on state estimation or deploying complex heuristics to predict the states. In this paper, we propose STT, a Stateful Tracking model built with Transformers, that can consistently track objects in the scenes while also predicting their states accurately. STT consumes rich appearance, geometry, and motion signals through long term history of detections and is jointly optimized for both data association and state estimation tasks. Since the standard tracking metrics like MOTA and MOTP do not capture the combined performance of the…
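For readers unfamiliar with the baseline metrics named in the abstract, the following is a minimal sketch of MOTA and MOTP under the commonly used CLEAR MOT definitions. It does not implement S-MOTA or MOTPS, whose definitions are not given in this excerpt. The function names and arguments are hypothetical, and MOTP is shown in its distance-error form (some benchmarks report an overlap-based variant instead).

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are totals summed over every frame of the sequence.
    This is the baseline metric only, not the paper's S-MOTA.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def motp(match_errors):
    """Standard CLEAR MOT precision in its distance form.

    match_errors holds one localization error per matched
    (track, ground-truth) pair across all frames; lower is better.
    """
    if not match_errors:
        raise ValueError("need at least one matched pair")
    return sum(match_errors) / len(match_errors)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(mota(false_negatives=10, false_positives=10, id_switches=5, num_gt=100))
    print(motp([0.5, 0.25, 0.75]))
```

Neither quantity depends on the predicted velocity or acceleration of a track, which is consistent with the abstract's point that these metrics alone do not reflect state estimation quality.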

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Human-Automation Interaction and Safety

Methods: Focus