BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View
Yuxiang Yang, Yingqi Deng, Mian Pan, Zheng-Jun Zha, Jing Zhang

TL;DR
BEVTrack is a simple, motion-based method for 3D single object tracking in Bird's-Eye View (BEV) that learns likelihood functions adapted to diverse targets, achieving state-of-the-art accuracy at real-time speed.
Contribution
The paper presents BEVTrack, a novel and straightforward approach that estimates object motion directly in BEV and learns adaptive likelihood functions, improving robustness and accuracy over existing methods with more complex designs.
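BEVTrack is described as estimating object motion directly in BEV. The sketch below illustrates only the generic motion-based update step: given the previous frame's box and a predicted relative motion, it propagates the box to the current frame. The `BEVBox` type, the `apply_motion` and `wrap_angle` helpers, and the convention that the offsets are expressed in the previous box's local frame are all hypothetical choices for illustration, not details taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class BEVBox:
    """Hypothetical box state: center (x, y, z) in meters, yaw in radians."""
    x: float
    y: float
    z: float
    yaw: float


def wrap_angle(a: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def apply_motion(prev: BEVBox, dx: float, dy: float, dz: float, dyaw: float) -> BEVBox:
    """Propagate `prev` by a relative motion (dx, dy, dz, dyaw).

    Assumption: (dx, dy) are given in the previous box's local BEV frame,
    so they are rotated by the previous yaw before being added.
    """
    c, s = math.cos(prev.yaw), math.sin(prev.yaw)
    return BEVBox(
        x=prev.x + c * dx - s * dy,
        y=prev.y + s * dx + c * dy,
        z=prev.z + dz,
        yaw=wrap_angle(prev.yaw + dyaw),
    )


if __name__ == "__main__":
    box = BEVBox(x=0.0, y=0.0, z=0.0, yaw=0.0)
    # Hypothetical per-frame motions that a tracking model might regress.
    motions = [(1.0, 0.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0, 0.0)]
    for m in motions:
        box = apply_motion(box, *m)
        print(box)
```

In this sketch, tracking reduces to composing each predicted relative motion with the previous box; the motion parameterization and coordinate frame BEVTrack actually uses may differ.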
Findings
Achieves state-of-the-art results on KITTI, NuScenes, and Waymo datasets.
Operates at 200 FPS, enabling real-time tracking.
Effectively handles diverse target attributes with adaptive likelihood learning.
Abstract
3D Single Object Tracking (SOT) is a fundamental task in computer vision and plays a critical role in applications like autonomous driving. However, existing algorithms often involve complex designs and multiple loss functions, making model training and deployment challenging. Furthermore, their reliance on fixed probability distribution assumptions (e.g., Laplacian or Gaussian) hinders their ability to adapt to diverse target characteristics such as varying sizes and motion patterns, ultimately affecting tracking precision and robustness. To address these issues, we propose BEVTrack, a simple yet effective motion-based tracking method. BEVTrack directly estimates object motion in Bird's-Eye View (BEV) using a single regression loss. To enhance accuracy for targets with diverse attributes, it learns adaptive likelihood functions tailored to individual targets, avoiding the limitations…
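The abstract links fixed probability assumptions to the regression loss. The sketch below shows that link using standard definitions: with a fixed scale, the negative log-likelihood (NLL) of a zero-mean Laplacian reduces to an L1 loss, and that of a Gaussian to half a squared L2 loss, up to constants. It then lets the Laplacian scale vary per target as a simple stand-in for adaptivity. This is not BEVTrack's likelihood learning, which the abstract says avoids fixed distribution families altogether; all function names and both residual sets are hypothetical.

```python
import math
from typing import Sequence


def laplace_nll(residual: float, scale: float) -> float:
    """-log p(r) for a zero-mean Laplace density with scale b: |r|/b + log(2b)."""
    return abs(residual) / scale + math.log(2.0 * scale)


def gaussian_nll(residual: float, sigma: float) -> float:
    """-log p(r) for a zero-mean Gaussian density with standard deviation sigma."""
    return residual ** 2 / (2.0 * sigma ** 2) + math.log(sigma) + 0.5 * math.log(2.0 * math.pi)


def laplace_scale_mle(residuals: Sequence[float]) -> float:
    """Maximum-likelihood Laplace scale for zero-mean residuals: the mean |r|."""
    return sum(abs(r) for r in residuals) / len(residuals)


def total_laplace_nll(residuals: Sequence[float], scale: float) -> float:
    """Summed Laplace NLL of a set of residuals under one shared scale."""
    return sum(laplace_nll(r, scale) for r in residuals)


if __name__ == "__main__":
    r = 0.3
    # With a fixed scale, only the residual-dependent term varies:
    print("Laplace, b=1:    ", laplace_nll(r, 1.0) - laplace_nll(0.0, 1.0))    # the L1 term |r|
    print("Gaussian, sigma=1:", gaussian_nll(r, 1.0) - gaussian_nll(0.0, 1.0))  # the term r^2 / 2

    # Hypothetical motion residuals (meters) for a small, slow target and a large, fast one.
    small_target = [0.05, -0.10, 0.05]
    large_target = [0.50, -1.00, 0.50]

    shared = total_laplace_nll(small_target, 1.0) + total_laplace_nll(large_target, 1.0)
    per_target = (total_laplace_nll(small_target, laplace_scale_mle(small_target))
                  + total_laplace_nll(large_target, laplace_scale_mle(large_target)))
    print(f"shared b=1: {shared:.3f}   per-target b: {per_target:.3f}")
```

Because each per-target scale is the maximum-likelihood estimate for its own residuals, the per-target total NLL can never exceed that of any shared scale, which conveys the intuition behind adapting the likelihood to each target. According to the abstract, BEVTrack goes further by not committing to a fixed family such as the Laplacian at all.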
