LiDAR MOT-DETR: A LiDAR-based Two-Stage Transformer for 3D Multiple Object Tracking

Martha Teiko Teye; Ori Maoz; Matthias Rottmann

arXiv:2505.12753·cs.CV·September 25, 2025

TL;DR

LiDAR MOT-DETR introduces a two-stage transformer approach for 3D multi-object tracking that refines detections and maintains object identities over time, outperforming existing models on nuScenes and KITTI datasets.

Contribution

The paper proposes a novel two-stage transformer framework for LiDAR-based 3D tracking, combining detection refinement and temporal association using attention mechanisms.
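As a rough illustration of what attention-based association can look like, the sketch below scores each track against each detection with scaled dot-product attention and picks the highest-weighted detection per track. It is not the paper's implementation: the names `attention_associate`, `track_queries`, and `detection_keys`, the two-dimensional toy embeddings, and the greedy argmax are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_associate(track_queries, detection_keys):
    """For each track query, return (best detection index, attention weights).

    Hypothetical helper: scores are scaled dot products between a track
    embedding and every detection embedding, normalized with softmax.
    """
    dim = len(track_queries[0])
    results = []
    for q in track_queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in detection_keys
        ]
        weights = softmax(scores)
        best = max(range(len(weights)), key=weights.__getitem__)
        results.append((best, weights))
    return results


# Toy embeddings (assumed values, not taken from the paper).
tracks = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
matches = attention_associate(tracks, detections)
```

A greedy argmax per track can assign two tracks to the same detection; a real tracker would need some way to resolve such conflicts, which this sketch leaves out.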

Findings

01. Online mode achieves higher accuracy than baseline and SOTA models.

02. Offline mode improves tracking precision by 3 percentage points.

03. Model demonstrates strong performance on nuScenes and KITTI datasets.

Abstract

Multi-object tracking from LiDAR point clouds presents unique challenges due to the sparse and irregular nature of the data, compounded by the need for temporal coherence across frames. Traditional tracking systems often rely on hand-crafted features and motion models, which can struggle to maintain consistent object identities in crowded or fast-moving scenes. We present a LiDAR-based, two-stage, DETR-inspired transformer consisting of a smoother and a tracker. The smoother stage refines LiDAR object detections from any off-the-shelf detector across a moving temporal window. The tracker stage uses a DETR-based attention block to maintain tracks over time by associating tracked objects with the refined detections, using the point cloud as context. The model is trained on the nuScenes and KITTI datasets in both online and offline (forward-peeking) modes, demonstrating strong performance across metrics…
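One way to picture the difference between the online and offline (forward-peeking) modes is through which frames fall inside the moving temporal window. The sketch below assumes that online mode sees only the current and past frames, while offline mode may also look a few frames ahead. The function `window_indices` and its `past` and `future` parameters are hypothetical, and the window sizes are arbitrary, since the abstract does not state them.

```python
def window_indices(t, num_frames, past=2, future=0):
    """Return the frame indices in the temporal window around frame t.

    Hypothetical helper: `past` frames before t and `future` frames after t
    are included, clipped to the valid range [0, num_frames - 1].
    """
    start = max(0, t - past)
    end = min(num_frames - 1, t + future)
    return list(range(start, end + 1))


# Online mode (assumed): no access to future frames.
online = window_indices(5, 10, past=2, future=0)

# Offline mode (assumed): the window also peeks forward.
offline = window_indices(5, 10, past=2, future=2)

# At the start of a sequence the window is clipped.
clipped = window_indices(0, 10, past=2, future=2)
```

Under these assumptions, setting `future=0` gives a window that could run in real time, while any positive `future` delays the output by that many frames.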

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Dropout · Adam · Multi-Head Attention · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer