Delving into Motion-Aware Matching for Monocular 3D Object Tracking
Kuan-Chih Huang, Ming-Hsuan Yang, Yi-Hsuan Tsai

TL;DR
This paper introduces MoMA-M3T, a motion-aware framework for monocular 3D multi-object tracking that leverages motion cues and a motion transformer to improve tracking accuracy without re-training existing detectors.
Contribution
The paper proposes a motion-aware framework built around a motion transformer and a motion-aware matching module. It improves monocular 3D multi-object tracking (MOT) by explicitly modeling object motion in feature space and across time.
Findings
MoMA-M3T achieves competitive performance on the nuScenes and KITTI datasets.
The framework is flexible and can be combined with existing detectors without re-training them.
The results demonstrate the importance of motion cues in monocular 3D tracking.
Abstract
Recent advances in monocular 3D object detection facilitate the 3D multi-object tracking task based on low-cost camera sensors. In this paper, we find that the motion cue of objects along different time frames is critical in 3D multi-object tracking, which is less explored in existing monocular-based approaches. To this end, we propose MoMA-M3T, a motion-aware framework for monocular 3D MOT that mainly consists of three motion-aware components. First, we represent the possible movement of an object related to all object tracklets in the feature space as its motion features. Then, we further model the historical object tracklet along the time frame in a spatial-temporal perspective via a motion transformer. Finally, we propose a motion-aware matching module to associate historical object tracklets and current observations as final tracking results.…
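The three steps in the abstract (motion features relative to tracklets, temporal modeling of each tracklet, and matching tracklets to current observations) can be illustrated with a deliberately simplified sketch. This is not the paper's method: the learned motion features and motion transformer are replaced here by raw 3D offsets and a hand-written constant-velocity extrapolation, and the matching step is a plain greedy nearest-neighbor assignment. All function names, the tuple-based data layout, and the `max_dist` threshold are assumptions made for illustration only.

```python
import math

def relative_motion(tracklets, detections):
    """Pairwise 3D offsets from each tracklet's last position to each detection.

    A hand-crafted stand-in for learned motion features (assumption).
    """
    return [
        [tuple(d[i] - track[-1][i] for i in range(3)) for d in detections]
        for track in tracklets
    ]

def predict_next(track):
    """Constant-velocity extrapolation from the last two positions.

    A stand-in for temporal modeling of the tracklet history (assumption).
    """
    if len(track) < 2:
        return track[-1]
    last, prev = track[-1], track[-2]
    return tuple(2 * l - p for l, p in zip(last, prev))

def match(tracklets, detections, max_dist=2.0):
    """Greedy association of tracklets to detections by predicted distance.

    Returns sorted (tracklet_index, detection_index) pairs. Pairs farther
    apart than max_dist (a hypothetical threshold, in the same units as the
    positions) are left unmatched.
    """
    candidates = []
    for ti, track in enumerate(tracklets):
        pred = predict_next(track)
        for di, det in enumerate(detections):
            candidates.append((math.dist(pred, det), ti, di))
    candidates.sort()

    used_tracks, used_dets, matches = set(), set(), []
    for dist, ti, di in candidates:
        if dist > max_dist:
            break  # candidates are sorted, so all remaining pairs are too far
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return sorted(matches)

# Two tracklets (lists of past 3D centers) and three current detections.
tracklets = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
             [(10.0, 0.0, 0.0), (10.0, 0.0, 1.0)]]
detections = [(10.0, 0.0, 2.1), (2.1, 0.0, 0.0), (50.0, 0.0, 0.0)]
matches = match(tracklets, detections)
offsets = relative_motion(tracklets, detections)
```

In this toy input, each tracklet is paired with the detection closest to its extrapolated position, and the far-away third detection stays unmatched, which a tracker would typically treat as a candidate for a new track. A learned approach would replace both the extrapolation and the distance cost with model outputs.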
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
