SiamMo: Siamese Motion-Centric 3D Object Tracking

Yuxiang Yang; Yingqi Deng; Jing Zhang; Hongjie Gu; Zhekang Dong

arXiv:2408.01688 · cs.CV · September 10, 2024

TL;DR

SiamMo is a simple, motion-centric 3D object tracking method that improves accuracy and robustness by decoupling feature extraction from temporal fusion and integrating multi-scale motion features, outperforming state-of-the-art methods.

Contribution

Introduces SiamMo, a Siamese motion-centric tracking approach that combines multi-scale feature aggregation with size-aware encoding to improve 3D tracking performance and robustness.

Findings

01 Achieves 90.1% precision on the KITTI benchmark

02 Surpasses state-of-the-art tracking methods

03 Operates at 108 FPS with high robustness
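
The page does not define the precision figure quoted above. Assuming it follows the convention common in LiDAR single object tracking evaluation (the area under the curve of the fraction of frames whose predicted box center lies within a distance threshold of the ground truth, swept from 0 to 2 m), it could be computed roughly as below. The function name, the 2 m range, and the 21 thresholds are assumptions for illustration, not values taken from this paper.

```python
def precision_auc(center_errors, max_dist=2.0, n_thresholds=21):
    """Area under the precision curve, as a percentage in [0, 100].

    center_errors: per-frame distance (in meters) between the predicted
    and ground-truth box centers. max_dist and n_thresholds are assumed
    defaults, not settings confirmed by the paper.
    """
    if not center_errors:
        raise ValueError("need at least one frame")
    step = max_dist / (n_thresholds - 1)
    thresholds = [i * step for i in range(n_thresholds)]
    n_frames = len(center_errors)
    # Fraction of frames whose center error falls within each threshold.
    curve = [sum(e <= t for e in center_errors) / n_frames for t in thresholds]
    # Trapezoidal integration, normalized by the threshold range.
    area = sum((curve[i] + curve[i + 1]) * step / 2
               for i in range(n_thresholds - 1))
    return 100.0 * area / max_dist
```

Under this definition, a tracker whose center error is zero on every frame scores 100, and one that is always more than 2 m off scores 0.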

Abstract

Current 3D single object tracking methods primarily rely on the Siamese matching-based paradigm, which struggles with textureless and incomplete LiDAR point clouds. Conversely, the motion-centric paradigm avoids appearance matching, thus overcoming these issues. However, its complex multi-stage pipeline and the limited temporal modeling capability of a single-stream architecture constrain its potential. In this paper, we introduce SiamMo, a novel and simple Siamese motion-centric tracking approach. Unlike the traditional single-stream architecture, we employ Siamese feature extraction for motion-centric tracking. This decouples feature extraction from temporal fusion, significantly enhancing tracking performance. Additionally, we design a Spatio-Temporal Feature Aggregation module to integrate Siamese features at multiple scales, capturing motion information effectively. We also…
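
The structure the abstract describes (encode each frame independently with a shared encoder, then fuse the two feature sets at several scales) can be illustrated with a toy sketch. This is not SiamMo's architecture: the bird's-eye-view occupancy histogram stands in for a learned backbone, the per-cell difference stands in for the Spatio-Temporal Feature Aggregation module, and all names (`encode`, `fuse`, `siamese_forward`, `cell_sizes`) are hypothetical.

```python
def encode(points, cell_sizes=(0.5, 1.0, 2.0)):
    """Shared encoder: one bird's-eye-view occupancy histogram per scale.

    A stand-in for a learned backbone; each frame is encoded on its own,
    with no access to the other frame.
    """
    feats = []
    for cell in cell_sizes:
        grid = {}
        for x, y, _z in points:
            key = (int(x // cell), int(y // cell))
            grid[key] = grid.get(key, 0) + 1
        feats.append(grid)
    return feats


def fuse(prev_feats, curr_feats):
    """Temporal fusion, applied only after both frames are encoded.

    Returns the signed occupancy change per cell at each scale.
    """
    fused = []
    for prev, curr in zip(prev_feats, curr_feats):
        keys = set(prev) | set(curr)
        fused.append({k: curr.get(k, 0) - prev.get(k, 0) for k in keys})
    return fused


def siamese_forward(prev_points, curr_points):
    prev_feats = encode(prev_points)  # branch 1
    curr_feats = encode(curr_points)  # branch 2: same function, i.e. shared weights
    return fuse(prev_feats, curr_feats)


# Two points that move 1 m along x between frames.
prev_frame = [(0.1, 0.1, 0.0), (0.2, 0.3, 0.0)]
curr_frame = [(1.1, 0.1, 0.0), (1.2, 0.3, 0.0)]
fused = siamese_forward(prev_frame, curr_frame)
```

In this toy example the 0.5 m and 1 m grids register the shift as occupancy leaving one cell and entering another, while the 2 m grid sees no change, which is one intuition for why fusing at multiple scales can capture motion of different magnitudes.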

Code & Models

Repositories

hdu-vrlab/siammo (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings