TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection
Zhipeng Luo, Gongjie Zhang, Changqing Zhou, Tianrui Liu, Shijian Lu, Liang Pan

TL;DR
TransPillars introduces a transformer-based multi-frame 3D object detection method that effectively aggregates spatial-temporal features from point cloud sequences, significantly improving detection accuracy in autonomous driving scenarios.
Contribution
It proposes a novel hierarchical coarse-to-fine aggregation strategy and a deformable transformer variant to enhance multi-frame 3D detection performance.
Findings
Achieves state-of-the-art results on benchmark datasets.
Effectively captures object motion through multi-scale feature fusion.
Preserves instance details for accurate localization.
Abstract
3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames without harnessing the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps instead of pooled instance features to preserve instance details with contextual information that are essential to accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy to fuse multi-scale features progressively to effectively…
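The two ideas in the abstract (fusing per-cell features across frames rather than pooled instance features, and moving from coarse to fine scales) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract is truncated and gives no architectural detail, so the sketch substitutes plain per-cell dot-product attention over time for the paper's deformable transformer, and the function names (`fuse_frames`, `upsample2x`, `coarse_to_fine`), the tensor layout, and the 2x scale step are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_frames(feats):
    """Fuse T feature maps cell by cell (hypothetical helper).

    feats: array of shape (T, H, W, C); the last frame is the current one.
    Each cell of the current frame attends to the same cell in every frame.
    """
    query = feats[-1]                                    # (H, W, C)
    scores = np.einsum("hwc,thwc->thw", query, feats)    # similarity per frame
    weights = softmax(scores / np.sqrt(feats.shape[-1]), axis=0)
    return np.einsum("thw,thwc->hwc", weights, feats)    # (H, W, C)

def upsample2x(x):
    # Nearest-neighbour upsampling of an (H, W, C) map to (2H, 2W, C).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine(pyramid):
    """pyramid: list of (T, H_i, W_i, C) arrays, coarsest first,
    each level assumed to have twice the resolution of the previous one."""
    fused = fuse_frames(pyramid[0])
    for feats in pyramid[1:]:
        guided = feats.copy()
        # Assumption: the coarse result guides the finer-scale query by addition.
        guided[-1] = guided[-1] + upsample2x(fused)
        fused = fuse_frames(guided)
    return fused

rng = np.random.default_rng(0)
pyramid = [rng.normal(size=(3, 4, 4, 8)), rng.normal(size=(3, 8, 8, 8))]
print(coarse_to_fine(pyramid).shape)
```

Because fusion happens on the full feature map, every cell keeps its own features and neighbourhood context instead of being reduced to one pooled vector per object. Attending only to the same cell across frames cannot follow a moving object, which is the gap a deformable attention variant with learned sampling offsets is meant to close.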
Taxonomy
Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization
