TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection

Zhipeng Luo; Gongjie Zhang; Changqing Zhou; Tianrui Liu; Shijian Lu; Liang Pan

arXiv:2208.03141·cs.CV·August 8, 2022

TL;DR

TransPillars introduces a transformer-based multi-frame 3D object detection method that effectively aggregates spatial-temporal features from point cloud sequences, significantly improving detection accuracy in autonomous driving scenarios.

Contribution

It proposes a novel hierarchical coarse-to-fine aggregation strategy and a deformable transformer variant to enhance multi-frame 3D detection performance.
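The deformable transformer variant itself is not described on this page, so the following is only a sketch of generic single-head deformable attention (in the style of Deformable DETR), not the TransPillars variant. Instead of attending to every cell of a bird's-eye-view pillar feature map, a query samples a few points at learned offsets around a reference location using bilinear interpolation and mixes them with softmax weights. The function names, the `[H][W][C]` nested-list layout, and passing offsets and weight logits in directly (a trained model would predict them from the query) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

FeatureMap = List[List[List[float]]]  # [H][W][C] BEV grid (hypothetical layout)


def bilinear_sample(fmap: FeatureMap, y: float, x: float) -> List[float]:
    """Bilinearly interpolate a C-dim feature at continuous (y, x).

    Taps that fall outside the grid contribute zero (zero padding).
    """
    H, W, C = len(fmap), len(fmap[0]), len(fmap[0][0])
    y0, x0 = math.floor(y), math.floor(x)
    out = [0.0] * C
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                # Weight shrinks linearly with distance from the integer tap.
                w = (1 - abs(y - yy)) * (1 - abs(x - xx))
                for c in range(C):
                    out[c] += w * fmap[yy][xx][c]
    return out


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def deformable_attend(
    fmap: FeatureMap,
    ref: Tuple[float, float],
    offsets: Sequence[Tuple[float, float]],
    weight_logits: Sequence[float],
) -> List[float]:
    """Single-head deformable attention for one query at reference point `ref`."""
    weights = softmax(weight_logits)
    C = len(fmap[0][0])
    out = [0.0] * C
    for (oy, ox), a in zip(offsets, weights):
        sample = bilinear_sample(fmap, ref[0] + oy, ref[1] + ox)
        for c in range(C):
            out[c] += a * sample[c]
    return out


if __name__ == "__main__":
    # 4x4 single-channel map whose value at (y, x) is y * 4 + x.
    fmap = [[[float(y * 4 + x)] for x in range(4)] for y in range(4)]
    # Two equally weighted samples: (1, 1) -> 5.0 and (1.5, 1.5) -> 7.5.
    print(deformable_attend(fmap, (1.0, 1.0), [(0.0, 0.0), (0.5, 0.5)], [0.0, 0.0]))  # [6.25]
```

Sampling only a handful of locations keeps the cost per query independent of the map size, which is the usual motivation for deformable attention on large BEV grids.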

Findings

1. Achieves state-of-the-art results on benchmark datasets.
2. Effectively captures object motion through multi-scale feature fusion.
3. Preserves instance details for accurate localization.

Abstract

3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames without harnessing the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps instead of pooled instance features, preserving instance details together with the contextual information that is essential for accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy to fuse multi-scale features progressively to effectively…
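The abstract combines two ideas: fusing features cell by cell from whole multi-frame feature maps rather than from pooled per-instance features, and a coarse-to-fine schedule in which coarse-scale features capture object motion and steer the fusion of finer ones. The toy sketch below illustrates that general pattern only; it is not the authors' architecture. Its assumptions: exactly two frames, 2x2 average pooling to build the coarse scale, windowed dot-product attention in place of learned transformer layers, the attention-weighted offset used as the coarse motion estimate, and plain residual addition as the fusion step. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]  # [H][W][C] BEV pillar features (hypothetical layout)


def avg_pool2(g: Grid) -> Grid:
    """2x2 average pooling to build a coarser scale (assumes even H and W)."""
    H, W, C = len(g), len(g[0]), len(g[0][0])
    return [[[sum(g[2 * i + a][2 * j + b][c] for a in (0, 1) for b in (0, 1)) / 4.0
              for c in range(C)] for j in range(W // 2)] for i in range(H // 2)]


def window_attend(q: List[float], keys: Grid, ci: int, cj: int,
                  r: int) -> Tuple[List[float], float, float]:
    """Dot-product attention from q over a (2r+1)^2 window of keys centred at (ci, cj).

    The centre must lie inside the grid. Returns the attended feature and the
    attention-weighted offset (dy, dx), used here as a rough motion estimate.
    """
    H, W, C = len(keys), len(keys[0]), len(keys[0][0])
    cands = [(ci + dy, cj + dx, dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
             if 0 <= ci + dy < H and 0 <= cj + dx < W]
    scale = 1.0 / math.sqrt(C)
    logits = [scale * sum(q[c] * keys[y][x][c] for c in range(C)) for y, x, _, _ in cands]
    m = max(logits)
    w = [math.exp(v - m) for v in logits]
    s = sum(w)
    w = [v / s for v in w]
    feat = [sum(wk * keys[y][x][c] for wk, (y, x, _, _) in zip(w, cands)) for c in range(C)]
    dy = sum(wk * d for wk, (_, _, d, _) in zip(w, cands))
    dx = sum(wk * d for wk, (_, _, _, d) in zip(w, cands))
    return feat, dy, dx


def coarse_to_fine_fuse(cur: Grid, prev: Grid, r_coarse: int = 1, r_fine: int = 1) -> Grid:
    """Fuse the previous frame's map into the current one, coarse scale first."""
    H, W, C = len(cur), len(cur[0]), len(cur[0][0])
    cur_c, prev_c = avg_pool2(cur), avg_pool2(prev)
    # Coarse stage: a cheap, wide-context estimate of per-cell displacement.
    motion = [[window_attend(cur_c[i][j], prev_c, i, j, r_coarse)[1:]
               for j in range(W // 2)] for i in range(H // 2)]
    out: Grid = []
    for i in range(H):
        row = []
        for j in range(W):
            dy, dx = motion[i // 2][j // 2]
            # Coarse offsets are in coarse cells: scale by 2, then clamp to the grid.
            ci = min(max(i + round(2 * dy), 0), H - 1)
            cj = min(max(j + round(2 * dx), 0), W - 1)
            # Fine stage: attend to a small window around the motion-shifted location.
            attended, _, _ = window_attend(cur[i][j], prev, ci, cj, r_fine)
            # Residual fusion; a learned fusion layer would replace this in a real model.
            row.append([cur[i][j][c] + attended[c] for c in range(C)])
        out.append(row)
    return out
```

The point of the coarse stage is that a small fine-scale window can still follow an object that moved several cells between frames, because the window is re-centred by the coarse motion estimate before fine matching.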

Videos

TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection (YouTube)

Taxonomy

Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization