CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer
Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, and Jieping Ye

TL;DR
This paper introduces CT3D++ and CT3D, innovative frameworks for 3D object detection from point clouds that leverage keypoint-induced channel-wise transformers to improve accuracy and efficiency with minimal manual design.
Contribution
The paper presents CT3D++ with geometric and semantic fusion embedding and a point-to-key encoder, advancing 3D detection performance and computational efficiency.
Findings
Achieves state-of-the-art results on KITTI and Waymo datasets.
Introduces a novel keypoint-induced channel-wise transformer architecture.
Reduces computational cost while improving detection accuracy.
Abstract
The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two frameworks for 3D object detection with minimal hand-crafted design. Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal. Secondly, we present an enhanced network called CT3D++, which incorporates geometric and semantic fusion-based embedding to extract more valuable and comprehensive proposal-aware information. Additionally, CT3D++ utilizes a point-to-key bidirectional…
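The three CT3D stages named in the abstract (raw-point embedding, a Transformer encoder, and a channel-wise decoder applied to the points inside one proposal) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The keypoint choice (box centre plus eight corners of an axis-aligned box), the single-head attention without layer normalization, the particular channel-wise weighting, and all function and parameter names are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def embed_points(points, box_center, box_size, w_embed):
    # Assumed embedding: offsets from each raw point to 9 proposal keypoints
    # (centre + 8 corners of an axis-aligned box), followed by a linear layer.
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                      for sy in (-1, 1) for sz in (-1, 1)])
    corners = box_center + 0.5 * signs * box_size            # (8, 3)
    keypoints = np.vstack([box_center[None, :], corners])    # (9, 3)
    offsets = points[:, None, :] - keypoints[None, :, :]     # (N, 9, 3)
    return offsets.reshape(len(points), -1) @ w_embed        # (N, D)

def self_attention(x, wq, wk, wv):
    # One single-head self-attention layer with a residual connection.
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)    # (N, N)
    return x + attn @ v

def channel_wise_decode(x, query, wk, wv):
    # Assumed simplification of channel-wise decoding: every channel gets its
    # own softmax over the N points, instead of one weight shared per point.
    k, v = x @ wk, x @ wv                                     # (N, D) each
    weights = softmax(query[None, :] * k / np.sqrt(k.shape[1]), axis=0)
    return (weights * v).sum(axis=0)                          # (D,)

rng = np.random.default_rng(0)
N, D = 32, 16                                # points per proposal, feature width
points = rng.normal(size=(N, 3))             # synthetic points inside a proposal
center, size = np.zeros(3), np.array([4.0, 2.0, 1.5])
w_embed = 0.1 * rng.normal(size=(27, D))
wq, wk, wv, dk, dv = (0.1 * rng.normal(size=(D, D)) for _ in range(5))
query = rng.normal(size=D)                   # stands in for a learned query

features = embed_points(points, center, size, w_embed)
encoded = self_attention(features, wq, wk, wv)
proposal_feature = channel_wise_decode(encoded, query, dk, dv)
print(proposal_feature.shape)                # (16,)
```

The point of the sketch is the shape of the decoding weights: a standard decoder would produce one weight per point (a length-N vector), whereas the channel-wise variant here produces an N-by-D matrix whose columns each sum to one, so different channels can attend to different points before being pooled into a single proposal feature.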
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Industrial Vision Systems and Defect Detection
Methods: Region Proposal Network · Residual Connection · Softmax · CT3D · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer
