CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

Hualian Sheng; Sijia Cai; Na Zhao; Bing Deng; Qiao Liang; Min-Jian Zhao; Jieping Ye

arXiv:2406.08152·cs.CV·June 13, 2024

TL;DR

This paper introduces CT3D and its enhanced successor CT3D++, two frameworks for 3D object detection from point clouds. Both use keypoint-induced channel-wise Transformers and aim to improve accuracy and efficiency with minimal hand-crafted design.

Contribution

The paper presents CT3D++, which extends CT3D with geometric and semantic fusion-based embedding and a point-to-key encoder, improving 3D detection performance and computational efficiency.

Findings

01 Achieves state-of-the-art results on the KITTI and Waymo datasets.

02 Introduces a novel keypoint-induced channel-wise Transformer architecture.

03 Reduces computational cost while improving detection accuracy.

Abstract

The field of 3D object detection from point clouds is rapidly advancing in computer vision, aiming to accurately and efficiently detect and localize objects in three-dimensional space. Current 3D detectors commonly fall short in terms of flexibility and scalability, with ample room for advancements in performance. In this paper, our objective is to address these limitations by introducing two frameworks for 3D object detection with minimal hand-crafted design. Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal. Secondly, we present an enhanced network called CT3D++, which incorporates geometric and semantic fusion-based embedding to extract more valuable and comprehensive proposal-aware information. Additionally, CT3D++ utilizes a point-to-key bidirectional…
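The three CT3D stages named in the abstract (raw-point-based embedding, a Transformer encoder, and a channel-wise decoder applied to the points inside each proposal) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. It rests on several assumptions that are not stated in the text above: the proposal is treated as an axis-aligned box with no heading angle, the "keypoints" are taken to be the box center plus its eight corners, the encoder is a single attention layer with random weights, and the decoder is a simplified pooling step that learns one softmax over points per feature channel. All function names are hypothetical.

```python
import numpy as np

def box_keypoints(center, size):
    """Center plus the 8 corners of an axis-aligned box, shape (9, 3).
    Assumption: real proposals also carry a yaw angle, ignored here."""
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                      for sy in (-1, 1) for sz in (-1, 1)], dtype=float)
    corners = center + 0.5 * signs * size
    return np.vstack([center, corners])

def embed_points(points, center, size):
    """Raw-point embedding: offsets from every point to every keypoint, (N, 27)."""
    keys = box_keypoints(center, size)                 # (9, 3)
    offsets = points[:, None, :] - keys[None, :, :]    # (N, 9, 3)
    return offsets.reshape(len(points), -1)

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)            # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """One single-head scaled dot-product self-attention layer with a residual."""
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)   # (N, N)
    return x + attn @ v

def channelwise_decode(x, wk, wv):
    """Simplified channel-wise pooling: each channel gets its own softmax
    over the N points, then the points are summed into one vector."""
    weights = softmax(x @ wk, axis=0)                  # (N, D), columns sum to 1
    return (weights * (x @ wv)).sum(axis=0)            # (D,)

rng = np.random.default_rng(0)
points = rng.normal(size=(16, 3))                      # toy points in one proposal
center, size = np.zeros(3), np.array([4.0, 2.0, 1.5])

feats = embed_points(points, center, size)
d = feats.shape[1]
wq, wk, wv, dk, dv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(5))

encoded = self_attention(feats, wq, wk, wv)
proposal_vec = channelwise_decode(encoded, dk, dv)
print(feats.shape, encoded.shape, proposal_vec.shape)  # (16, 27) (16, 27) (27,)
```

The point of the sketch is only the data flow: a variable number of raw points is turned into box-relative features, mixed by attention, and then collapsed into one fixed-length vector per proposal that a detection head could consume. The official repository listed under Code & Models is the reference for the actual architecture.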

Code & Models

Repositories

hlsheng1/ct3d-plusplus
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Industrial Vision Systems and Defect Detection

Methods: Region Proposal Network · Residual Connection · Softmax · CT3D · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer