Point Virtual Transformer
Veerain Sood, Bnalin, and Gaurav Pandey

TL;DR
PointViT is a transformer-based 3D object detection method that fuses raw LiDAR points with virtual points derived from images, improving detection accuracy, especially at long range.
Contribution
The paper proposes a transformer-based fusion framework that jointly reasons over real LiDAR points and selectively sampled virtual points, and compares fusion strategies by their accuracy and computational cost.
Findings
Achieves 91.16% 3D AP on KITTI Car detection
Demonstrates effective fusion strategies balancing accuracy and computational cost
Outperforms previous methods in long-range object detection
Abstract
LiDAR-based 3D object detectors often struggle to detect far-field objects due to the sparsity of point clouds at long ranges, which limits the availability of reliable geometric cues. To address this, prior approaches augment LiDAR data with depth-completed virtual points derived from RGB images; however, directly incorporating all virtual points leads to increased computational cost and introduces challenges in effectively fusing real and virtual information. We present Point Virtual Transformer (PointViT), a transformer-based 3D object detection framework that jointly reasons over raw LiDAR points and selectively sampled virtual points. The framework examines multiple fusion strategies, ranging from early point-level fusion to BEV-based gated fusion, and analyses their trade-offs in terms of accuracy and efficiency. The fused point cloud is voxelized and encoded using sparse…
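The two ends of the fusion spectrum named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `early_fusion` and `gated_fusion`, the `keep_ratio` parameter, the uniform random sampling, and the per-cell sigmoid gate are all assumptions made for illustration. The abstract says only that virtual points are "selectively sampled" and that one strategy is a "BEV-based gated fusion". In a real detector the gate values would be predicted by a learned layer rather than passed in.

```python
import math
import random


def early_fusion(real_points, virtual_points, keep_ratio=0.25, seed=0):
    """Point-level fusion: real points plus a random subset of virtual points.

    Each output point is (x, y, z, is_virtual) so that later stages can
    tell the two sources apart. Uniform random sampling is an assumption.
    """
    rng = random.Random(seed)
    n_keep = int(len(virtual_points) * keep_ratio)
    sampled = rng.sample(virtual_points, n_keep)
    fused = [(x, y, z, 0.0) for x, y, z in real_points]
    fused += [(x, y, z, 1.0) for x, y, z in sampled]
    return fused


def gated_fusion(real_feat, virtual_feat, gate_logit):
    """Per-cell gated blend of two BEV feature maps (flattened to lists).

    out = g * real + (1 - g) * virtual, with g = sigmoid(gate_logit).
    The gate logits are inputs here; a learned layer would normally
    produce them (assumption for illustration).
    """
    out = []
    for r, v, a in zip(real_feat, virtual_feat, gate_logit):
        g = 1.0 / (1.0 + math.exp(-a))
        out.append(g * r + (1.0 - g) * v)
    return out


real = [(1.0, 0.0, 0.0), (2.0, 0.5, 0.1)]
virtual = [(float(i), 0.0, 0.0) for i in range(8)]
fused = early_fusion(real, virtual, keep_ratio=0.25)
blended = gated_fusion([1.0], [3.0], [0.0])
```

Early fusion keeps the pipeline simple but passes every kept virtual point through the full backbone. Gated fusion lets the network down-weight unreliable virtual features per BEV cell, at the cost of running two feature streams.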
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Remote Sensing and LiDAR Applications
