TL;DR
3DTMDet introduces a dual-path network combining state space models and Transformers to enhance 3D object detection in point clouds, effectively capturing global context and local details, especially for distant and occluded objects.
Contribution
The paper proposes the 3DHMT block and a voxel generation method, improving detection of distant and occluded objects by balancing global and local feature extraction.
Findings
Outperforms state-of-the-art detectors on KITTI and ONCE datasets.
Effectively captures global interactions and local geometric details.
Enhances detection of distant and occluded objects.
Abstract
A fundamental challenge in point cloud object detection lies in the conflict between the extreme sparsity of distant points and the need for long-range context understanding. Existing methods typically use 1D serialization to expand the receptive field, which inevitably discards already scarce local geometric details and degrades detection of distant and small objects. To address this issue, we propose 3DTMDet, a novel detection network that synergistically combines state space models (Mamba) with Transformers. The core idea is to use the SSM's linear complexity and strength in long-sequence modeling to capture global interactions between sparse and distant points, while using Transformer modules with local attention to encode fine-grained geometric structures in local point sets, preserving accurate shape information. We propose the 3D Hybrid Mamba Transformer (3DHMT)…
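To make the dual-path idea concrete, here is a minimal sketch in plain Python. It is not the paper's 3DHMT block: the function names (`ssm_scan`, `local_attention`, `hybrid_block`), the fixed scalar recurrence coefficients, the non-overlapping windows, the fusion by addition, and the feature sizes are all assumptions made for illustration. It only shows the general pattern the abstract describes: a linear-time recurrence carries information across the whole serialized sequence, while attention restricted to small windows mixes nearby tokens.

```python
import math
import random

def ssm_scan(x, a=0.9, b=0.1):
    """Toy linear recurrence h_t = a*h_{t-1} + b*x_t (one O(N) pass).
    A stand-in for a state space global path; a and b are made-up constants."""
    h = [0.0] * len(x[0])
    out = []
    for tok in x:
        h = [a * hi + b * xi for hi, xi in zip(h, tok)]
        out.append(h)
    return out

def local_attention(x, window=4):
    """Softmax self-attention inside non-overlapping windows of `window` tokens.
    Queries, keys and values are the raw features (no learned projections)."""
    d = len(x[0])
    out = []
    for s in range(0, len(x), window):
        blk = x[s:s + window]
        for q in blk:
            scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in blk]
            m = max(scores)  # subtract the max for numerical stability
            w = [math.exp(sc - m) for sc in scores]
            z = sum(w)
            out.append([sum(wj * k[c] for wj, k in zip(w, blk)) / z for c in range(d)])
    return out

def hybrid_block(x, window=4):
    """Residual sum of the global and local paths (additive fusion is an assumption)."""
    g = ssm_scan(x)
    loc = local_attention(x, window)
    return [[xi + gi + li for xi, gi, li in zip(xr, gr, lr)]
            for xr, gr, lr in zip(x, g, loc)]

random.seed(0)
# 16 serialized voxel features with 8 channels each (hypothetical sizes)
feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
out = hybrid_block(feats)
print(len(out), len(out[0]))  # 16 8
```

In this toy version, perturbing the first token changes the last output of `ssm_scan` but leaves every `local_attention` window after the first untouched, which is the division of labor between the global and local paths.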
