3DTMDet: A Dual-Path Synergy Network of Transformer and SSM for 3D Object Detection in Point Clouds

Bingwen Qiu; Yuan Liu; Junqi Bai; Tong Jiang; Ben Liang; Fangzhou Chen; Xiubao Sui; Qian Chen

arXiv:2605.15546·cs.CV·May 18, 2026

TL;DR

3DTMDet introduces a dual-path network combining state space models and Transformers to enhance 3D object detection in point clouds, effectively capturing global context and local details, especially for distant and occluded objects.

Contribution

The paper proposes the 3DHMT block and a voxel generation method, which improve detection of distant and occluded objects by balancing global and local feature extraction.

Findings

01. Outperforms state-of-the-art detectors on the KITTI and ONCE datasets.

02. Effectively captures global interactions and local geometric details.

03. Enhances detection of distant and occluded objects.

Abstract

A fundamental challenge in point cloud object detection is the conflict between the extreme sparsity of distant points and the need for long-range context understanding. Existing methods typically use 1D serialization to expand the receptive field, which inevitably discards already scarce local geometric details and degrades detection of distant and small objects. To address this issue, we propose 3DTMDet, a novel detection network that synergistically combines state space models (SSMs; here, Mamba) with Transformers. The core idea is to use the SSM's linear complexity and its strength in long-sequence modeling to capture global interactions between sparse, distant points, while Transformer modules with local attention encode fine-grained geometric structures in local point sets, preserving accurate shape information. We propose the 3D Hybrid Mamba Transformer (3DHMT)…
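The dual-path idea in the abstract can be illustrated with a toy sketch. This is not the authors' 3DHMT block: the function names, the fixed scalar recurrence (a real Mamba layer uses learned, input-dependent parameters), the radius-based neighborhoods, and the fusion by summation are all assumptions made for illustration. It only shows how a linear-time scan over a serialized sequence can carry global context while attention restricted to spatial neighbors keeps local geometry.

```python
import math

def ssm_scan(feats, a=0.9, b=0.1):
    """Global path (assumed form): h_t = a * h_{t-1} + b * x_t.

    One pass over the serialized sequence, so cost is linear in its length.
    """
    state = [0.0] * len(feats[0])
    outputs = []
    for x in feats:
        state = [a * h + b * xi for h, xi in zip(state, x)]
        outputs.append(state)
    return outputs

def local_attention(coords, feats, radius):
    """Local path (assumed form): softmax dot-product attention restricted
    to points within `radius` of the query point in 3D space."""
    dim = len(feats[0])
    scale = math.sqrt(dim)
    outputs = []
    for qc, q in zip(coords, feats):
        # Neighborhood always contains the query point itself (distance 0).
        neigh = [f for c, f in zip(coords, feats) if math.dist(qc, c) <= radius]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in neigh]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, neigh)) / total for d in range(dim)]
        )
    return outputs

def dual_path_block(coords, feats, radius=1.0):
    """Fuse the two paths by element-wise sum (hypothetical fusion rule)."""
    global_out = ssm_scan(feats)
    local_out = local_attention(coords, feats, radius)
    return [[g + l for g, l in zip(gr, lr)] for gr, lr in zip(global_out, local_out)]

# Two nearby points and one distant, isolated point.
coords = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = dual_path_block(coords, feats, radius=1.0)
```

In this toy example the isolated third point has no spatial neighbors, so its local path returns only its own features, while the scan still passes it information accumulated from the earlier points in the sequence.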

Code & Models

Repositories

QiuBingwen/3DTMDet (GitHub)