WinMamba: Multi-Scale Shifted Windows in State Space Model for 3D Object Detection
Longhui Zheng, Qiming Xia, Xiaolu Chen, Zhaoliang Liu, Chenglu Wen

TL;DR
WinMamba introduces a multi-scale shifted window approach in a state space model to improve 3D object detection by capturing long-range dependencies efficiently and preserving spatial information across resolutions.
Contribution
The paper proposes WinMamba, a Mamba-based 3D feature-encoding backbone built from stacked WinMamba blocks. Each block applies multi-scale shifted windows within a state space model to strengthen spatial feature encoding and contextual understanding for 3D object detection.
Findings
WinMamba outperforms baseline models on the KITTI and Waymo datasets.
The WSF and AWF modules significantly improve detection accuracy.
Extensive ablation confirms the effectiveness of multi-scale and positional encoding strategies.
Abstract
3D object detection is critical for autonomous driving, yet it remains fundamentally challenging to simultaneously maximize computational efficiency and capture long-range spatial dependencies. We observed that Mamba-based models, with their linear state-space design, capture long-range dependencies at lower cost, offering a promising balance between efficiency and accuracy. However, existing methods rely on axis-aligned scanning within a fixed window, inevitably discarding spatial information. To address this problem, we propose WinMamba, a novel Mamba-based 3D feature-encoding backbone composed of stacked WinMamba blocks. To enhance the backbone with robust multi-scale representation, the WinMamba block incorporates a window-scale-adaptive module that compensates voxel features across varying resolutions during sampling. Meanwhile, to obtain rich contextual cues within the linear…
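The limitation of a fixed window that the abstract describes can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 2D grid, the window size, and the half-window shift are all hypothetical choices made only to show the general shifted-window idea. Two neighboring voxels that a fixed partition separates can end up in the same window once the window grid is offset, so a per-window scan can relate them.

```python
from collections import defaultdict


def window_key(point, window, shift=0):
    """Return the window index of an (x, y) voxel coordinate.

    `shift` offsets the window grid; shift=0 is the fixed partition.
    (Hypothetical helper, not taken from the paper.)
    """
    x, y = point
    return ((x + shift) // window, (y + shift) // window)


def window_partition(coords, window, shift=0):
    """Group voxel coordinates by window and order each group for scanning."""
    groups = defaultdict(list)
    for point in coords:
        groups[window_key(point, window, shift)].append(point)
    # Axis-aligned scan order inside each window: sort by x, then by y.
    return {key: sorted(points) for key, points in groups.items()}


def same_window(a, b, window, shift=0):
    """True if voxels a and b fall in the same window."""
    return window_key(a, window, shift) == window_key(b, window, shift)


# Assumed toy input: four occupied voxels on a 2D grid.
voxels = [(3, 0), (4, 0), (0, 0), (7, 7)]
fixed = window_partition(voxels, window=4)
shifted = window_partition(voxels, window=4, shift=2)
```

In this toy setup, voxels (3, 0) and (4, 0) are adjacent but sit on opposite sides of a fixed window boundary; with the grid shifted by half a window they share one window and appear in the same scan sequence.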
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Autonomous Vehicle Technology and Safety
