TL;DR
This paper introduces MDDCNet, a novel traffic object detection model that combines deformable dilated convolutions with Mamba blocks for improved multi-scale detection in complex scenes.
Contribution
It proposes a hybrid backbone with multi-scale deformable dilated convolutions and Mamba blocks, along with a channel-enhanced feed-forward network and an attention-based feature pyramid network.
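To make the "multi-scale dilated" part of this contribution concrete, here is a minimal pure-Python sketch of one kernel applied at several dilation rates to a 1D signal. It is an illustration under stated assumptions, not the paper's MSDDC block: the function names (`dilated_conv1d`, `effective_kernel_size`, `multi_scale_branches`) and the choice of dilation rates are hypothetical, and the learned deformable offsets and the Mamba blocks are not modeled.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a kernel whose taps are spaced `dilation` samples apart."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D dilated cross-correlation (the 'convolution' used in deep learning)."""
    span = effective_kernel_size(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Each kernel tap reads the input `dilation` positions after the previous tap.
        total = sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        outputs.append(total)
    return outputs


def multi_scale_branches(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates (assumed rates, for illustration).

    Returns a dict mapping each dilation rate to that branch's output.
    """
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    kernel = [1, 0, -1]  # simple difference filter
    for d, out in multi_scale_branches(signal, kernel, [1, 2, 3]).items():
        print(d, effective_kernel_size(len(kernel), d), out)
```

A larger dilation rate widens the receptive field without adding weights, which is why running branches at several rates is a common way to cover objects of different sizes. A deformable variant would additionally shift each tap by a learned offset.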
Findings
Outperforms existing detectors on benchmark datasets
Effectively captures small objects with local details
Enhances multi-scale feature fusion and interaction
Abstract
In real-world traffic scenarios, objects of varying scale are usually distributed across cluttered backgrounds, which poses great challenges for accurate detection. Although current Mamba-based methods can efficiently model long-range dependencies, they still struggle to capture small objects with abundant local details, which hinders joint modeling of local structures and global semantics. Moreover, state-space models exhibit limited hierarchical feature representation and weak cross-scale interaction due to flat sequential modeling and insufficient spatial inductive biases, leading to sub-optimal performance in complex scenes. To address these issues, we propose a Mamba with Deformable Dilated Convolutions Network (MDDCNet) for accurate traffic object detection. In MDDCNet, a well-designed hybrid backbone with successive Multi-Scale Deformable Dilated Convolution (MSDDC)…
