DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification
Junzhou Chen, Zirui Zhang, Jing Yu, Heqiang Huang, Ronghui Zhang, Xuemiao Xu, Bin Sheng, Hong Yan

TL;DR
DSDFormer combines Transformer and Mamba architectures with a novel attention mechanism and an unsupervised label refinement method to improve real-time, accurate driver distraction detection, addressing noisy labels and capturing both global and local features.
Contribution
The paper introduces DSDFormer, a new framework integrating Transformer and Mamba architectures with a dual attention mechanism, and TRCL, an unsupervised label refinement method, for robust driver distraction recognition.
Findings
Achieves state-of-the-art performance on multiple datasets.
Demonstrates real-time processing on NVIDIA Jetson AGX Orin.
Significantly improves accuracy and robustness in driver distraction detection.
Abstract
Driver distraction remains a leading cause of traffic accidents, posing a critical threat to road safety globally. As intelligent transportation systems evolve, accurate and real-time identification of driver distraction has become essential. However, existing methods struggle to capture both global contextual and fine-grained local features while contending with noisy labels in training datasets. To address these challenges, we propose DSDFormer, a novel framework that integrates the strengths of Transformer and Mamba architectures through a Dual State Domain Attention (DSDA) mechanism, enabling a balance between long-range dependencies and detailed feature extraction for robust driver behavior recognition. Additionally, we introduce Temporal Reasoning Confident Learning (TRCL), an unsupervised approach that refines noisy labels by leveraging spatiotemporal correlations in video…
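The abstract describes refining noisy labels from correlations between neighbouring video frames but does not spell out the procedure here. As a rough illustration of that general idea only, and not the paper's TRCL algorithm, the sketch below averages a model's class probabilities over a small temporal window and relabels a frame when its neighbourhood confidently disagrees with the given label. The function name `refine_labels` and the `window` and `threshold` parameters are assumptions made for this example.

```python
from typing import List, Sequence


def refine_labels(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    window: int = 2,
    threshold: float = 0.6,
) -> List[int]:
    """Relabel frames whose temporal neighbourhood confidently disagrees.

    probs[t][c] is a model's predicted probability of class c at frame t.
    labels[t] is the (possibly noisy) given label of frame t.
    `window` and `threshold` are hypothetical hyperparameters, not values
    taken from the paper.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    n = len(labels)
    refined = list(labels)
    for t in range(n):
        lo, hi = max(0, t - window), min(n, t + window + 1)
        num_classes = len(probs[t])
        # Average class probabilities over the neighbouring frames.
        mean = [
            sum(probs[i][c] for i in range(lo, hi)) / (hi - lo)
            for c in range(num_classes)
        ]
        best = max(range(num_classes), key=mean.__getitem__)
        # Overwrite the label only when the neighbourhood is confident.
        if best != labels[t] and mean[best] >= threshold:
            refined[t] = best
    return refined
```

For example, a single frame labelled as class 1 inside a run of frames that the model confidently assigns to class 0 would be relabelled to 0, while a frame in a low-confidence neighbourhood keeps its original label.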
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Human-Automation Interaction and Safety · Autonomous Vehicle Technology and Safety
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Label Smoothing · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer
