DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification

Junzhou Chen; Zirui Zhang; Jing Yu; Heqiang Huang; Ronghui Zhang; Xuemiao Xu; Bin Sheng; Hong Yan

arXiv:2409.05587 · cs.CV · September 13, 2024

TL;DR

DSDFormer combines Transformer and Mamba architectures through a Dual State Domain Attention (DSDA) mechanism to capture both global and local features, and adds an unsupervised label refinement method, Temporal Reasoning Confident Learning (TRCL), to correct noisy training labels. Together these aim at accurate, real-time driver distraction detection.

Contribution

The paper introduces DSDFormer, a framework that integrates Transformer and Mamba architectures through the Dual State Domain Attention (DSDA) mechanism, together with TRCL, an unsupervised method that refines noisy labels, for robust driver distraction recognition.
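The abstract does not spell out how DSDA combines the two architectures. As a rough, hypothetical illustration of the general idea of pairing a global branch with a sequential, linear-time branch, the Python sketch below fuses single-head softmax self-attention (identity Q/K/V projections, for brevity) with a diagonal linear state-space scan whose step size depends on the input, a heavily simplified stand-in for Mamba's selective scan. The function names, the fusion weight `alpha`, and the residual sum are all assumptions, not the paper's design.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softplus(z: float) -> float:
    # Stable log(1 + e^z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def self_attention(seq: List[Vec]) -> List[Vec]:
    # Global branch: single-head scaled dot-product attention with identity
    # Q/K/V projections (an assumption made to keep the sketch short).
    d = len(seq[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in seq:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in seq])
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(d)])
    return out


def selective_scan(seq: List[Vec], a: float = -1.0) -> List[Vec]:
    # Sequential branch: per-channel linear recurrence
    #   h_t = exp(delta_t * a) * h_{t-1} + delta_t * x_t,  y_t = h_t
    # where delta_t = softplus(mean(x_t)) is an input-dependent step size
    # (a toy version of the "selective" discretisation used in Mamba).
    d = len(seq[0])
    h = [0.0] * d
    out = []
    for x in seq:
        delta = softplus(sum(x) / d)
        decay = math.exp(delta * a)  # in (0, 1) because a < 0 and delta > 0
        h = [decay * hc + delta * xc for hc, xc in zip(h, x)]
        out.append(list(h))
    return out


def dual_branch_block(seq: List[Vec], alpha: float = 0.5) -> List[Vec]:
    # Hypothetical fusion: weighted mix of both branches plus a residual.
    att = self_attention(seq)
    ssm = selective_scan(seq)
    return [
        [x + alpha * at + (1.0 - alpha) * s for x, at, s in zip(xv, av, sv)]
        for xv, av, sv in zip(seq, att, ssm)
    ]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy per-frame features
    print(dual_branch_block(frames))
```

The attention branch lets every frame attend to every other frame at quadratic cost, while the scan passes over the sequence once, left to right, with an input-dependent decay at each step; the two outputs are mixed and added back to the input.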

Findings

1. Achieves state-of-the-art performance on multiple datasets.
2. Demonstrates real-time processing on an NVIDIA Jetson AGX Orin.
3. Significantly improves accuracy and robustness in driver distraction detection.

Abstract

Driver distraction remains a leading cause of traffic accidents, posing a critical threat to road safety globally. As intelligent transportation systems evolve, accurate and real-time identification of driver distraction has become essential. However, existing methods struggle to capture both global contextual and fine-grained local features while contending with noisy labels in training datasets. To address these challenges, we propose DSDFormer, a novel framework that integrates the strengths of Transformer and Mamba architectures through a Dual State Domain Attention (DSDA) mechanism, enabling a balance between long-range dependencies and detailed feature extraction for robust driver behavior recognition. Additionally, we introduce Temporal Reasoning Confident Learning (TRCL), an unsupervised approach that refines noisy labels by leveraging spatiotemporal correlations in video…
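The excerpt names TRCL but stops before describing it. The sketch below is not TRCL: it is a plain confident-learning check in the spirit of Northcutt et al., the technique the name TRCL suggests it builds on (per-class thresholds equal to the mean predicted probability of each class over the samples carrying that label), preceded by an assumed temporal step that averages each frame's predicted probabilities over neighbouring frames, one simple way to exploit temporal correlation in video. `temporal_smooth`, `find_label_issues`, and the window `radius` are hypothetical names and parameters.

```python
from typing import List

Probs = List[List[float]]


def temporal_smooth(probs: Probs, radius: int = 1) -> Probs:
    # Assumed temporal step: average each frame's predicted class
    # probabilities with those of its neighbours within `radius` frames.
    n, k = len(probs), len(probs[0])
    smoothed = []
    for t in range(n):
        window = probs[max(0, t - radius): min(n, t + radius + 1)]
        smoothed.append([sum(p[c] for p in window) / len(window) for c in range(k)])
    return smoothed


def find_label_issues(labels: List[int], probs: Probs) -> List[int]:
    # Confident-learning style check:
    # t_j = mean predicted probability of class j over samples labelled j.
    k = len(probs[0])
    thresholds = []
    for j in range(k):
        own = [p[j] for y, p in zip(labels, probs) if y == j]
        thresholds.append(sum(own) / len(own) if own else 1.0)
    issues = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        # Classes whose probability reaches that class's threshold.
        confident = [j for j in range(k) if p[j] >= thresholds[j]]
        if not confident:
            continue
        best = max(confident, key=lambda j: p[j])
        if best != y:  # confidently predicted class disagrees with the label
            issues.append(i)
    return issues


if __name__ == "__main__":
    # Toy clip: frames 0-2 show class 0, frames 3-5 class 1,
    # but frame 1 carries a wrong label.
    probs = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
             [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
    labels = [0, 1, 0, 1, 1, 1]
    print(find_label_issues(labels, probs))                   # []
    print(find_label_issues(labels, temporal_smooth(probs)))  # [1]
```

On this toy clip the per-frame predictions alone are not confident enough to flag frame 1, whereas after smoothing its neighbourhood clearly points to class 0 and frame 1 is flagged. Confident learning assumes out-of-sample predicted probabilities (for example from cross-validation), so a real pipeline would not score frames with a model trained on those same labels.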

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Human-Automation Interaction and Safety · Autonomous Vehicle Technology and Safety

Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Softmax · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Label Smoothing · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer