# Deformable Pyramid Sparse Transformer for Semi-Supervised Driver Distraction Detection

**Authors:** Qiang Zhao, Zhichao Yu, Jiahui Yu, Simon James Fong, Yuchu Lin, Rui Wang, Weiwei Lin

PMC · DOI: 10.3390/s26030803 · Sensors (Basel, Switzerland) · 2026-01-25

## TL;DR

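This paper introduces a semi-supervised teacher–student framework that detects driver distraction from a small labeled set plus abundant unlabeled samples. On the Roboflow Distracted Driving Dataset, it outperforms representative fully supervised baselines in mAP@0.5 and mAP@0.5:0.95.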

## Contribution

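The main contribution is an adaptive semi-supervised framework that combines a Deformable Pyramid Sparse Transformer (DPST) module, integrated into a lightweight YOLOv11 detector, with adaptive pseudo-label optimization and teacher-guided feature consistency distillation for driver distraction detection.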

## Key findings

- The proposed framework outperforms representative fully supervised baselines in mAP@0.5 and mAP@0.5:0.95 on the Roboflow Distracted Driving Dataset.
- The DPST module enables precise multi-scale feature alignment and efficient cross-scale semantic fusion.
- The method maintains a balanced trade-off between precision and recall in driver distraction detection.
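
For readers unfamiliar with the metrics named above, the following minimal sketch shows the IoU computation underlying them, assuming the common COCO-style convention: mAP@0.5 counts a detection as correct when its IoU with a ground-truth box is at least 0.5, and mAP@0.5:0.95 averages over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The function names and the `(x1, y1, x2, y2)` box format are illustrative assumptions, not taken from the paper, and the full average-precision calculation is omitted.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Ten thresholds used by the averaged metric: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, gt_box):
    """Return the IoU thresholds at which pred_box counts as a match for gt_box."""
    iou = box_iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

A detection that matches at 0.5 but not at higher thresholds contributes to mAP@0.5 only, which is why the averaged metric is the stricter test of localization quality.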

## Abstract

Ensuring sustained driver attention is critical for intelligent transportation safety systems; however, the performance of data-driven driver distraction detection models is often limited by the high cost of large-scale manual annotation. To address this challenge, this paper proposes an adaptive semi-supervised driver distraction detection framework based on teacher–student learning and deformable pyramid feature fusion. The framework leverages a limited amount of labeled data together with abundant unlabeled samples to achieve robust and scalable distraction detection. An adaptive pseudo-label optimization strategy is introduced, incorporating category-aware pseudo-label thresholding, delayed pseudo-label scheduling, and a confidence-weighted pseudo-label loss to dynamically balance pseudo-label quality and training stability. To enhance fine-grained perception of subtle driver behaviors, a Deformable Pyramid Sparse Transformer (DPST) module is integrated into a lightweight YOLOv11 detector, enabling precise multi-scale feature alignment and efficient cross-scale semantic fusion. Furthermore, a teacher-guided feature consistency distillation mechanism is employed to promote semantic alignment between teacher and student models at the feature level, mitigating the adverse effects of noisy pseudo-labels. Extensive experiments conducted on the Roboflow Distracted Driving Dataset demonstrate that the proposed method outperforms representative fully supervised baselines in terms of mAP@0.5 and mAP@0.5:0.95 while maintaining a balanced trade-off between precision and recall. These results indicate that the proposed framework provides an effective and practical solution for real-world driver monitoring systems under limited annotation conditions.
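
The three pseudo-label components named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: per-class confidence thresholds stand in for category-aware thresholding, a hypothetical `start_epoch` warm-up stands in for delayed scheduling, and a confidence-normalized average stands in for the confidence-weighted loss. The names `Prediction`, `select_pseudo_labels`, and `weighted_pseudo_loss` are hypothetical, and the paper's actual rules may differ.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """A teacher detection on an unlabeled image (hypothetical structure)."""
    class_id: int
    confidence: float


def select_pseudo_labels(preds, class_thresholds, epoch, start_epoch):
    """Keep teacher predictions that pass their class-specific threshold.

    Assumption: pseudo-labels are not used at all before start_epoch,
    which is one simple way to delay them until the teacher is reliable.
    """
    if epoch < start_epoch:
        return []
    return [p for p in preds if p.confidence >= class_thresholds[p.class_id]]


def weighted_pseudo_loss(per_label_losses, confidences):
    """Average per-label losses, weighting each by the teacher's confidence.

    Assumption: weights are normalized by their sum, so low-confidence
    pseudo-labels contribute proportionally less to the total.
    """
    total_weight = sum(confidences)
    if total_weight == 0:
        return 0.0
    weighted = sum(l * c for l, c in zip(per_label_losses, confidences))
    return weighted / total_weight


# Usage: class 1 is treated as harder to trust, so it has a stricter threshold.
preds = [Prediction(0, 0.6), Prediction(1, 0.8), Prediction(1, 0.95)]
thresholds = {0: 0.5, 1: 0.9}
kept = select_pseudo_labels(preds, thresholds, epoch=10, start_epoch=5)
```

In this sketch the second prediction is dropped because it falls below its class threshold, even though its confidence is higher than that of the first prediction, which is kept.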

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12899691/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12899691/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12899691/full.md

---
Source: https://tomesphere.com/paper/PMC12899691