Cross-View Cross-Modal Unsupervised Domain Adaptation for Driver Monitoring System
Aditi Bhalla, Christian Hellert, Enkelejda Kasneci

TL;DR
This paper introduces a two-phase unsupervised domain adaptation framework that jointly addresses cross-view and cross-modal challenges in driver monitoring, significantly improving activity recognition accuracy across diverse vehicle setups.
Contribution
The work presents a novel joint framework combining contrastive learning and information bottleneck for cross-view and cross-modal unsupervised domain adaptation in driver monitoring.
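As a rough illustration of the contrastive component, the sketch below computes an InfoNCE-style loss between embeddings of the same clips recorded from two camera views. This summary does not specify the paper's loss, so InfoNCE is assumed here as a common choice; the function name `cross_view_info_nce`, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_view_info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's stated loss).

    view_a[i] and view_b[i] are embeddings of the same clip from two views;
    every other row of view_b acts as a negative for view_a[i].
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity to every candidate in the other view.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the matching clip.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy embeddings: row i in each list stands for the same clip in two views.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

aligned = cross_view_info_nce(view_a, view_b)
mismatched = cross_view_info_nce(view_a, view_b[::-1])
```

The loss is low when each clip's two views embed close together and high when the pairing is broken, which is the sense in which a contrastive objective encourages view-invariant features.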
Findings
Improves top-1 accuracy on RGB video data by nearly 50%.
Outperforms existing unsupervised domain adaptation methods by up to 5%.
Effectively handles real-time driver activity recognition across diverse vehicle configurations.
Abstract
Driver distraction remains a leading cause of road traffic accidents, contributing to thousands of fatalities annually across the globe. While deep learning-based driver activity recognition methods have shown promise in detecting such distractions, their effectiveness in real-world deployments is hindered by two critical challenges: variations in camera viewpoints (cross-view) and domain shifts such as changes in sensor modality or environment. Existing methods typically address either cross-view generalization or unsupervised domain adaptation in isolation, leaving a gap in the robust and scalable deployment of models across diverse vehicle configurations. In this work, we propose a novel two-phase cross-view, cross-modal unsupervised domain adaptation framework that addresses these challenges jointly on real-time driver monitoring data. In the first phase, we learn view-invariant and…
Taxonomy
Topics: Human Pose and Action Recognition · Sleep and Work-Related Fatigue · Emotion and Mood Recognition
