A Contextual Analysis of Driver-Facing and Dual-View Video Inputs for Distraction Detection in Naturalistic Driving Environments

Anthony Dontoh; Stephanie Ivey; Armstrong Aboah

arXiv:2512.20025 · cs.CV · April 20, 2026

TL;DR

This study evaluates how adding road-facing views to driver-facing footage affects distraction detection accuracy in naturalistic driving, and finds that model architecture largely determines whether the added context helps or hurts.

Contribution

The study provides a systematic comparison of single-view and dual-view distraction detection models on real-world driving data, showing that the effect of adding a road-facing view depends on the model architecture.
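The two input configurations being compared can be sketched as follows. The summary says only that the dual-view input is "stacked" and does not state the axis, so this minimal sketch assumes the two synchronized streams have already been resized to the same resolution and stacks each pair of time-aligned frames vertically (driver view on top, road view below). That layout keeps three RGB channels, so an RGB-pretrained backbone could accept the clip without changes to its first layer. The function names and the uniform frame sampler are hypothetical, not taken from the paper.

```python
import numpy as np


def sample_indices(num_frames: int, clip_len: int) -> np.ndarray:
    """Hypothetical sampler: clip_len evenly spaced frame indices over the segment."""
    return np.linspace(0, num_frames - 1, clip_len).round().astype(int)


def driver_only_clip(driver: np.ndarray, clip_len: int) -> np.ndarray:
    """Driver-only configuration. `driver` has shape (T, H, W, C)."""
    return driver[sample_indices(driver.shape[0], clip_len)]


def stacked_dual_view_clip(driver: np.ndarray, road: np.ndarray, clip_len: int) -> np.ndarray:
    """Assumed stacked dual-view configuration: driver frame above road frame.

    Both streams are assumed synchronized and equal in shape (T, H, W, C);
    the result has shape (clip_len, 2H, W, C).
    """
    if driver.shape != road.shape:
        raise ValueError("driver and road streams must have identical shapes")
    idx = sample_indices(driver.shape[0], clip_len)  # same indices keep the views time-aligned
    return np.concatenate([driver[idx], road[idx]], axis=1)  # axis 1 is frame height


driver = np.zeros((300, 224, 224, 3), dtype=np.uint8)
road = np.ones((300, 224, 224, 3), dtype=np.uint8)
print(stacked_dual_view_clip(driver, road, clip_len=16).shape)  # (16, 448, 224, 3)
```

Stacking side by side (axis 2) or along the channel axis would be equally plausible readings of "stacked"; the channel variant would require adapting the backbone's first convolution to six input channels.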

Findings

1. SlowOnly improved by 9.8% with dual-view inputs.
2. SlowFast experienced a 7.2% accuracy drop with dual-view inputs.
3. Architecture design determines whether contextual inputs enhance or hinder detection.
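The summary does not say whether the 9.8% and 7.2% figures are percentage-point differences in accuracy or changes relative to the driver-only baseline. Writing $A_{\text{driver}}$ and $A_{\text{dual}}$ for a model's accuracy under the two input configurations (notation introduced here, not from the paper), the two readings are:

```latex
\Delta_{\text{pp}} = A_{\text{dual}} - A_{\text{driver}},
\qquad
\Delta_{\text{rel}} = \frac{A_{\text{dual}} - A_{\text{driver}}}{A_{\text{driver}}} \times 100\%
```

For a hypothetical driver-only accuracy of 50%, a 9.8-point gain would give 59.8%, whereas a 9.8% relative gain would give 54.9%, so the distinction matters when comparing the reported gains and drops across models.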

Abstract

Despite increasing interest in computer vision-based distracted driving detection, most existing models rely exclusively on driver-facing views and overlook crucial environmental context that influences driving behavior. This study investigates whether incorporating road-facing views alongside driver-facing footage improves distraction detection accuracy in naturalistic driving conditions. Using synchronized dual-camera recordings from real-world driving, we benchmark three leading spatiotemporal action recognition architectures: SlowFast-R50, X3D-M, and SlowOnly-R50. Each model is evaluated under two input configurations: driver-only and stacked dual-view. Results show that while contextual inputs can improve detection in certain models, performance gains depend strongly on the underlying architecture. The single-pathway SlowOnly model achieved a 9.8 percent improvement with dual-view…
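The abstract contrasts the single-pathway SlowOnly model with SlowFast. In the SlowFast design, a Slow pathway samples a few frames at a large temporal stride tau and a Fast pathway samples alpha times as many frames over the same window; SlowOnly keeps only the Slow pathway. The sketch below illustrates just the frame-sampling side of that difference and omits the lateral connections between pathways. The parameter values are illustrative, not necessarily the configuration used in this study.

```python
import numpy as np


def pathway_indices(start: int, num_frames: int, stride: int) -> np.ndarray:
    """Indices of num_frames frames taken every `stride` frames from `start`."""
    return start + stride * np.arange(num_frames)


def slowonly_indices(start: int, slow_frames: int = 8, tau: int = 8) -> np.ndarray:
    """Single pathway: a few frames at a large temporal stride tau."""
    return pathway_indices(start, slow_frames, tau)


def slowfast_indices(start: int, slow_frames: int = 8, tau: int = 8, alpha: int = 4):
    """Two pathways covering the same time window.

    Slow: slow_frames frames at stride tau (identical to SlowOnly).
    Fast: alpha * slow_frames frames at stride tau // alpha.
    """
    if tau % alpha != 0:
        raise ValueError("tau must be divisible by alpha so both pathways span the same window")
    slow = slowonly_indices(start, slow_frames, tau)
    fast = pathway_indices(start, slow_frames * alpha, tau // alpha)
    return slow, fast


slow, fast = slowfast_indices(start=0)
print(slow.tolist())  # [0, 8, 16, 24, 32, 40, 48, 56]
print(len(fast), fast[:4].tolist())  # 32 [0, 2, 4, 6]
```

With these values both pathways span the same 64-frame window, but the Fast pathway sees four times as many frames. Applied to a stacked dual-view clip, each sampled frame would carry both views, so the two architectures process the added road context at different temporal resolutions.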