Multimodal Polynomial Fusion for Detecting Driver Distraction
Yulun Du, Chirag Raman, Alan W Black, Louis-Philippe Morency, Maxine Eskenazi

TL;DR
This paper presents a new multimodal dataset and a polynomial fusion technique for improved automatic detection of distracted driving using facial, speech, and car signal data.
Contribution
It introduces a novel multimodal dataset and demonstrates that polynomial fusion of features enhances distraction detection accuracy over baseline models.
Findings
Adding more modalities improves predictive accuracy.
Polynomial fusion outperforms baseline SVM and neural network models.
Multimodal approach significantly enhances detection performance.
Abstract
Distracted driving is deadly, claiming 3,477 lives in the U.S. in 2015 alone. Although there has been a considerable amount of research on modeling the distracted behavior of drivers under various conditions, accurate automatic detection using multiple modalities and especially the contribution of using the speech modality to improve accuracy has received little attention. This paper introduces a new multimodal dataset for distracted driving behavior and discusses automatic distraction detection using features from three modalities: facial expression, speech and car signals. Detailed multimodal feature analysis shows that adding more modalities monotonically increases the predictive accuracy of the model. Finally, a simple and effective multimodal fusion technique using a polynomial fusion layer shows superior distraction detection results compared to the baseline SVM and neural network…
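The abstract names a polynomial fusion layer but does not spell out its form here. As a rough illustration only, one common reading of polynomial fusion is to let features from different modalities interact multiplicatively rather than only through concatenation. The sketch below assumes a degree-2 polynomial over the concatenated facial, speech, and car-signal features. The function name `polynomial_fusion`, the feature sizes, and the random weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def polynomial_fusion(face, speech, car, weights):
    """Illustrative degree-2 fusion score; not the authors' exact layer."""
    # Prepend a constant 1 so the outer product below contains a bias term,
    # the linear terms, and every pairwise product between features,
    # including products that cross modalities.
    z = np.concatenate(([1.0], face, speech, car))
    interactions = np.outer(z, z)              # entry (i, j) is z_i * z_j
    logit = np.sum(weights * interactions)     # weighted sum of all terms
    return 1.0 / (1.0 + np.exp(-logit))        # sigmoid: P(distracted)

# Hypothetical feature sizes and random values, for demonstration only.
rng = np.random.default_rng(0)
face = rng.normal(size=4)
speech = rng.normal(size=3)
car = rng.normal(size=2)
dim = 1 + face.size + speech.size + car.size
weights = rng.normal(scale=0.1, size=(dim, dim))

p = polynomial_fusion(face, speech, car, weights)
print(f"distraction probability (untrained weights): {p:.3f}")
```

In a trained model the weights would be learned from labeled data. The point of the sketch is only that cross-modal products such as a facial feature times a speech feature enter the score directly, which plain concatenation followed by a linear classifier cannot express.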
Taxonomy
Methods: Support Vector Machine
