Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle
Abdul Rafey Aftab, Michael von der Beeck

TL;DR
This paper presents a multimodal deep learning approach for driver-object referencing inside and outside the vehicle. It integrates eye-gaze, head, and finger movements, with speech as a trigger, to enable more natural human-machine interaction.
Contribution
It introduces a fusion architecture that combines multiple modalities to identify the driver's referencing intent and the location of the referenced object, compensating for the limitations of each individual modality.
Findings
Multimodal fusion improves referencing accuracy over single modalities.
Driver behavior differs between inside and outside referencing tasks.
The method effectively distinguishes object location based on pointing direction.
Abstract
Advanced in-cabin sensing technologies, especially vision-based approaches, have tremendously advanced user interaction inside the vehicle, paving the way for new applications of natural user interaction. Just as humans use multiple modes to communicate with each other, we follow an approach characterized by the simultaneous use of multiple modalities to achieve natural human-machine interaction for a specific task: pointing to or glancing towards objects inside as well as outside the vehicle for deictic references. By tracking the movements of eye-gaze, head and finger, we design a multimodal fusion architecture using a deep neural network to precisely identify the driver's referencing intent. Additionally, we use a speech command as a trigger to separate each referencing event. We observe differences in driver behavior in the two pointing use cases (i.e., for inside and outside…
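The abstract does not give the network's layers, the input features, or how frames are grouped around the speech trigger, so the following is only a minimal sketch of the general data flow under assumed choices. It assumes each modality (gaze, head, finger) yields a (yaw, pitch) direction per frame, takes a hypothetical 30-frame window ending at the speech trigger, fuses modalities by concatenating their mean directions, and passes the result through a tiny two-layer MLP with random (untrained) weights. The names `segment_event`, `fuse_features`, and `FusionMLP` are hypothetical and do not come from the paper.

```python
import numpy as np

# Assumption: each modality provides one (yaw, pitch) direction per frame.
MODALITIES = ("gaze", "head", "finger")


def segment_event(signals, trigger_frame, window=30):
    """Cut one referencing event: the frames up to and including the speech trigger.

    The window length (hypothetical) and its placement before the trigger are assumptions.
    """
    start = max(0, trigger_frame - window)
    return {m: signals[m][start:trigger_frame + 1] for m in MODALITIES}


def fuse_features(event):
    """Feature-level fusion: average each modality over the event, then concatenate."""
    return np.concatenate([event[m].mean(axis=0) for m in MODALITIES])


class FusionMLP:
    """Tiny two-layer MLP mapping fused features to a 2-D (yaw, pitch) estimate.

    Weights are randomly initialized for illustration; a real model would be trained.
    """

    def __init__(self, in_dim=6, hidden=16, out_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        h = np.maximum(0.0, x @ self.w1 + self.b1)  # ReLU hidden layer
        return h @ self.w2 + self.b2


# Usage with synthetic signals: 100 frames per modality, speech trigger at frame 60.
rng = np.random.default_rng(1)
signals = {m: rng.normal(0.0, 5.0, (100, 2)) for m in MODALITIES}
event = segment_event(signals, trigger_frame=60)
features = fuse_features(event)
direction = FusionMLP()(features)
```

Concatenating per-modality summaries is only the simplest form of early fusion; it lets a single network weigh gaze, head, and finger cues jointly, which is the kind of benefit the findings attribute to multimodal fusion, but the paper's actual fusion scheme may differ.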
