Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle

Abdul Rafey Aftab; Michael von der Beeck

arXiv:2202.07360 · cs.HC · February 16, 2022

TL;DR

This paper presents a multimodal deep learning approach for identifying which object a driver is referring to, inside or outside the vehicle. It fuses eye-gaze, head, and finger movements, and uses a speech command as a trigger to separate referencing events, with the goal of more natural human-machine interaction.

Contribution

It introduces a deep neural network fusion architecture that combines eye-gaze, head, and finger tracking to identify the driver's referencing intent, so that no single modality has to carry the task alone.

Findings

1. Multimodal fusion improves referencing accuracy over single modalities.

2. Driver behavior differs between inside and outside referencing tasks.

3. The method effectively distinguishes object location based on pointing direction.

Abstract

Advanced in-cabin sensing technologies, especially vision based approaches, have tremendously progressed user interaction inside the vehicle, paving the way for new applications of natural user interaction. Just as humans use multiple modes to communicate with each other, we follow an approach which is characterized by simultaneously using multiple modalities to achieve natural human-machine interaction for a specific task: pointing to or glancing towards objects inside as well as outside the vehicle for deictic references. By tracking the movements of eye-gaze, head and finger, we design a multimodal fusion architecture using a deep neural network to precisely identify the driver's referencing intent. Additionally, we use a speech command as a trigger to separate each referencing event. We observe differences in driver behavior in the two pointing use cases (i.e. for inside and outside…
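
The abstract describes two ideas that a short example may help make concrete: a speech command marks each referencing event, and gaze, head, and finger movements are combined into one input for a neural network. The following is only a minimal sketch of that input preparation, not the authors' architecture. The `Sample` type, the 3-D direction vectors, the 0.5-second window, and the averaging step are all assumptions made for illustration; the network that would consume the fused vector is omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical modality names; the abstract lists eye-gaze, head and finger.
MODALITIES = ("gaze", "head", "finger")


@dataclass
class Sample:
    t: float                     # timestamp in seconds
    modality: str                # one of MODALITIES
    direction: Sequence[float]   # assumed 3-D direction feature


def mean_vector(vectors: List[Sequence[float]], dim: int = 3) -> List[float]:
    """Average a list of vectors; return zeros if a modality has no samples."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse_event(samples: List[Sample], trigger_t: float,
               window: float = 0.5) -> List[float]:
    """Build one fused feature vector for the event at `trigger_t`.

    Samples within `window` seconds of the speech trigger are averaged
    per modality, then concatenated in the order given by MODALITIES.
    """
    fused: List[float] = []
    for modality in MODALITIES:
        in_window = [s.direction for s in samples
                     if s.modality == modality
                     and abs(s.t - trigger_t) <= window]
        fused.extend(mean_vector(in_window))
    return fused


def segment_events(samples: List[Sample], trigger_times: List[float],
                   window: float = 0.5) -> List[List[float]]:
    """Use speech trigger times to split a recording into referencing events."""
    return [fuse_event(samples, t, window) for t in trigger_times]
```

Each returned vector has nine entries (three modalities times three components) and could serve as the input to a classifier or regressor. A missing modality is filled with zeros here purely for simplicity; how the paper handles a modality that is unavailable is not stated in the text above.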
