Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design
Yongquan 'Owen' Hu, Jingyu Tang, Xinya Gong, Zhongyi Zhou, Shuning Zhang, Don Samitha Elvitigala, Florian 'Floyd' Mueller, Wen Hu, Aaron J. Quigley

TL;DR
This survey reviews vision-based multimodal interfaces, emphasizing the visual modality's role in enhancing context-aware human-computer interaction and proposing a classification framework for system design.
Contribution
It provides a comprehensive taxonomy and analysis of vision-based multimodal interfaces, focusing on the visual modality's importance in context-aware system development.
Findings
Highlights the critical role of visual modality in multimodal interaction
Classifies VMIs across multiple dimensions for better system design
Provides insights for developing more effective context-aware systems
Abstract
The recent surge in artificial intelligence, particularly in multimodal processing technology, has advanced human-computer interaction by altering how intelligent systems perceive, understand, and respond to contextual information (i.e., context awareness). Despite such advancements, there is a significant gap in comprehensive reviews examining these advances, especially from a multimodal data perspective, which is crucial for refining system design. This paper addresses a key aspect of this gap by conducting a systematic survey of data modality-driven Vision-based Multimodal Interfaces (VMIs). VMIs are essential for integrating multimodal data, enabling more precise interpretation of user intentions and complex interactions across physical and digital environments. Unlike previous task- or scenario-driven surveys, this study highlights the critical role of the visual modality in…
