Vision-Based Multimodal Interfaces: A Survey and Taxonomy for Enhanced Context-Aware System Design

Yongquan 'Owen' Hu; Jingyu Tang; Xinya Gong; Zhongyi Zhou; Shuning Zhang; Don Samitha Elvitigala; Florian 'Floyd' Mueller; Wen Hu; Aaron J. Quigley

arXiv:2501.13443 · cs.HC · March 18, 2025

TL;DR

This survey reviews vision-based multimodal interfaces, emphasizing the visual modality's role in enhancing context-aware human-computer interaction and proposing a classification framework for system design.

Contribution

It provides a comprehensive taxonomy and analysis of vision-based multimodal interfaces, focusing on the visual modality's importance in context-aware system development.

Findings

1. Highlights the critical role of the visual modality in multimodal interaction.
2. Classifies vision-based multimodal interfaces (VMIs) across multiple dimensions to support better system design.
3. Provides insights for developing more effective context-aware systems.

Abstract

The recent surge in artificial intelligence, particularly in multimodal processing technology, has advanced human-computer interaction by altering how intelligent systems perceive, understand, and respond to contextual information (i.e., context awareness). Despite such advancements, there is a significant gap in comprehensive reviews examining these advances, especially from a multimodal data perspective, which is crucial for refining system design. This paper addresses a key aspect of this gap by conducting a systematic survey of data modality-driven Vision-based Multimodal Interfaces (VMIs). VMIs are essential for integrating multimodal data, enabling more precise interpretation of user intentions and complex interactions across physical and digital environments. Unlike previous task- or scenario-driven surveys, this study highlights the critical role of the visual modality in…