Towards Enhanced Context Awareness with Vision-based Multimodal Interfaces
Yongquan Hu, Wen Hu, Aaron Quigley

TL;DR
This paper explores how vision-based multimodal interfaces can improve context awareness in human-computer interaction by combining AI-driven visual sensing along three dimensions (scale, depth, and time) with other modalities for more seamless interactions.
Contribution
It presents three application cases demonstrating enhanced context awareness through multimodal visual data in HCI, focusing on scale, depth, and temporal information.
Findings
Enhanced interpretation of user intentions using multimodal visual data
Improved environmental understanding through depth and microscopic imaging
Seamless virtual interactions with integrated haptic feedback
Abstract
Vision-based Interfaces (VIs) are pivotal in advancing Human-Computer Interaction (HCI), particularly in enhancing context awareness. Rapid advancements in multimodal Artificial Intelligence (AI) open significant new opportunities for these interfaces and promise a future of tight coupling between humans and intelligent systems. AI-driven VIs, when integrated with other modalities, offer a robust solution for effectively capturing and interpreting user intentions and complex environmental information, thereby facilitating seamless and efficient interactions. This PhD study explores three application cases of multimodal interfaces to augment context awareness, each focusing on one of three dimensions of the visual modality (scale, depth, and time): a fine-grained analysis of physical surfaces via microscopic images, precise projection of the real world using depth data,…
