Toward Natural Gesture/Speech Control of a Large Display
S. Kettebekov, R. Sharma

TL;DR
This paper presents a structured approach to integrating free hand gestures and speech for natural multimodal human-computer interaction with large displays, emphasizing semantic classification and temporal alignment.
Contribution
It introduces a semantic classification of co-verbal gestures and a computational framework for gesture-speech integration in 2D-display control systems.
Findings
Gesture and speech temporal alignment is crucial for semantic mapping.
Co-occurrence analysis reveals syntactic organization of gestures.
User studies confirm the importance of temporal alignment in multimodal HCI.
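The temporal-alignment finding above can be illustrated with a minimal sketch. It is not the authors' framework: the `Interval` type, the `align` function, the `slack` parameter, and all timestamps are hypothetical, chosen only to show how a gesture stroke might be paired with the spoken word it overlaps most in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A labeled time span in seconds (hypothetical representation)."""
    label: str
    start: float
    end: float


def overlap(a, b):
    """Return the temporal overlap of two intervals in seconds (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def align(gestures, words, slack=0.0):
    """Pair each gesture with the word whose interval overlaps it most.

    `slack` (an assumed tolerance, in seconds) widens every word interval on
    both sides, so a gesture that slightly precedes or follows a word can
    still be matched. Gestures that overlap no word are paired with None.
    """
    pairs = []
    for g in gestures:
        best, best_overlap = None, 0.0
        for w in words:
            widened = Interval(w.label, w.start - slack, w.end + slack)
            ov = overlap(g, widened)
            if ov > best_overlap:  # ties keep the earlier word
                best, best_overlap = w, ov
        pairs.append((g.label, best.label if best else None))
    return pairs


# Made-up timestamps for illustration only.
gestures = [Interval("point", 0.40, 0.90), Interval("contour", 2.00, 2.60)]
words = [
    Interval("select", 0.10, 0.45),
    Interval("this", 0.50, 0.80),
    Interval("region", 1.20, 1.60),
]

pairs = align(gestures, words)
print(pairs)
```

With strict overlap, the pointing gesture pairs with "this" and the contour gesture stays unmatched; calling `align(gestures, words, slack=0.5)` lets the contour gesture match "region", which shows how sensitive the pairing is to the chosen alignment window.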
Abstract
In recent years, because of advances in computer vision research, free hand gestures have been explored as a means of human-computer interaction (HCI). Together with improved speech processing technology, this is an important step toward natural multimodal HCI. However, the inclusion of non-predefined continuous gestures into a multimodal framework is a challenging problem. In this paper, we propose a structured approach for studying patterns of multimodal language in the context of 2D-display control. We consider systematic analysis of gestures, from observable kinematical primitives to their semantics, as pertinent to a linguistic structure. The proposed semantic classification of co-verbal gestures distinguishes six categories based on their spatio-temporal deixis. We discuss the evolution of a computational framework for gesture and speech integration, which was used to develop an interactive…
Taxonomy
Topics: Hand Gesture Recognition Systems · Tactile and Sensory Interactions · Speech and dialogue systems
