Prosody Based Co-analysis for Continuous Recognition of Coverbal Gestures
Sanshzar Kettebekov, Mohammed Yeasin, Rajeev Sharma

TL;DR
This paper introduces a Bayesian approach that combines prosodic speech features with visual gesture data to improve the recognition accuracy of continuous coverbal gestures, especially small gestures, in multimodal systems.
Contribution
It presents a novel Bayesian co-analysis method that leverages speech prosody and gesture articulation for enhanced gesture recognition accuracy.
Findings
Improved detection of small gestures.
Enhanced continuous gesture recognition rate.
Validated on a large broadcast database.
Abstract
Although speech and gesture recognition have been studied extensively, all successful attempts to combine them in a unified framework have been semantically motivated, e.g., keyword-gesture co-occurrence. Such formulations inherit the complexity of natural language processing. This paper presents a Bayesian formulation that exploits the phenomenon of gesture and speech articulation to improve the accuracy of automatic recognition of continuous coverbal gestures. Prosodic features from the speech signal were co-analyzed with the visual signal to learn the prior probability of co-occurrence of prominent spoken segments with particular kinematical phases of gestures. This co-analysis was found to help in detecting and disambiguating visually small gestures, which in turn improves the rate of continuous gesture recognition. The efficacy of the proposed approach was…
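The idea of a learned co-occurrence prior reweighting visual evidence can be illustrated with a toy sketch. This is not the authors' implementation: the phase labels, the binary "prominent" flag, the add-one smoothing, and the function names are all assumptions made for illustration, and the abstract does not specify how the prior or the visual likelihoods are actually modeled.

```python
from collections import Counter

# Hypothetical gesture-phase labels; the paper's actual inventory may differ.
PHASES = ["preparation", "stroke", "hold", "retraction"]

def learn_cooccurrence_prior(pairs, smoothing=1.0):
    """Estimate P(phase | prominence) from (phase, is_prominent) pairs.

    Uses additive smoothing so unseen combinations keep non-zero mass.
    """
    counts = Counter(pairs)
    prior = {}
    for prominent in (False, True):
        total = sum(counts[(p, prominent)] for p in PHASES) + smoothing * len(PHASES)
        prior[prominent] = {
            p: (counts[(p, prominent)] + smoothing) / total for p in PHASES
        }
    return prior

def posterior(visual_likelihood, prominent, prior):
    """Combine visual evidence with the prosodic prior via Bayes' rule.

    P(phase | visual, prosody) is proportional to
    P(visual | phase) * P(phase | prosody).
    """
    unnorm = {p: visual_likelihood[p] * prior[prominent][p] for p in PHASES}
    z = sum(unnorm.values())
    return {p: v / z for p, v in unnorm.items()}

# Toy training data (invented): strokes mostly align with prominent speech.
pairs = (
    [("stroke", True)] * 8
    + [("hold", True)] * 2
    + [("preparation", False)] * 6
    + [("retraction", False)] * 4
)
prior = learn_cooccurrence_prior(pairs)

# A visually small gesture: the visual model alone cannot separate
# "stroke" from "preparation".
ambiguous = {"preparation": 0.3, "stroke": 0.3, "hold": 0.2, "retraction": 0.2}
post = posterior(ambiguous, prominent=True, prior=prior)
best = max(post, key=post.get)
```

In this toy setting the visual likelihood ties "stroke" with "preparation", and the prosodic prior breaks the tie in favor of "stroke" when the co-occurring speech segment is prominent, which mirrors the disambiguation effect the abstract describes for visually small gestures.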
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Tactile and Sensory Interactions
