A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Li Liu, Lufei Gao, Wentao Lei, Fengji Ma, Xiaotian Lin, Jinting Wang

TL;DR
This survey comprehensively reviews deep multi-modal learning techniques for body language recognition and generation, analyzing datasets, methods, challenges, and future directions in the field.
Contribution
It connects four body language types (Sign Language, Cued Speech, Co-speech, and Talking Head), which the authors describe as a first, and provides an organized overview of datasets and state-of-the-art methods.
Findings
Established connections among four body language types.
Reviewed benchmark datasets and evaluated SOTA methods.
Identified challenges like limited data and domain adaptation.
Abstract
Body language (BL) refers to non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It is a form of communication that conveys information, emotions, attitudes, and intentions without the use of spoken or written words. It plays a crucial role in interpersonal interactions and can complement or even override verbal communication. Deep multi-modal learning techniques have shown promise in understanding and analyzing these diverse aspects of BL. The survey emphasizes their applications to BL generation and recognition. Several common BLs are considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and Talking Head (TH), and we have conducted an analysis and established the connections among these four BLs for the first time. Their generation and recognition often involve multi-modal approaches. Benchmark datasets for BL…
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition
