Addressing the Blind Spots in Spoken Language Processing

Amit Moryossef

arXiv:2309.06572 · eess.AS · September 14, 2023 · 1 citation

TL;DR

This paper emphasizes the importance of non-verbal cues like gestures and facial expressions in human communication and proposes universal models for their automatic transcription to improve NLP understanding.

Contribution

It proposes integrating non-verbal cues into NLP through the development of universal gesture segmentation and transcription models that render these cues as text.

Findings

01. Highlights limitations of text-only models in capturing communication nuances

02. Proposes a flexible, efficient method for non-verbal cue transcription

03. Calls for community effort to develop and validate universal multimodal transcription methods

Abstract

This paper explores the critical but often overlooked role of non-verbal cues, including co-speech gestures and facial expressions, in human communication and their implications for Natural Language Processing (NLP). We argue that understanding human communication requires a more holistic approach that goes beyond textual or spoken words to include non-verbal elements. Borrowing from advances in sign language processing, we propose the development of universal automatic gesture segmentation and transcription models to transcribe these non-verbal cues into textual form. Such a methodology aims to bridge the blind spots in spoken language understanding, enhancing the scope and applicability of NLP models. Through motivating examples, we demonstrate the limitations of relying solely on text-based models. We propose a computationally efficient and flexible approach for incorporating…

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Speech and dialogue systems