Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor
Robin Ruede, Markus Müller, Sebastian Stüker, Alex Waibel

TL;DR
This paper presents a neural network approach, including an LSTM, to detecting acoustic backchannel opportunities in human-computer interaction from input features such as power and pitch. The feed-forward network serves as an updated baseline against the authors' previously proposed setup.
Contribution
It introduces neural network models, especially LSTM, for backchannel detection based on acoustic features, surpassing rule-based and simpler neural approaches.
Findings
LSTM outperforms feed-forward networks in backchannel detection.
Adding linguistic features improves F1-Score from 0.37 to 0.39.
Neural networks can effectively derive higher order features for social cue detection.
Abstract
Using supporting backchannel (BC) cues can make human-computer interaction more social. BCs provide feedback from the listener to the speaker, indicating that the speaker is still being listened to. BCs can be expressed in different ways, depending on the modality of the interaction, for example as gestures or acoustic cues. In this work, we only considered acoustic cues. We propose an approach to detecting BC opportunities based on acoustic input features like power and pitch. While other works in the field rely on a hand-written rule set or specialized features, we made use of artificial neural networks, which are capable of deriving higher-order features from the input features themselves. In our setup, we first used a fully connected feed-forward network to establish an updated baseline in comparison to our previously proposed setup. We also extended this setup by…
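As a rough illustration of the feed-forward idea in the abstract, the sketch below stacks a short window of (power, pitch) frames into one input vector and passes it through a small fully connected network that outputs probabilities for "no BC" and "BC". It is not the authors' implementation: the window length, layer sizes, random untrained weights, and all function names (`stack_context`, `init_layer`, `predict_bc`) are assumptions chosen only to make the example runnable.

```python
import math
import random


def stack_context(frames, window):
    """Concatenate `window` consecutive frames into one flat vector per step."""
    return [
        [value for frame in frames[t:t + window] for value in frame]
        for t in range(len(frames) - window + 1)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine transform: one weighted sum plus bias per output unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_bc(x, hidden, output):
    """Forward pass: one tanh hidden layer, softmax over (no BC, BC)."""
    h = [math.tanh(v) for v in dense(x, hidden)]
    return softmax(dense(h, output))


rng = random.Random(0)
# Hypothetical input: 20 frames of (power, pitch), assumed already normalized.
frames = [(rng.random(), rng.random()) for _ in range(20)]
window = 5  # assumed context length, not taken from the paper
inputs = stack_context(frames, window)
hidden = init_layer(window * 2, 8, rng)  # 8 hidden units is an arbitrary choice
output = init_layer(8, 2, rng)
probs = [predict_bc(x, hidden, output) for x in inputs]
print(len(inputs), len(inputs[0]), probs[0])
```

With untrained weights the probabilities carry no meaning; the point is only the data flow from windowed acoustic features to a per-step BC score. An LSTM variant would consume the frames sequentially instead of as one stacked vector.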
Taxonomy
Topics: Digital Communication and Language · Subtitles and Audiovisual Media · Video Analysis and Summarization
