Audience Response Prediction from Textual Context
Ibrahim Shoer, Berker Turker, Engin Erzin

TL;DR
This paper predicts whether an audience response follows a presenter's speech, using BERT on the textual context, and reports that longer causal contexts can match or outperform a short non-causal context.
Contribution
The paper defines an audience response prediction task from the presenter's textual speech, framed as binary classification with BERT, and compares causal and non-causal textual contexts.
Findings
A non-causal context (one sentence before and one sentence after the response event) substantially improves prediction accuracy.
Longer causal contexts match or exceed non-causal performance.
Models achieve high UAR and F1 scores on OPUS and TED datasets.
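The two metrics named above can be illustrated with a short sketch. UAR (unweighted average recall) is the mean of the per-class recalls, so it is not dominated by the majority class. This matters when "no response" examples are far more common than "response" examples. The helper names and the toy labels below are illustrative assumptions, not the paper's evaluation code or data.

```python
from collections import Counter


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    totals = Counter(y_true)   # number of true examples per class
    hits = Counter()           # correctly predicted examples per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (here: 'a response occurs')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up labels: 1 = response, 0 = no response.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

uar = unweighted_average_recall(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

On these toy labels, plain accuracy is 6/8, while UAR averages a recall of 1/2 on the response class with 5/6 on the no-response class, which makes the weaker minority-class performance visible.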
Abstract
The human perception system closely monitors audio-visual cues during multiparty interactions in order to react in a timely and natural manner. Learning to predict the timing and type of reactions during human-human interactions may help enrich human-computer interaction applications. In this paper, we consider a presenter-audience setting and define an audience response prediction task from the presenter's textual speech. The task is formulated as a binary classification problem: the occurrence or absence of a response after the presenter's textual speech. We use the BERT model as our classifier and investigate models with different textual contexts under causal and non-causal prediction settings. While the non-causal textual context, one sentence preceding and one sentence following the response event, can substantially improve prediction accuracy, we show that longer textual contexts with…
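The difference between the causal and non-causal settings can be sketched as a context-window selection step. A causal context uses only sentences spoken before the candidate response point, while the non-causal context described in the abstract also includes the sentence that follows it. The function name `build_context`, its parameters, and the example sentences are hypothetical; how the paper actually packs these segments into BERT's input is not stated in this summary.

```python
def build_context(sentences, event_after, n_past=1, n_future=0):
    """Select the text around a candidate response point.

    sentences   : list of the presenter's sentences, in order
    event_after : index of the sentence after which a response may occur
    n_past      : number of sentences up to and including that index
    n_future    : number of following sentences (0 = causal setting)
    Returns (past_text, future_text) as two strings.
    """
    start = max(0, event_after - n_past + 1)
    past = sentences[start:event_after + 1]
    future = sentences[event_after + 1:event_after + 1 + n_future]
    return " ".join(past), " ".join(future)


# Made-up talk transcript, for illustration only.
talk = [
    "Thank you all for coming.",
    "I want to start with a story.",
    "My first experiment failed completely.",
    "So I tried again.",
]

# Causal: two preceding sentences, nothing from the future.
causal = build_context(talk, event_after=2, n_past=2, n_future=0)

# Non-causal: one sentence before and one after the candidate point.
non_causal = build_context(talk, event_after=2, n_past=1, n_future=1)
```

Under this framing, lengthening the causal context means raising `n_past` while keeping `n_future` at 0, which keeps the predictor usable in real time because it never looks ahead.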
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech and dialogue systems
