Towards an Unsupervised Entrainment Distance in Conversational Speech using Deep Neural Networks
Md Nasir, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

TL;DR
This paper introduces Neural Entrainment Distance (NED), an unsupervised neural network-based measure that quantifies acoustic entrainment in conversational speech. It is validated by separating real conversations from turn-shuffled ones and by its association with emotional bond ratings in real-world data.
Contribution
It proposes a novel unsupervised DNN framework to measure entrainment, capturing speaker adaptation in speech without labeled data.
Findings
NED distinguishes real conversations from fake ones created by randomly shuffling speaker turns.
High NED correlates with higher emotional bond ratings.
The measure aligns with prior findings on speech entrainment and emotional connection.
Abstract
Entrainment is a known adaptation mechanism that causes interaction participants to adapt or synchronize their acoustic characteristics. Understanding how interlocutors tend to adapt to each other's speaking style through entrainment involves measuring a range of acoustic features and comparing those via multiple signal comparison methods. In this work, we present a turn-level distance measure obtained in an unsupervised manner using a Deep Neural Network (DNN) model, which we call Neural Entrainment Distance (NED). This metric establishes a framework that learns an embedding from the population-wide entrainment in an unlabeled training corpus. We use the framework for a set of acoustic features and validate the measure experimentally by showing its efficacy in distinguishing real conversations from fake ones created by randomly shuffling speaker turns. Moreover, we show real world…
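The real-versus-shuffled validation described in the abstract can be illustrated with a minimal sketch. It assumes each speaker turn has already been mapped to a fixed-length embedding by some trained model, which is not shown here. The squared Euclidean distance and the function names `squared_distance`, `mean_turn_distance`, and `real_vs_shuffled_rate` are illustrative assumptions, not the authors' implementation.

```python
import random

def squared_distance(u, v):
    """Squared Euclidean distance between two turn embeddings (assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean_turn_distance(turns_a, turns_b):
    """Average distance over paired turns: turn i of speaker A vs. turn i of speaker B."""
    pairs = list(zip(turns_a, turns_b))
    return sum(squared_distance(u, v) for u, v in pairs) / len(pairs)

def real_vs_shuffled_rate(turns_a, turns_b, n_trials=100, seed=0):
    """Fraction of trials in which the real conversation has a lower mean
    distance than a fake one built by randomly shuffling speaker B's turns."""
    rng = random.Random(seed)
    real = mean_turn_distance(turns_a, turns_b)
    wins = 0
    for _ in range(n_trials):
        fake_b = list(turns_b)   # copy so the real turn order is untouched
        rng.shuffle(fake_b)      # break the true turn pairing
        if real < mean_turn_distance(turns_a, fake_b):
            wins += 1
    return wins / n_trials

# Toy embeddings (hypothetical data): B closely follows A turn by turn.
turns_a = [[float(i), 0.0] for i in range(20)]
turns_b = [[i + 0.1, 0.0] for i in range(20)]
rate = real_vs_shuffled_rate(turns_a, turns_b)
```

A rate near 1.0 on such data means the distance is consistently smaller for the true turn pairing than for shuffled pairings, which is the property the validation experiment checks.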
