Large Language Models Know What To Say But Not When To Speak
Muhammad Umair, Vasanth Sarathy, JP de Ruiter

TL;DR
This paper investigates whether Large Language Models (LLMs) can predict opportunities for speaking within conversations, highlights their current limitations, and introduces a new dataset for evaluation.
Contribution
The paper introduces a novel dataset of participant-labeled within-turn Transition Relevance Places (TRPs) and evaluates how well state-of-the-art LLMs predict them, addressing a gap in turn-taking prediction.
Findings
LLMs struggle to accurately predict within-turn TRPs in unscripted conversations.
Existing models focus only on turn-final TRPs and neglect within-turn TRPs.
The study highlights areas for improving LLMs' turn-taking capabilities.
Abstract
Turn-taking is a fundamental mechanism in human communication that ensures smooth and coherent verbal interactions. Recent advances in Large Language Models (LLMs) have motivated their use in improving the turn-taking capabilities of Spoken Dialogue Systems (SDS), such as their ability to respond at appropriate times. However, existing models often struggle to predict opportunities for speaking -- called Transition Relevance Places (TRPs) -- in natural, unscripted conversations, focusing only on turn-final TRPs and not within-turn TRPs. To address these limitations, we introduce a novel dataset of participant-labeled within-turn TRPs and use it to evaluate the performance of state-of-the-art LLMs in predicting opportunities for speaking. Our experiments reveal the current limitations of LLMs in modeling unscripted spoken interactions, highlighting areas for improvement and paving the…
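To make the evaluation idea concrete, here is a minimal sketch of how predicted TRPs might be scored against participant-labeled ones. It is an illustration only: the paper's actual metric is not stated in this summary, and the function name `score_trp_predictions`, the word-index representation of TRPs, and the example utterance and labels are all assumptions made for this sketch.

```python
from typing import Dict, Set


def score_trp_predictions(labeled: Set[int], predicted: Set[int]) -> Dict[str, float]:
    """Precision, recall, and F1 over word indices marked as TRPs.

    Assumption: a TRP is represented by the index of the word it follows.
    """
    true_pos = len(labeled & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(labeled) if labeled else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical turn; indices refer to positions in `words`.
words = "so I went to the store yesterday and bought some milk".split()
labeled = {6, 10}   # hypothetical labels: after "yesterday" (within-turn) and "milk" (turn-final)
predicted = {10}    # a model that only predicts the turn-final TRP

scores = score_trp_predictions(labeled, predicted)
print(scores)
```

In this toy case, a model that marks only the end of the turn is never wrong about where it predicts a TRP, yet it misses the within-turn opportunity, so its recall drops. That is the kind of gap the within-turn dataset is meant to expose.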
Taxonomy
Topics: Topic Modeling
