Large Language Models Know What To Say But Not When To Speak

Muhammad Umair; Vasanth Sarathy; JP de Ruiter

arXiv:2410.16044·cs.CL·October 22, 2024

TL;DR

This paper investigates the ability of Large Language Models to predict speaking opportunities within conversations, highlighting their current limitations and introducing a new dataset for evaluation.

Contribution

The paper introduces a novel dataset of participant-labeled within-turn Transition Relevance Places (TRPs) and uses it to evaluate how well state-of-the-art LLMs predict them, addressing a gap in turn-taking prediction.

Findings

01. LLMs struggle to accurately predict within-turn TRPs in unscripted conversations.

02. Current models focus mainly on turn-final TRPs, neglecting within-turn cues.

03. The study highlights areas for improving LLMs' turn-taking capabilities.

Abstract

Turn-taking is a fundamental mechanism in human communication that ensures smooth and coherent verbal interactions. Recent advances in Large Language Models (LLMs) have motivated their use in improving the turn-taking capabilities of Spoken Dialogue Systems (SDS), such as their ability to respond at appropriate times. However, existing models often struggle to predict opportunities for speaking -- called Transition Relevance Places (TRPs) -- in natural, unscripted conversations, focusing only on turn-final TRPs and not within-turn TRPs. To address these limitations, we introduce a novel dataset of participant-labeled within-turn TRPs and use it to evaluate the performance of state-of-the-art LLMs in predicting opportunities for speaking. Our experiments reveal the current limitations of LLMs in modeling unscripted spoken interactions, highlighting areas for improvement and paving the…
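To make the distinction between within-turn and turn-final TRPs concrete, here is a minimal sketch of how predicted TRP positions might be scored against labeled ones for a single turn. It is not the paper's evaluation protocol: the word-index representation of a TRP, the exact-match criterion, the helper names (`score_trp_predictions`, `split_trps`), and the example utterance and labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TurnScore:
    precision: float
    recall: float
    f1: float


def score_trp_predictions(num_words, labeled, predicted):
    """Score predicted TRP positions against labeled ones for one turn.

    Assumption: a TRP is a 0-based word index i, meaning a listener could
    start speaking right after word i. Only exact index matches count.
    """
    valid = set(range(num_words))
    gold = set(labeled) & valid
    pred = set(predicted) & valid
    hits = len(gold & pred)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return TurnScore(precision, recall, f1)


def split_trps(num_words, positions):
    """Separate within-turn TRPs from the turn-final one (after the last word)."""
    final = num_words - 1
    within = sorted(p for p in set(positions) if 0 <= p < final)
    turn_final = [final] if final in positions else []
    return within, turn_final


# Hypothetical turn and labels, invented for this sketch.
words = "okay so I went to the store and bought some milk".split()
labeled = [0, 6, 10]   # after "okay", "store", and "milk"
predicted = [10]       # a model that only finds the turn-final TRP

overall = score_trp_predictions(len(words), labeled, predicted)
within_gold, final_gold = split_trps(len(words), labeled)
within_pred, final_pred = split_trps(len(words), predicted)
within_only = score_trp_predictions(len(words), within_gold, within_pred)
```

In this toy case the model is perfectly precise overall but recovers none of the within-turn TRPs, which is the kind of gap the abstract describes. A real evaluation would likely need a tolerance window around each labeled position and some way to aggregate labels from multiple participants; both are left out here.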

Taxonomy

Topics: Topic Modeling