Response-conditioned Turn-taking Prediction
Bing'er Jiang, Erik Ekstedt, Gabriel Skantze

TL;DR
This paper introduces a model that predicts turn-taking in conversation from both the conversation history and the next speaker's intended response, improving predictions in cases where the history alone is ambiguous.
Contribution
The paper extends TurnGPT so that end-of-turn prediction is conditioned on both the dialogue context and the response the next speaker intends to give, which improves performance where the history alone leaves the turn shift ambiguous.
Findings
Consistently outperforms the baseline model on a variety of metrics
Largest improvements occur where the conversation history alone leaves the turn shift ambiguous
Can serve as an incremental response ranker
Abstract
Previous approaches to turn-taking and response generation in conversational systems have treated it as a two-stage process: First, the end of a turn is detected (based on conversation history), then the system generates an appropriate response. Humans, however, do not take the turn just because it is likely, but also consider whether what they want to say fits the position. In this paper, we present a model (an extension of TurnGPT) that conditions the end-of-turn prediction on both conversation history and what the next speaker wants to say. We found that our model consistently outperforms the baseline model in a variety of metrics. The improvement is most prominent in two scenarios where turn predictions can be ambiguous solely from the conversation history: 1) when the current utterance contains a statement followed by a question; 2) when the end of the current utterance…
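The idea of conditioning the end-of-turn decision on the intended response can be illustrated with a minimal Python sketch. This is not the authors' model: `toy_scorer` is a hand-written stand-in for a learned scorer, and the function names, the 0.5 threshold, the example utterance, and the candidate responses are all hypothetical. The sketch only shows how the point at which a listener takes the turn can differ depending on what they want to say, and how the same scorer can rank candidate responses incrementally as words arrive.

```python
from typing import Callable, Optional, Sequence, Tuple

# A scorer maps (words heard so far, intended response) to a turn-shift score in [0, 1].
Scorer = Callable[[Sequence[str], str], float]


def toy_scorer(prefix: Sequence[str], response: str) -> float:
    """Hand-written placeholder for a learned response-conditioned model (assumption)."""
    last = prefix[-1]
    is_answer = response.startswith("Yes")
    if last.endswith("?"):
        # After a question, an answer fits well; a comment fits less well.
        return 0.9 if is_answer else 0.3
    if last.endswith("."):
        # After a statement, a comment fits well; an answer has nothing to answer yet.
        return 0.2 if is_answer else 0.7
    # Mid-utterance: taking the turn is unlikely whatever the response is.
    return 0.05


def first_turn_shift(
    words: Sequence[str],
    candidates: Sequence[str],
    scorer: Scorer,
    threshold: float = 0.5,  # hypothetical decision threshold
) -> Optional[Tuple[int, str, float]]:
    """Scan the utterance word by word and return (words heard, best response, score)
    at the first position where the best-scoring candidate reaches the threshold."""
    for i in range(1, len(words) + 1):
        prefix = words[:i]
        best = max(candidates, key=lambda r: scorer(prefix, r))
        score = scorer(prefix, best)
        if score >= threshold:
            return i, best, score
    return None  # no suitable turn-shift point found


# A statement followed by a question, one of the ambiguous cases the abstract names.
words = "I went hiking yesterday. Do you like hiking?".split()

print(first_turn_shift(words, ["That sounds fun."], toy_scorer))
print(first_turn_shift(words, ["Yes, I love it."], toy_scorer))
```

With the comment as the intended response, the sketch takes the turn after the statement (four words in); with the answer, it waits until the question is complete (eight words in). A history-only predictor would have to pick one of these points without knowing which response is coming.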
Taxonomy
Topics: Language, Discourse, Communication Strategies · Speech and dialogue systems · Interpreting and Communication in Healthcare
