Improving End-of-turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task
Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost

TL;DR
This paper introduces a multi-task neural approach that predicts speaker intentions alongside turn-transitions in spoken dialogues, improving turn-taking prediction without extra runtime features.
Contribution
The novel contribution is the joint modeling of speaker intentions and turn-transitions to enhance turn-taking prediction in spoken dialogues.
Findings
Adding speaker intention prediction as an auxiliary task improves turn-transition prediction.
The method does not require additional input features at run time.
Joint modeling outperforms single-task approaches.
Abstract
This work focuses on the use of acoustic cues for modeling turn-taking in dyadic spoken dialogues. Previous work has shown that speaker intentions (e.g., asking a question or uttering a backchannel) can influence turn-taking behavior and are good predictors of turn-transitions in spoken dialogues. However, speaker intentions are not readily available to automated systems at run-time, making it difficult to use this information to anticipate a turn-transition. To this end, we propose a multi-task neural approach for predicting turn-transitions and speaker intentions simultaneously. Our results show that adding the auxiliary task of speaker intention prediction improves the performance of turn-transition prediction in spoken dialogues, without relying on additional input features at run-time.
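The abstract describes the approach only at a high level. As a rough illustration of how a multi-task setup like this is commonly wired, here is a minimal sketch, not the authors' model. One shared encoder feeds two output heads. Training minimizes the turn-transition loss plus a weighted intention loss, while run-time inference reads only the turn head, so intention labels are never needed as inputs. Everything here is an assumption: the feed-forward encoder (the paper's actual network is not described in this summary), the binary hold/shift labels, the four hypothetical intention classes, the weight `alpha`, and names such as `MultiTaskTurnModel`. Only the forward pass and the loss are shown. Gradients and training would come from an autodiff framework.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine map: one row of `weights` per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Vector, label: int) -> float:
    return -math.log(max(probs[label], 1e-12))


class MultiTaskTurnModel:
    """Hypothetical shared encoder with two heads (all sizes/labels assumed):
    - turn head:   0 = hold, 1 = shift
    - intent head: e.g. statement / question / backchannel / other
    """

    def __init__(self, n_features: int, n_hidden: int,
                 n_turn: int = 2, n_intent: int = 4, seed: int = 0):
        self.rng = random.Random(seed)
        self.w_shared, self.b_shared = self._init(n_hidden, n_features)
        self.w_turn, self.b_turn = self._init(n_turn, n_hidden)
        self.w_intent, self.b_intent = self._init(n_intent, n_hidden)

    def _init(self, rows: int, cols: int) -> Tuple[Matrix, Vector]:
        weights = [[self.rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
        return weights, [0.0] * rows

    def forward(self, x: Vector) -> Tuple[Vector, Vector]:
        # Shared representation of the acoustic features, used by both heads.
        h = relu(linear(x, self.w_shared, self.b_shared))
        p_turn = softmax(linear(h, self.w_turn, self.b_turn))
        p_intent = softmax(linear(h, self.w_intent, self.b_intent))
        return p_turn, p_intent

    def joint_loss(self, x: Vector, turn_label: int, intent_label: int,
                   alpha: float = 0.5) -> float:
        """Training objective: main-task loss plus weighted auxiliary loss."""
        p_turn, p_intent = self.forward(x)
        return cross_entropy(p_turn, turn_label) + alpha * cross_entropy(p_intent, intent_label)

    def predict_turn(self, x: Vector) -> Vector:
        """Run-time inference: acoustic features only; the intent head is ignored."""
        p_turn, _ = self.forward(x)
        return p_turn


if __name__ == "__main__":
    model = MultiTaskTurnModel(n_features=3, n_hidden=8)
    features = [0.2, -1.3, 0.7]  # made-up acoustic feature values
    print(model.joint_loss(features, turn_label=1, intent_label=2))
    print(model.predict_turn(features))
```

The design choice this sketch isolates is that the intention labels appear only in `joint_loss`. They shape the shared representation during training, but `predict_turn` needs nothing beyond the acoustic features.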
