Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs

Matthew Roddy; Gabriel Skantze; Naomi Harte

arXiv:1806.11461·cs.CL·July 2, 2018

TL;DR

This paper investigates which speech features work best for continuous turn-taking prediction with LSTMs in spoken dialog systems, an approach meant to handle fast turn-switches and overlap better than traditional end-of-turn models.

Contribution

It identifies effective speech-related features for turn prediction and demonstrates that LSTM-based models outperform previous baselines in this task.

Findings

1. Traditional acoustic features perform well for turn prediction.
2. Word features outperform part-of-speech features.
3. LSTM models outperform previous baselines.

Abstract

For spoken dialog systems to conduct fluid conversational interactions with users, the systems must be sensitive to turn-taking cues produced by a user. Models should be designed so that effective decisions can be made as to when it is appropriate, or not, for the system to speak. Traditional end-of-turn models, where decisions are made at utterance end-points, are limited in their ability to model fast turn-switches and overlap. A more flexible approach is to model turn-taking in a continuous manner using RNNs, where the system predicts speech probability scores for discrete frames within a future window. The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection. In this paper, we investigate optimal speech-related feature sets for making predictions at…
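The continuous formulation in the abstract, where the model predicts speech probability scores for discrete frames within a future window, can be illustrated with a small sketch of how such targets might be built from a frame-level voice-activity sequence. This is an illustration under stated assumptions, not the authors' implementation: the function names, the 3-frame window, the zero padding at the end of the recording, and the averaging step are all hypothetical choices made for the example.

```python
from typing import List


def future_window_targets(voice_activity: List[int], window: int) -> List[List[int]]:
    """For each frame t, return the speaker's activity in frames t+1 .. t+window.

    Assumption: frames past the end of the recording are padded with 0 (silence).
    """
    targets = []
    for t in range(len(voice_activity)):
        future = voice_activity[t + 1 : t + 1 + window]
        future = future + [0] * (window - len(future))  # pad the tail
        targets.append(future)
    return targets


def mean_future_speech(probabilities: List[float]) -> float:
    """Average predicted speech probabilities across the future window.

    Assumption: a single averaged score is one simple way to turn the
    per-frame predictions into a decision signal.
    """
    return sum(probabilities) / len(probabilities)


# Toy input: 1 = speaking, 0 = silent, one value per frame.
activity = [1, 1, 0, 0, 1]
targets = future_window_targets(activity, window=3)
```

An RNN trained against targets of this shape outputs one probability per future frame at every time step. A dialog system could then, for example, compare `mean_future_speech` for the user's predicted activity against a threshold to decide whether to start speaking, which is the kind of decision the abstract describes as not limited to end-of-turn detection.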

Code & Models

Repositories

mattroddy/lstm_turn_taking_prediction (PyTorch, official)
