Development of a Conversation State Prediction System

Sujay Uday Rittikar

arXiv:2107.01462 · cs.SD · December 14, 2021

Open Access

TL;DR

This paper presents a system that combines speaker diarization with Markov chains to predict speaker states in long, natural conversations, reporting a total state recognition error of at most 12%.

Contribution

It introduces a novel approach integrating speaker diarization with Markov Chains to predict speaker states in multi-speaker conversations.

Findings

01 Achieved state recognition error rates ≤ 12%

02 Effective extension of speaker diarization for state prediction

03 Applicable to conversations with three or more speakers

Abstract

With the development of LSTM-based speaker diarization, identifying the speaker for specific segments of an input audio stream is considerably easier than tagging the data manually. This makes it desirable to use the identified speaker identities to help recognize the speaker states in a conversation. In this study, Markov chains are used to identify and update the speaker states for subsequent conversations between the same set of speakers, so that their states can be identified in long, natural conversations. The model is based on several audio samples from natural conversations of three or more speakers in two datasets, with overall total error percentages for recognized states less than or equal to 12%. The findings imply that the proposed extension to the Speaker…
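To make the Markov chain idea in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a "speaker state" is simply the label of the active speaker in each diarized segment, and that the chain is first-order. The class `SpeakerChain` and its methods are hypothetical names introduced here for illustration. The sketch keeps transition counts so that each new conversation between the same speakers can update the chain.

```python
from collections import Counter, defaultdict


class SpeakerChain:
    """First-order Markov chain over speaker labels (illustrative only)."""

    def __init__(self):
        # counts[current][next] = number of observed current -> next transitions
        self.counts = defaultdict(Counter)

    def update(self, labels):
        """Add the transitions of one diarized conversation to the counts."""
        for current, nxt in zip(labels, labels[1:]):
            self.counts[current][nxt] += 1

    def probabilities(self, state):
        """Return estimated transition probabilities out of `state`."""
        row = self.counts.get(state)
        if not row:
            return {}
        total = sum(row.values())
        return {nxt: n / total for nxt, n in row.items()}

    def predict(self, state):
        """Return the most likely next state, or None if `state` is unseen."""
        probs = self.probabilities(state)
        return max(probs, key=probs.get) if probs else None


# Assumed input: one speaker label per segment, as a diarizer might emit.
chain = SpeakerChain()
chain.update(["A", "B", "A", "C", "A", "B"])
print(chain.predict("A"))  # "B": A is followed by B twice and by C once
```

Calling `update` again with the labels of a later conversation accumulates counts, so the estimated probabilities shift toward the more recent behaviour of the same set of speakers.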

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory