Development of a Conversation State Prediction System
Sujay Uday Rittikar

TL;DR
This paper presents a system that combines speaker diarization with Markov chains to predict speaker states in long, natural conversations, achieving at most 12% error in state recognition.
Contribution
It introduces a novel approach integrating speaker diarization with Markov Chains to predict speaker states in multi-speaker conversations.
Findings
Achieved state recognition error rates ≤ 12%
Effective extension of speaker diarization for state prediction
Applicable to conversations with three or more speakers
Abstract
With the advent of LSTM-based speaker diarization, identifying the speaker of each segment of an input audio stream has become considerably easier than tagging the data manually. This makes it desirable to use the identified speaker identities to help recognize the speaker states in a conversation. In this study, Markov chains are used to identify and update the speaker states for subsequent conversations between the same set of speakers, so that their states can be identified in long, natural conversations. The model is based on several audio samples from natural conversations of three or more speakers in two datasets, with overall total error percentages for recognized states less than or equal to 12%. The findings imply that the proposed extension to the Speaker…
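The Markov chain idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that a "speaker state" is simply the label of the speaker currently talking, that diarization has already produced one label per segment, and that a first-order chain is enough. The function names and the sample label sequences are hypothetical.

```python
from collections import defaultdict


def count_transitions(labels, counts=None):
    """Accumulate speaker-to-speaker transition counts from one diarized sequence.

    Passing an existing `counts` updates the chain with a new conversation
    between the same speakers (assumed update scheme, not taken from the paper).
    """
    if counts is None:
        counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(labels, labels[1:]):
        counts[prev][nxt] += 1
    return counts


def transition_probs(counts):
    """Normalize each row of counts into transition probabilities."""
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs


def predict_next(probs, current):
    """Return the most probable next state, or None for an unseen state."""
    row = probs.get(current)
    if not row:
        return None
    return max(row, key=row.get)


# Hypothetical diarization output: one speaker label per audio segment.
conversation_1 = ["A", "B", "A", "C", "A", "B"]
conversation_2 = ["B", "A", "B", "C", "A", "B"]

counts = count_transitions(conversation_1)
counts = count_transitions(conversation_2, counts)  # update with a later conversation
probs = transition_probs(counts)

print(predict_next(probs, "A"))
```

Keeping raw counts rather than probabilities makes the update step a simple addition, so each new conversation between the same speakers refines the estimates without reprocessing earlier audio.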
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
