Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
Natsuo Yamashita, Shota Horiguchi, Takeshi Homma

TL;DR
This paper proposes simulating conversational speech with explicit turn-taking, which yields more realistic training data for end-to-end neural diarization (EEND) and improves diarization performance.
Contribution
It introduces a turn-taking based simulation approach for creating more realistic training data for neural diarization models.
Findings
The simulated dataset closely matches real data in its silence and overlap ratios.
Pretraining on the new simulated data improves diarization performance on the CALLHOME and CSJ datasets.
Turn-taking consideration enhances the naturalness of simulated conversations.
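The silence and overlap ratios mentioned above can be illustrated with a small sketch. It assumes a frame-level speaker activity matrix and defines both ratios relative to the total number of frames; the paper may normalize differently (for example, relative to speech time), so the function name `silence_overlap_ratios` and these definitions are illustrative assumptions rather than the authors' code.

```python
def silence_overlap_ratios(activity):
    """Return (silence_ratio, overlap_ratio) for a speaker activity matrix.

    activity: list of frames, each a list of 0/1 flags, one flag per speaker.
    Assumption: both ratios are fractions of the total number of frames.
    """
    n_frames = len(activity)
    if n_frames == 0:
        raise ValueError("activity must contain at least one frame")
    # Number of simultaneously active speakers in each frame.
    active_counts = [sum(frame) for frame in activity]
    silence = sum(1 for c in active_counts if c == 0) / n_frames
    overlap = sum(1 for c in active_counts if c >= 2) / n_frames
    return silence, overlap


# Toy example: 5 frames, 2 speakers.
frames = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]
silence, overlap = silence_overlap_ratios(frames)
```

Comparing these two numbers between a simulated corpus and a real one gives a rough check of how closely the simulation reproduces real conversational timing.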
Abstract
This paper investigates a method for simulating natural conversation in the model training of end-to-end neural diarization (EEND). Due to the lack of any annotated real conversational dataset, EEND is usually pretrained on a large-scale simulated conversational dataset first and then adapted to the target real dataset. Simulated datasets play an essential role in the training of EEND, but as yet there has been insufficient investigation into an optimal simulation method. We thus propose a method to simulate natural conversational speech. In contrast to conventional methods, which simply combine the speech of multiple speakers, our method takes turn-taking into account. We define four types of speaker transition and sequentially arrange them to simulate natural conversations. The dataset simulated using our method was found to be statistically similar to the real dataset in terms of the…
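The idea of sequentially arranging speaker transitions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four transition labels, the sampling weights, and the gap and overlap ranges are all hypothetical, since the abstract as shown here does not specify them.

```python
import random

# Hypothetical labels; the abstract states there are four transition types
# but does not name them in the portion shown.
TRANSITIONS = ("turn_hold", "turn_switch", "interruption", "backchannel")


def simulate_conversation(durations, n_utterances, weights=(0.4, 0.4, 0.1, 0.1),
                          max_gap=1.0, max_overlap=0.5, seed=0):
    """Arrange utterances on a timeline by sampling one transition per step.

    durations: dict mapping speaker id -> list of positive utterance lengths (s).
    Returns a list of (speaker, start, end) tuples.
    All numeric defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    speakers = sorted(durations)
    if len(speakers) < 2:
        raise ValueError("need at least two speakers")
    current = rng.choice(speakers)
    first = rng.choice(durations[current])
    segments = [(current, 0.0, first)]
    prev_start, prev_end = 0.0, first
    while len(segments) < n_utterances:
        kind = rng.choices(TRANSITIONS, weights=weights)[0]
        other = rng.choice([s for s in speakers if s != current])
        if kind == "turn_hold":
            # Same speaker continues after a pause.
            speaker, start = current, prev_end + rng.uniform(0.0, max_gap)
        elif kind == "turn_switch":
            # Another speaker starts after a pause.
            speaker, start = other, prev_end + rng.uniform(0.0, max_gap)
        else:
            # Interruption or backchannel: another speaker starts before
            # the previous utterance has ended, creating overlap.
            limit = min(max_overlap, prev_end - prev_start)
            speaker, start = other, prev_end - rng.uniform(0.0, limit)
        length = rng.choice(durations[speaker])
        if kind == "backchannel":
            # Short response contained in the previous utterance;
            # the floor stays with the current speaker.
            length = min(length, prev_end - start)
            segments.append((speaker, start, start + length))
            continue
        segments.append((speaker, start, start + length))
        current, prev_start, prev_end = speaker, start, start + length
    return segments


segments = simulate_conversation({"A": [1.0, 2.0], "B": [0.5, 1.5]}, n_utterances=10)
```

In a real pipeline the returned timeline would be used to place and mix the corresponding audio clips; here it only shows how pauses and overlaps arise from the sequence of sampled transitions, in contrast to independently combining each speaker's speech.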
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: End-to-End Neural Diarization
