Personalized Filled-pause Generation with Group-wise Prediction Models
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, and Hiroshi Saruwatari

TL;DR
This paper introduces group-wise prediction models for generating personalized filled pauses (FPs), with the goal of making disfluent text generation more human-like.
Contribution
It proposes group-dependent prediction models, built by grouping speakers according to their tendency to use FPs, together with a loss function and word embeddings tailored to FP prediction.
Findings
Group-dependent models outperform non-personalized models in FP prediction.
The proposed loss function improves prediction accuracy.
Word embeddings tailored for FP prediction improve model performance.
Abstract
In this paper, we propose a method to generate personalized filled pauses (FPs) with group-wise prediction models. Compared with fluent text generation, disfluent text generation has not been widely explored. To generate more human-like texts, we addressed disfluent text generation. The usage of disfluency, such as FPs, rephrases, and word fragments, differs from speaker to speaker, and thus, the generation of personalized FPs is required. However, it is difficult to predict them because of the sparsity of position and the frequency difference between more and less frequently used FPs. Moreover, it is sometimes difficult to adapt FP prediction models to each speaker because of the large variation of the tendency within each speaker. To address these issues, we propose a method to build group-dependent prediction models by grouping speakers on the basis of their tendency to use FPs. This…
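The grouping step mentioned in the abstract can be illustrated with a small sketch. The excerpt does not say how speakers are grouped, so everything below is an assumption made for illustration: each speaker is represented by the relative frequency of each FP word they use, and speakers are clustered with plain k-means. The FP vocabulary ("um", "uh", "er"), the speaker IDs, and the function names are hypothetical and do not come from the paper.

```python
from collections import Counter


def fp_usage_vector(fp_tokens, vocab):
    """Relative frequency of each FP word for one speaker (assumed feature)."""
    counts = Counter(fp_tokens)
    total = sum(counts[w] for w in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[w] / total for w in vocab]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_speakers(vectors, k, iterations=20):
    """Plain k-means over speaker vectors; returns {speaker: group index}.

    This is a stand-in for whatever grouping the paper uses, not its method.
    """
    speakers = sorted(vectors)
    # Deterministic initialization: the first k speakers act as centroids.
    centroids = [list(vectors[s]) for s in speakers[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign each speaker to the nearest centroid.
        assignment = {
            s: min(range(k), key=lambda g: squared_distance(vectors[s], centroids[g]))
            for s in speakers
        }
        # Move each centroid to the mean of its members.
        for g in range(k):
            members = [vectors[s] for s in speakers if assignment[s] == g]
            if members:
                centroids[g] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy data: the FPs each speaker produced, in order.
vocab = ["um", "uh", "er"]
speaker_fps = {
    "spk_a": ["um", "um", "um", "uh"],
    "spk_b": ["er", "er", "er", "uh"],
    "spk_c": ["um", "um", "uh", "um", "um"],
    "spk_d": ["er", "er", "er", "er", "uh"],
}

vectors = {s: fp_usage_vector(toks, vocab) for s, toks in speaker_fps.items()}
groups = group_speakers(vectors, k=2)
print(groups)
```

In a setup like this, one prediction model would then be trained per group rather than per speaker, which is one way to cope with the large variation in usage that the abstract describes for individual speakers.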
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
