Naturalization of Text by the Insertion of Pauses and Filler Words
Richa Sharma, Parth Vipul Shah, Ashwini M. Joshi

TL;DR
This paper presents two methods for naturalizing text by inserting pauses and filler words, so that voice-based interactions with electronic systems sound closer to human speech.
Contribution
It introduces a bigram-based method and a neural network approach for inserting natural speech elements into text, with controllable naturalization levels.
Findings
In surveys, text produced by both methods was judged comparable to natural speech.
Both methods are fast and suitable for real-time applications.
The degree of naturalization can be controlled in both methods.
Abstract
In this article, we introduce a set of methods to naturalize text based on natural human speech. Voice-based interactions provide a natural way of interfacing with electronic systems and have seen widespread adoption of late. These computerized voices can be naturalized to some degree by inserting pauses and filler words at appropriate positions. The first proposed text transformation method uses the frequency of bigrams in the training data to make appropriate insertions in the input sentence. It uses a probability distribution to choose the insertions from the set of all possible insertions. This method is fast and can be placed before a Text-To-Speech module. The second method uses a Recurrent Neural Network to predict the next word to be inserted. It confirms the insertions given by the bigram method. Additionally, the degree of naturalization can be controlled in both these…
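The bigram-based idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the filler inventory `FILLERS`, the function names `train_bigram_counts` and `naturalize`, and the `level` parameter used to scale the insertion probability are all hypothetical, and the abstract does not specify how the probability distribution or the naturalization control is actually defined. The sketch assumes the training data is a list of tokenized transcripts in which fillers and pauses already appear as tokens.

```python
import random
from collections import Counter, defaultdict

# Hypothetical filler inventory; the paper's actual set is not given in the abstract.
FILLERS = {"uh", "um", "<pause>"}


def train_bigram_counts(sentences):
    """Count how often each filler directly follows each non-filler word."""
    follow = defaultdict(Counter)  # follow[word][filler] -> bigram count
    totals = Counter()             # totals[word] -> occurrences of word
    for tokens in sentences:
        for i, word in enumerate(tokens):
            if word in FILLERS:
                continue
            totals[word] += 1
            if i + 1 < len(tokens) and tokens[i + 1] in FILLERS:
                follow[word][tokens[i + 1]] += 1
    return follow, totals


def naturalize(tokens, follow, totals, level=1.0, rng=None):
    """Insert fillers after words, sampling from the observed bigram frequencies.

    `level` is an assumed knob: it scales the probability of inserting anything
    (0.0 disables insertion; larger values insert more often, capped at 1.0).
    """
    rng = rng or random.Random()
    out = []
    for word in tokens:
        out.append(word)
        candidates = follow.get(word)
        if not candidates:
            continue
        # Relative frequency with which any filler followed this word in training.
        p_insert = min(1.0, level * sum(candidates.values()) / totals[word])
        if rng.random() < p_insert:
            fillers = list(candidates)
            weights = [candidates[f] for f in fillers]
            # Choose which filler to insert, weighted by its bigram count.
            out.append(rng.choices(fillers, weights=weights, k=1)[0])
    return out


# Toy usage: "so" is always followed by "um" in this tiny corpus.
corpus = [["so", "um", "we", "start"], ["so", "um", "you", "know"]]
follow, totals = train_bigram_counts(corpus)
print(naturalize(["so", "we", "go"], follow, totals, level=1.0))
# prints ['so', 'um', 'we', 'go']
```

Because the procedure is a table lookup plus one random draw per word, it runs in time linear in the sentence length, which is consistent with the abstract's point that such a step can sit in front of a Text-To-Speech module.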
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
