Bridging the Gap: Converting Read Text to Conversational Dialogue
Parshav Singla, Agnik Banerjee, Aaditya Arora, Shruti Aggarwal, Anil Kumar Verma, Vikram C M, Raj Prakash Gohil, Gopal Kumar Agarwal

TL;DR
This paper presents PACC, a novel deep learning approach utilizing HiFi-GAN to convert read speech into natural conversational speech, improving naturalness and accuracy for real-time speech applications.
Contribution
Introduction of PACC, a new prosodic adjustment method using advanced neural networks and HiFi-GAN for high-quality speech conversion.
Findings
Significant improvements in naturalness and model accuracy.
Achieved new benchmarks in speech conversion and mean opinion score (MOS) evaluation.
Demonstrated successful extension to other speech conversion tasks.
Abstract
Converting read speech to conversational speech has gained significant attention in recent speech processing research. The primary challenge is maintaining naturalness and intelligibility while minimizing computational overhead for real-time applications. Read speech often lacks the nuanced prosodic variation essential for natural conversational interaction, which poses challenges for applications such as virtual assistants, customer service, and language learning tools. This paper introduces a novel approach, Prosodic Adjustment with Conversational Context (PACC), for converting read speech into natural conversational speech suitable for these applications. PACC uses deep neural networks to analyze and modify prosodic features such as intonation, stress, and rhythm. Unlike conventional methods, our approach uses High-Fidelity…
