Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis
Vinotha R, Hepsiba D, L. D. Vijay Anand, Deepak John Reji

TL;DR
This paper presents an open-source, AI-powered speech synthesis system with voice cloning capabilities. It uses a neural network architecture to generate natural-sounding speech for diverse speakers, including people with speech disorders.
Contribution
It introduces a comprehensive speech synthesis package combining speaker verification, voice cloning, and noise reduction, with evaluation on both seen and unseen speakers.
Findings
High-quality speech synthesis, as measured by mean opinion scores (MOS)
Effective voice cloning for diverse speakers
Robust noise reduction improves speech clarity
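A mean opinion score (MOS) is the average of listener ratings on a 1 to 5 scale. The sketch below shows one common way to compute it with an approximate 95% confidence interval. The function name, the ratings, and the normal-approximation interval are illustrative assumptions; this summary does not give the paper's actual ratings or evaluation protocol.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale.

    Hypothetical helper: the interval uses a normal approximation,
    which is an assumption and not taken from the paper.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("MOS ratings must lie between 1 and 5")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Made-up ratings from five listeners, for illustration only.
mos, ci = mean_opinion_score([4, 5, 4, 3, 4])
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of listeners the interval is wide, which is why MOS studies usually collect many ratings per utterance.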
Abstract
Neural text-to-speech (TTS) synthesis is a powerful technology that generates speech using neural networks. One of its most remarkable features is the ability to produce speech in the voices of different speakers. This paper introduces a voice cloning and speech synthesis package (https://pypi.org/project/voice-cloning/), an open-source Python package intended to help people with speech disorders communicate more effectively and to serve professionals who want to integrate voice cloning or speech synthesis capabilities into their projects. The package aims to generate synthetic speech that sounds like an individual's natural voice, but it does not replace the natural human voice. The system architecture comprises a speaker verification system, a synthesizer, a vocoder, and noise reduction. The speaker verification system is trained on a varied set of speakers to achieve optimal…
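The four stages named in the abstract (speaker verification, synthesizer, vocoder, noise reduction) can be pictured as a simple data flow. The sketch below is only a toy illustration of that flow: every function is a hypothetical stand-in written in plain Python, none of it is the API of the voice-cloning package, and the real stages are neural networks rather than these arithmetic placeholders.

```python
import math
from typing import List


def speaker_embedding(reference_audio: List[float], dim: int = 4) -> List[float]:
    """Stand-in for the speaker verification encoder: audio -> unit-length vector."""
    chunk = max(1, len(reference_audio) // dim)
    raw = [sum(abs(x) for x in reference_audio[i * chunk:(i + 1) * chunk])
           for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]


def synthesize(text: str, embedding: List[float]) -> List[List[float]]:
    """Stand-in synthesizer: one toy spectrogram frame per character,
    scaled by the speaker embedding."""
    return [[(ord(ch) % 32) / 32.0 * e for e in embedding] for ch in text]


def vocode(frames: List[List[float]], samples_per_frame: int = 8) -> List[float]:
    """Stand-in vocoder: expands each frame into a short run of waveform samples."""
    wave = []
    for frame in frames:
        level = sum(frame) / len(frame)
        wave.extend(level * math.sin(2 * math.pi * n / samples_per_frame)
                    for n in range(samples_per_frame))
    return wave


def reduce_noise(wave: List[float], threshold: float = 0.01) -> List[float]:
    """Stand-in noise reduction: a simple gate that zeroes very quiet samples."""
    return [s if abs(s) >= threshold else 0.0 for s in wave]


def clone_voice(text: str, reference_audio: List[float]) -> List[float]:
    """Chain the four stages in the order the abstract lists them."""
    embedding = speaker_embedding(reference_audio)
    frames = synthesize(text, embedding)
    return reduce_noise(vocode(frames))


# Made-up reference samples, for illustration only.
waveform = clone_voice("hi", [0.1, -0.2, 0.3, 0.4])
print(len(waveform), "samples")
```

The point of the sketch is the ordering: the embedding is computed once from reference audio, conditions the synthesizer, and the vocoder output is cleaned up last.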
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Synthesizer
