Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis

Vinotha R; Hepsiba D; L. D. Vijay Anand; Deepak John Reji

arXiv:2401.11771 · eess.AS · February 19, 2024 · 1 citation

Open Access

TL;DR

This paper presents an open-source, AI-powered speech synthesis system with voice cloning capabilities. Built on a neural network architecture, it is designed to generate natural-sounding speech for diverse speakers, including those with speech disorders.

Contribution

It introduces a comprehensive speech synthesis package combining speaker verification, voice cloning, and noise reduction, with evaluation on both seen and unseen speakers.

Findings

01 High-quality speech synthesis, evaluated with mean opinion scores (MOS)

02 Effective voice cloning for diverse speakers

03 Robust noise reduction improves speech clarity
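
The first finding refers to MOS, where listeners rate each utterance on a 1 to 5 scale and the ratings are averaged. The sketch below shows one common way to summarize such ratings, as a mean with a normal-approximation 95% interval. The function name and the ratings are hypothetical illustrations; they are not data or code from the paper, and the paper's exact evaluation protocol is not given in this listing.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is a normal-approximation 95% interval (an assumption;
    the paper may report MOS differently).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in the range 1..5")
    mos = statistics.mean(ratings)
    # 1.96 * standard error of the mean
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for illustration only (not results from the paper).
seen_speaker_ratings = [4, 5, 4, 4, 3, 5, 4, 4]
unseen_speaker_ratings = [4, 3, 4, 3, 4, 4, 3, 5]

for label, ratings in [("seen", seen_speaker_ratings),
                       ("unseen", unseen_speaker_ratings)]:
    mos, hw = mean_opinion_score(ratings)
    print(f"{label}: MOS = {mos:.2f} +/- {hw:.2f}")
```

Reporting seen and unseen speakers separately mirrors the evaluation split named under Contribution.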

Abstract

Neural text-to-speech (TTS) synthesis is a powerful technology that generates speech using neural networks. One of its most remarkable features is the ability to produce speech in the voices of different speakers. This paper introduces a voice cloning and speech synthesis package (https://pypi.org/project/voice-cloning/), an open-source Python package that helps people with speech disorders communicate more effectively and serves professionals seeking to integrate voice cloning or speech synthesis capabilities into their projects. The package aims to generate synthetic speech that sounds like an individual's natural voice, but it does not replace the natural human voice. The system architecture comprises a speaker verification system, a synthesizer, a vocoder, and noise reduction. The speaker verification system is trained on a varied set of speakers to achieve optimal…
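
The four components named in the abstract can be read as a data-flow pipeline. The sketch below illustrates that flow only: every function name is hypothetical, the stages are toy stand-ins instead of neural networks, and none of it is the API of the voice-cloning package. Placing noise reduction on the vocoder output is also an assumption, since the abstract does not say where it is applied.

```python
from typing import Callable, List

Audio = List[float]        # mono waveform samples
Embedding = List[float]    # fixed-size speaker vector
Mel = List[List[float]]    # spectrogram frames


def clone_voice(
    reference: Audio,
    text: str,
    encode_speaker: Callable[[Audio], Embedding],
    synthesize: Callable[[str, Embedding], Mel],
    vocode: Callable[[Mel], Audio],
    denoise: Callable[[Audio], Audio],
) -> Audio:
    """Chain the four stages: speaker encoder, synthesizer, vocoder, denoiser."""
    embedding = encode_speaker(reference)   # who is speaking
    mel = synthesize(text, embedding)       # what is said, in that voice
    waveform = vocode(mel)                  # spectrogram -> audio samples
    return denoise(waveform)                # assumed to run on the output


# Toy stand-ins so the sketch runs; real stages would be trained models.
def toy_encoder(audio: Audio) -> Embedding:
    return [sum(audio) / len(audio)]


def toy_synthesizer(text: str, embedding: Embedding) -> Mel:
    return [[embedding[0]] * 2 for _ in text]   # one 2-bin frame per character


def toy_vocoder(mel: Mel) -> Audio:
    return [value for frame in mel for value in frame]


def toy_denoiser(audio: Audio) -> Audio:
    return [0.0 if abs(s) < 1e-3 else s for s in audio]   # simple noise gate


cloned = clone_voice([0.2, 0.4], "hi",
                     toy_encoder, toy_synthesizer, toy_vocoder, toy_denoiser)
print(len(cloned), "samples")
```

Passing each stage in as a callable keeps the data flow visible and lets any single stage be swapped without touching the others.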

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Softmax · Linear Layer · Synthesizer