Correcting Mispronunciations in Speech using Spectrogram Inpainting

Talia Ben-Simon; Felix Kreuk; Faten Awwad; Jacob T. Cohen; Joseph Keshet

arXiv:2204.03379 · eess.AS · July 1, 2022

TL;DR

This paper introduces a deep learning inpainting system using spectrogram masking to correct mispronunciations in speech, maintaining speaker identity and providing synthetic feedback for language learners and children with pronunciation issues.

Contribution

It presents a novel spectrogram inpainting approach that uses a U-Net architecture to generate corrected speech while preserving the speaker's voice. The model is trained only on correctly pronounced speech.
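Training on correctly pronounced speech can be illustrated with a small NumPy sketch. It computes a magnitude spectrogram, zeros a random run of time frames, and keeps the untouched spectrogram as the reconstruction target. This is not the authors' code. The STFT settings (`n_fft=512`, `hop=128`), the random-width masking policy, and the masked-frame L1 loss are our assumptions. The helpers `magnitude_spectrogram`, `make_training_pair`, and `masked_l1` are hypothetical names.

```python
import numpy as np


def magnitude_spectrogram(wave, n_fft=512, hop=128):
    """Hann-windowed magnitude STFT; returns an array of shape (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def make_training_pair(spec, rng, max_width=20):
    """Zero a random run of time frames in a correctly pronounced utterance.

    Returns (masked_input, target, mask). The target is the untouched spectrogram,
    so the network learns to fill the gap with proper speech.
    """
    n_frames = spec.shape[1]
    width = int(rng.integers(1, max_width + 1))          # 1..max_width frames (assumed policy)
    start = int(rng.integers(0, n_frames - width + 1))
    mask = np.zeros(n_frames, dtype=bool)
    mask[start:start + width] = True
    masked = spec.copy()
    masked[:, mask] = 0.0
    return masked, spec, mask


def masked_l1(pred, target, mask):
    """Mean absolute error over the masked frames only (an assumed loss choice)."""
    return float(np.mean(np.abs(pred[:, mask] - target[:, mask])))


rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)   # 1 s test tone at 16 kHz
spec = magnitude_spectrogram(tone)
masked, target, mask = make_training_pair(spec, rng)
print(spec.shape, int(mask.sum()))
```

With these settings, a 1 s clip at 16 kHz gives a 257 × 122 spectrogram (frequency bins × frames). Computing the loss only on masked frames focuses training on the region being filled in. Whether the paper weights the loss this way is an assumption.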

Findings

01

Listeners slightly prefer the generated speech over simple phoneme replacement.

02

The system effectively reconstructs correct pronunciations in minimal pairs.

03

It shows promise for aiding language learning and speech therapy.

Abstract

Learning a new language involves constantly comparing one's speech productions with reference productions from the environment. Early in speech acquisition, children make articulatory adjustments to match their caregivers' speech. Adult learners of a language adjust their speech to match the tutor's reference. This paper proposes a method to synthetically generate correct pronunciation feedback given an incorrect production. Furthermore, our aim is to generate the corrected production while maintaining the speaker's original voice. The system prompts the user to pronounce a phrase. The speech is recorded, and the samples associated with the inaccurate phoneme are masked with zeros. This waveform serves as the input to a speech generator, implemented as a deep learning inpainting system with a U-net architecture and trained to output reconstructed speech. The training set is composed of…
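The abstract describes a masking step in which the samples of the inaccurate phoneme are zeroed before the waveform reaches the generator. A minimal NumPy sketch of that step follows. It is an illustration under assumptions, not the authors' pipeline. We assume the inaccurate phoneme's time span is already known from a phoneme-level alignment. The alignment format, the helper `mask_phoneme`, and the example phoneme labels are all hypothetical.

```python
import numpy as np


def mask_phoneme(wave, sample_rate, alignment, index):
    """Zero the waveform samples of the phoneme at alignment[index].

    alignment is a hypothetical list of (phoneme, start_sec, end_sec) tuples,
    e.g. the output of a forced aligner. Returns (masked_wave, (start, end)).
    """
    _, start_sec, end_sec = alignment[index]
    start = int(round(start_sec * sample_rate))
    end = min(int(round(end_sec * sample_rate)), len(wave))
    masked = wave.copy()          # leave the original recording untouched
    masked[start:end] = 0.0
    return masked, (start, end)


# Hypothetical alignment for a mispronounced "cat"; the vowel is flagged as inaccurate.
sr = 16000
recording = np.full(sr, 0.5)
alignment = [("k", 0.0, 0.1), ("ae", 0.1, 0.3), ("t", 0.3, 0.4)]
masked, (start, end) = mask_phoneme(recording, sr, alignment, index=1)
print(start, end)  # 1600 4800
```

Copying before zeroing keeps the unmodified recording available, for example to compare against the generated correction.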

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing

Methods: Convolution · Concatenated Skip Connection · Max Pooling · U-Net · Inpainting
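Convolution, max pooling and concatenated skip connections are the building blocks of a U-Net. The following is a one-level toy in plain NumPy that shows how these pieces connect. The paper's actual depth, channel counts, up-sampling operator and activation are not given here. The functions `conv3x3`, `max_pool2x2`, `upsample2x` and `tiny_unet`, nearest-neighbour up-sampling, and ReLU are all our assumptions. The spectrogram is treated as a single-channel image of shape (channels, frequency bins, frames).

```python
import numpy as np


def conv3x3(x, weights):
    """'Same'-padded 3x3 convolution (cross-correlation, as in deep learning frameworks).

    x: (C_in, H, W); weights: (C_out, C_in, 3, 3). Returns (C_out, H, W).
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))          # zero padding keeps H and W
    out = np.zeros((weights.shape[0], h, w))
    for i in range(3):
        for j in range(3):
            patch = padded[:, i:i + h, j:j + w]             # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", weights[:, :, i, j], patch)
    return out


def max_pool2x2(x):
    """2x2 max pooling with stride 2 (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def upsample2x(x):
    """Nearest-neighbour up-sampling by 2 (a stand-in for a learned up-convolution)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def relu(x):
    return np.maximum(x, 0.0)


def tiny_unet(x, params):
    """One-level U-Net: encode, pool, bottleneck, up-sample, concatenate skip, decode."""
    skip = relu(conv3x3(x, params["enc"]))                     # (F, H, W), saved for the skip
    bottom = relu(conv3x3(max_pool2x2(skip), params["mid"]))   # (F, H/2, W/2)
    up = upsample2x(bottom)                                     # back to (F, H, W)
    merged = np.concatenate([up, skip], axis=0)                 # concatenated skip: (2F, H, W)
    return conv3x3(merged, params["dec"])                       # (C_out, H, W)


rng = np.random.default_rng(0)
features = 4
params = {
    "enc": rng.normal(size=(features, 1, 3, 3)),
    "mid": rng.normal(size=(features, features, 3, 3)),
    "dec": rng.normal(size=(1, 2 * features, 3, 3)),
}
spectrogram = rng.random((1, 8, 16))          # toy (channels, freq_bins, frames)
print(tiny_unet(spectrogram, params).shape)   # (1, 8, 16)
```

Because the output has the same shape as the input, the network can emit a full spectrogram. In an inpainting setup, the masked region is then filled from that output. The skip connection carries the unmasked detail of the encoder straight to the decoder.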