Articulation GAN: Unsupervised modeling of articulatory learning

Gašper Beguš; Alan Zhou; Peter Wu; Gopala K Anumanchipalli

arXiv:2210.15173 · cs.SD · May 10, 2023 · 1 citation

TL;DR

This paper introduces an unsupervised generative model that learns to produce articulatory representations of speech (electromagnetic articulography, EMA), more closely mimicking human speech production, and then converts these representations to waveforms for speech synthesis.

Contribution

It presents the Articulatory Generator, a novel unsupervised model that learns to generate articulatory features, which a separate pre-trained model (ema2wav) converts into speech, bridging physical speech production and neural network modeling.

Findings

1. Articulatory analysis suggests that the model learns to control articulators in a manner similar to humans.
2. The generated speech includes both words seen in training and unseen words.
3. Articulatory representations have implications for cognitive models of speech production.

Abstract

Generative deep neural networks are widely used for speech synthesis, but most existing models directly generate waveforms or spectral outputs. Humans, however, produce speech by controlling articulators, which results in the production of speech sounds through physical properties of sound propagation. We introduce the Articulatory Generator to the Generative Adversarial Network paradigm, a new unsupervised generative model of speech production/synthesis. The Articulatory Generator more closely mimics human speech production by learning to generate articulatory representations (electromagnetic articulography or EMA) in a fully unsupervised manner. A separate pre-trained physical model (ema2wav) then transforms the generated EMA representations to speech waveforms, which are then passed to the Discriminator for evaluation. Articulatory analysis suggests that the network learns to control…
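The data flow described in the abstract can be sketched in PyTorch, the framework of the official repository. This is a toy illustration, not the authors' implementation. The layer choices, the sizes (`EMA_CHANNELS`, `EMA_FRAMES`, `SAMPLES_PER_FRAME`), the hypothetical `ToyEma2Wav` stand-in for the pre-trained ema2wav model, the uniform latent, and the WGAN-style losses (without gradient penalty) are all assumptions. What it illustrates is the structural point: the Generator outputs EMA trajectories rather than audio, the frozen ema2wav model turns them into waveforms, and the Discriminator only ever sees waveforms. The Generator therefore learns articulation solely from gradients passed back through the frozen synthesizer.

```python
import torch
import torch.nn as nn

# All sizes below are hypothetical, chosen only to keep the sketch small.
LATENT_DIM = 100         # length of the random latent vector z
EMA_CHANNELS = 12        # assumed number of articulatory (EMA) trajectories
EMA_FRAMES = 50          # assumed number of articulatory time steps
SAMPLES_PER_FRAME = 80   # assumed audio samples synthesized per EMA frame
WAV_LEN = EMA_FRAMES * SAMPLES_PER_FRAME


class Generator(nn.Module):
    """Maps latent noise z to articulatory (EMA) trajectories, not to audio."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, EMA_CHANNELS * EMA_FRAMES), nn.Tanh())

    def forward(self, z):
        return self.net(z).view(-1, EMA_CHANNELS, EMA_FRAMES)


class ToyEma2Wav(nn.Module):
    """Hypothetical stand-in for the pre-trained ema2wav model: (B, C, T) -> (B, T * k)."""

    def __init__(self):
        super().__init__()
        # Each EMA frame is upsampled to SAMPLES_PER_FRAME audio samples.
        self.up = nn.ConvTranspose1d(EMA_CHANNELS, 1, kernel_size=SAMPLES_PER_FRAME,
                                     stride=SAMPLES_PER_FRAME)

    def forward(self, ema):
        return torch.tanh(self.up(ema)).squeeze(1)


class Discriminator(nn.Module):
    """Scores waveforms only; it never sees the EMA representation."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(WAV_LEN, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

    def forward(self, wav):
        return self.net(wav)


def sample_z(batch_size):
    # Uniform latent in [-1, 1] (an assumption for this sketch).
    return torch.rand(batch_size, LATENT_DIM) * 2 - 1


def discriminator_step(G, ema2wav, D, opt_d, real_wav):
    with torch.no_grad():  # the critic update does not need Generator gradients
        fake_wav = ema2wav(G(sample_z(real_wav.shape[0])))
    # WGAN-style critic loss; the gradient penalty is omitted for brevity.
    loss = D(fake_wav).mean() - D(real_wav).mean()
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return loss.item()


def generator_step(G, ema2wav, D, opt_g, batch_size):
    ema = G(sample_z(batch_size))  # articulatory trajectories
    wav = ema2wav(ema)             # frozen, but still differentiable
    loss = -D(wav).mean()          # gradients flow D -> ema2wav -> G
    opt_g.zero_grad()
    loss.backward()  # D also receives grads here; opt_d.zero_grad() clears them next step
    opt_g.step()     # only the Generator's parameters are updated
    return ema, wav, loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    G, ema2wav, D = Generator(), ToyEma2Wav(), Discriminator()
    ema2wav.requires_grad_(False).eval()  # the pre-trained synthesizer stays fixed
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
    real_wav = torch.rand(8, WAV_LEN) * 2 - 1  # placeholder for real speech clips
    for _ in range(3):
        discriminator_step(G, ema2wav, D, opt_d, real_wav)
        ema, wav, g_loss = generator_step(G, ema2wav, D, opt_g, batch_size=8)
    print(ema.shape, wav.shape)
```

Freezing the synthesizer with `requires_grad_(False)` keeps its weights out of any optimizer while still letting gradients flow through it to the Generator. Wrapping it in `torch.no_grad()` during the generator step instead would cut the Generator off from any training signal.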

Code & Models

Repositories

gbegus/articulationgan (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research