CiwaGAN: Articulatory information exchange

Gašper Beguš; Thomas Lu; Alan Zhou; Peter Wu; Gopala K. Anumanchipalli

arXiv:2309.07861 · cs.SD · September 15, 2023

TL;DR

CiwaGAN is a deep learning model that combines unsupervised articulatory modeling with unsupervised information exchange through the auditory modality, aiming to simulate human spoken language acquisition more realistically than models that treat the two components separately.

Contribution

It is the first model to integrate both articulatory and auditory components in an unsupervised framework for speech learning.

Findings

01 Presented as the most realistic deep learning approximation of human spoken language acquisition

02 An improved articulatory model with more interpretable internal representations

03 Useful for cognitively plausible simulations of the human speech act

Abstract

Humans encode information into sounds by controlling articulators and decode information from sounds using the auditory apparatus. This paper introduces CiwaGAN, a model of human spoken language acquisition that combines unsupervised articulatory modeling with an unsupervised model of information exchange through the auditory modality. While prior research includes unsupervised articulatory modeling and information exchange separately, our model is the first to combine the two components. The paper also proposes an improved articulatory model with more interpretable internal representations. The proposed CiwaGAN model is the most realistic approximation of human spoken language acquisition using deep learning. As such, it is useful for cognitively plausible simulations of the human speech act.

Code & Models

Repositories

gbegus/articulationgan (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Speech and Audio Processing