CiwaGAN: Articulatory information exchange
Gašper Beguš, Thomas Lu, Alan Zhou, Peter Wu, Gopala K. Anumanchipalli

TL;DR
CiwaGAN is a novel deep learning model that combines unsupervised articulatory modeling with auditory information exchange to simulate human spoken language acquisition more realistically.
Contribution
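According to the authors, CiwaGAN is the first model to integrate unsupervised articulatory modeling with unsupervised information exchange through the auditory modality; prior work addressed the two components only separately.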
Findings
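The authors present CiwaGAN as the most realistic approximation of human spoken language acquisition using deep learning to date
The paper proposes an improved articulatory model with more interpretable internal representations
The model is useful for cognitively plausible simulations of the human speech act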
Abstract
Humans encode information into sounds by controlling articulators and decode information from sounds using the auditory apparatus. This paper introduces CiwaGAN, a model of human spoken language acquisition that combines unsupervised articulatory modeling with an unsupervised model of information exchange through the auditory modality. While prior research includes unsupervised articulatory modeling and information exchange separately, our model is the first to combine the two components. The paper also proposes an improved articulatory model with more interpretable internal representations. The proposed CiwaGAN model is the most realistic approximation of human spoken language acquisition using deep learning. As such, it is useful for cognitively plausible simulations of the human speech act.
Taxonomy
Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Speech and Audio Processing
