Generative Adversarial Phonology: Modeling unsupervised phonetic and phonological learning with neural networks
Gašper Beguš

TL;DR
This paper demonstrates that generative adversarial networks can learn phonetic and phonological properties from raw speech data without supervision, and that their internal representations resemble linguistic features.
Contribution
It introduces a methodology for uncovering and manipulating the latent variables of a GAN, trained without supervision, that correspond to phonetic and phonological features in speech data.
Findings
GANs learn allophonic distributions in speech
Latent variables correspond to phonetic features such as the presence of [s]
Manipulating latent variables controls speech features
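The latent-variable manipulation described in the findings above can be illustrated with a minimal sketch. It assumes nothing about the paper's actual network: `toy_generator`, `sweep_latent`, the latent size of 8, the output length of 64, and the swept values are all hypothetical stand-ins chosen for illustration. The idea shown is only the general procedure of holding a latent vector fixed, forcing one variable to a series of values (including values outside the sampling range), and comparing the generated outputs.

```python
import numpy as np

# Hypothetical sizes, chosen only to keep the example small.
LATENT_DIM = 8
N_SAMPLES = 64

rng = np.random.default_rng(0)
# Stand-in "generator": a fixed random linear map followed by tanh.
# A trained GAN generator would take its place in practice.
W = rng.normal(size=(N_SAMPLES, LATENT_DIM)) / np.sqrt(LATENT_DIM)


def toy_generator(z):
    """Map a latent vector to a waveform-like array with values in [-1, 1]."""
    return np.tanh(W @ z)


def sweep_latent(generator, z, index, values):
    """Generate one output per value, with z[index] set to that value.

    All other latent variables are held fixed, and z itself is not modified.
    """
    outputs = []
    for v in values:
        z_mod = z.copy()
        z_mod[index] = v
        outputs.append(generator(z_mod))
    return np.stack(outputs)


# Sample a latent vector, then sweep one variable beyond the sampling range.
z = rng.uniform(-1.0, 1.0, size=LATENT_DIM)
values = np.linspace(-2.0, 2.0, 5)
waves = sweep_latent(toy_generator, z, index=3, values=values)
print(waves.shape)
```

With a real trained generator, each row of `waves` would be listened to or analyzed acoustically to see which property of the output tracks the manipulated variable.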
Abstract
Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data and learning is unsupervised without any language-specific assumptions or pre-assumed levels of abstraction. A Generative Adversarial Network was trained on an allophonic distribution in English. The network successfully learns the allophonic alternation: the…
