Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

Gašper Beguš; Alan Zhou

arXiv:2203.11476·cs.CL·September 20, 2022

TL;DR

This paper tests whether unsupervised generative deep convolutional networks that combine the production and perception principles of speech can learn to encode and decode lexical semantic information in raw speech, under an objective in which the network that must learn lexical representations has no direct access to the training data.

Contribution

It introduces an unsupervised lexical learning objective, described by the authors as the most challenging to their knowledge: a network must learn unique representations for lexical items with no direct access to the training data, within models that combine production and perception.

Findings

01

Networks classify lexical items in unobserved data.

02

Latent codes relate to meaningful sublexical units.

03

Models learn to decode information from raw acoustics.

Abstract

Human speakers encode information into raw speech which is then decoded by the listeners. This complex relationship between encoding (production) and decoding (perception) is often modeled separately. Here, we test how encoding and decoding of lexical semantic information can emerge automatically from raw speech in unsupervised generative deep convolutional networks that combine the production and perception principles of speech. We introduce, to our knowledge, the most challenging objective in unsupervised lexical learning: a network that must learn unique representations for lexical items with no direct access to training data. We train several models (ciwGAN and fiwGAN; arXiv:2006.02951) and test how the networks classify acoustic lexical items in unobserved test data. Strong evidence in favor of lexical learning and a causal relationship between latent codes and meaningful sublexical…
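The abstract refers to latent codes that stand for lexical items and to a network that recovers them. As a rough illustration only, the sketch below contrasts two ways such a code could be laid out: a one-hot code with one unit per lexical item, and a binary code in which each item is a pattern of features. Reading ciwGAN as the one-hot case and fiwGAN as the binary case is an assumption here, as are the uniform noise range, the dimensions, and every function name; none of these details are stated in the abstract above, and no audio model is included.

```python
import random

def one_hot_code(class_index, num_classes):
    """Assumed categorical layout: one latent unit per lexical item."""
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]

def binary_code(class_index, num_bits):
    """Assumed featural layout: each lexical item is a pattern of binary features."""
    return [float((class_index >> bit) & 1) for bit in reversed(range(num_bits))]

def latent_input(code, noise_dim, rng):
    """Concatenate the code with uniform noise in [-1, 1] (hypothetical range)."""
    return code + [rng.uniform(-1.0, 1.0) for _ in range(noise_dim)]

def decode_one_hot(estimate):
    """Pick the item whose unit has the highest estimated value."""
    return max(range(len(estimate)), key=lambda i: estimate[i])

def decode_binary(estimate, threshold=0.5):
    """Threshold each estimated feature and read the bits as an item index."""
    index = 0
    for value in estimate:
        index = (index << 1) | (1 if value >= threshold else 0)
    return index

if __name__ == "__main__":
    rng = random.Random(0)
    z = latent_input(binary_code(5, 3), noise_dim=4, rng=rng)
    print(len(z), decode_binary(z[:3]))
```

Under these assumptions, a binary code with n features can distinguish up to 2^n items, while a one-hot code with n units distinguishes n. That difference is why a featural code can also be inspected feature by feature for sublexical structure.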
