Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units

Anurag Katakkar; Alan W Black

arXiv:2111.00610 · cs.CL · November 2, 2021

TL;DR

This paper introduces a novel LSTM-based generative speech language model using linguistic units like syllables and phonemes, demonstrating promising results with limited data and exploring training challenges.

Contribution

The paper presents a new speech language model based on linguistic units, addressing acoustic consistency and training challenges in the speech domain.

Findings

01 The model closely approximates babbling speech with limited data

02 Training with auxiliary text LMs and articulatory features impacts performance

03 Validation metrics such as mel cepstral distortion (MCD) may not correlate with speech quality
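
For readers unfamiliar with the metric in the last finding, the following is a minimal sketch of the commonly used mel cepstral distortion formula, in dB, averaged over frames. It assumes the reference and synthesized mel-cepstral sequences are already time-aligned and that the 0th (energy) coefficient is excluded. The function name and input layout are hypothetical, and the paper's exact MCD setup is not stated on this page.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean MCD in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients; index 0
    (energy) is skipped, a common convention assumed here.
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("expected two non-empty, equal-length, aligned sequences")
    scale = 10.0 / math.log(10.0)  # converts the distance to decibels
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D
        sq_dist = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * sq_dist)
    return total / len(ref_frames)
```

Because the value is a frame-wise Euclidean distance in cepstral space, a low score only indicates spectral closeness to one reference utterance, which is one plausible reason it may not track perceived quality.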

Abstract

Language models (LMs) for text data have been studied extensively for their usefulness in language generation and other downstream tasks. However, language modelling purely in the speech domain is still a relatively unexplored topic, with traditional speech LMs often depending on auxiliary text LMs for learning distributional aspects of the language. For the English language, these LMs treat words as atomic units, which presents inherent challenges to language modelling in the speech domain. In this paper, we propose a novel LSTM-based generative speech LM that is inspired by the CBOW model and built on linguistic units including syllables and phonemes. This offers better acoustic consistency across utterances in the dataset, as opposed to single melspectrogram frames, or whole words. With a limited dataset, orders of magnitude smaller than that required by contemporary generative…

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques