Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach

Maxime Poli; Emmanuel Chemla; Emmanuel Dupoux

arXiv:2410.00025 · cs.CL · October 31, 2024

TL;DR

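This paper shows that fine-tuning speech representation models on phoneme classification yields more context-invariant representations, and that language models trained on the resulting units achieve lexical comprehension comparable to models trained on about a hundred times more data. This reduces the amount of data that speech-only language modeling needs, while keeping the more natural, expressive modeling that working directly from speech allows.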

Contribution

The study introduces a simple approach: fine-tune a speech representation model on phoneme classification, then train spoken language models on units derived from the fine-tuned representations.
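As a rough illustration of what fine-tuning on phoneme classification means at the output layer, the sketch below trains a linear softmax head that assigns a phoneme label to every feature frame with a frame-level cross-entropy loss. It is a minimal, dependency-free sketch under stated assumptions: the toy 2-D features stand in for frame representations from a speech model such as HuBERT, the class `LinearPhonemeHead` is hypothetical, and only the head is trained here, whereas actual fine-tuning would typically also update the pretrained encoder. The paper's exact objective and architecture may differ.

```python
import math
import random


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearPhonemeHead:
    """Hypothetical linear classifier mapping one feature frame to phoneme logits."""

    def __init__(self, dim, num_phonemes, seed=0):
        rng = random.Random(seed)
        # Small random weights, one row per phoneme class.
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_phonemes)]
        self.b = [0.0] * num_phonemes

    def logits(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]

    def predict(self, x):
        scores = self.logits(x)
        return max(range(len(scores)), key=scores.__getitem__)

    def epoch(self, frames, labels, lr=0.1):
        """One SGD pass of frame-level cross-entropy; returns the mean loss over frames."""
        total = 0.0
        for x, y in zip(frames, labels):
            p = softmax(self.logits(x))
            total += -math.log(p[y])
            # d(cross-entropy)/d(logit_k) = p_k - 1[k == y]
            for k, pk in enumerate(p):
                g = pk - (1.0 if k == y else 0.0)
                self.b[k] -= lr * g
                self.w[k] = [wi - lr * g * xi for wi, xi in zip(self.w[k], x)]
        return total / len(frames)


# Toy "frames": 2-D features for three phoneme classes, one label per frame.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
labels = [0, 0, 1, 1, 2, 2]

head = LinearPhonemeHead(dim=2, num_phonemes=3)
losses = [head.epoch(frames, labels) for _ in range(300)]
print(round(losses[0], 3), round(losses[-1], 3))
print([head.predict(x) for x in frames])
```

The loss decreases over epochs and the head ends up labeling each toy frame with its phoneme class; in a real setting the frame labels would come from time-aligned phoneme transcriptions.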

Findings

1. Phoneme fine-tuning yields more context-invariant speech representations.
2. Language models trained on the resulting units achieve lexical comprehension comparable to that of models trained on about a hundred times more data.
3. Fine-tuned models outperform baseline speech models on language understanding tasks.
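A common way to quantify context invariance is a machine ABX discrimination test: given tokens A and X of the same phoneme (ideally in different contexts) and a token B of a different phoneme, a good representation places X closer to A than to B. The sketch below computes an ABX error rate under a simplifying assumption: each token is pooled into a single vector and compared with cosine distance, whereas standard ABX evaluations align frame sequences with dynamic time warping. Whether and how the paper applies this metric is not stated here, and the function names and embeddings are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def abx_error(triplets, dist=cosine_distance):
    """Fraction of (A, B, X) triplets in which X is not closer to A than to B.

    A and X are tokens of the same phoneme (ideally from different contexts);
    B is a token of a different phoneme. Ties count as half an error.
    """
    errors = 0.0
    for a, b, x in triplets:
        d_ax, d_bx = dist(a, x), dist(b, x)
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5
    return errors / len(triplets)


# Hypothetical pooled embeddings for tokens of /a/ and /i/.
a_ctx1 = [1.0, 0.2]  # /a/ in one context
a_ctx2 = [0.9, 0.3]  # /a/ in another context
i_tok = [0.1, 1.0]   # /i/
print(abx_error([(a_ctx1, i_tok, a_ctx2)]))  # 0.0
```

A lower error means tokens of the same phoneme stay close to each other regardless of context, which is the sense in which a representation is more context-invariant.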

Abstract

Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible. Generating speech through a pipeline that operates at the text level typically loses nuances, intonations, and non-verbal vocalizations. Modeling directly from speech opens up the path to more natural and expressive systems. On the other hand, speech-only systems require up to three orders of magnitude more data to catch up to their text-based counterparts in terms of their semantic abilities. We show that fine-tuning speech representation models on phoneme classification leads to more context-invariant representations, and language models trained on these units achieve lexical comprehension comparable to that of models trained on a hundred times more data.
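Spoken language models in this line of work are commonly trained on discrete units obtained by clustering frame-level features, for example with k-means, and optionally collapsing consecutive repeated units. The sketch below shows that generic pipeline in plain Python on toy 2-D features. The number of clusters, the use of deduplication, and the helper names (`kmeans`, `quantize`, `dedup`) are illustrative assumptions, not the paper's reported configuration.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                dim = len(members[0])
                centroids[j] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return centroids


def quantize(frames, centroids):
    """Map each frame feature to the index of its nearest centroid (a discrete unit)."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(f, centroids[c])) for f in frames]


def dedup(units):
    """Collapse runs of identical consecutive units, e.g. [3, 3, 7, 7, 7, 3] -> [3, 7, 3]."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


# Toy frame features: two frames near one point, two near another, then back.
frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [0.0, 0.1]]
centroids = kmeans(frames, k=2)
units = quantize(frames, centroids)
print(units, dedup(units))
```

The resulting unit sequence is what a language model would be trained on. Collapsing repeats shortens sequences but discards duration information, so whether to apply it is a design choice.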

Code & Models

Repositories

bootphon/spokenlm-phoneme (PyTorch, official)

Models

coml/hubert-phoneme-classification (Hugging Face model)

Taxonomy

Topics: Speech Recognition and Synthesis