Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

Amitay Sicherman; Yossi Adi

arXiv:2301.00591·cs.CL·June 9, 2023·1 cites

Open Access 1 Repo

TL;DR

This paper analyzes discrete self-supervised speech representations to understand their properties and proposes methods to improve their robustness and usefulness for spoken language modeling.

Contribution

It provides a detailed analysis of speech units, introduces a new redundancy metric, and develops methods to enhance unit clustering for better speech modeling.

Findings

01 High correlation between speech units and phonemes

02 Redundancies exist in extracted units due to context

03 Proposed methods improve zero-resource speech metrics

Abstract

This work analyzes in depth discrete self-supervised speech representations (units) through the lens of Generative Spoken Language Modeling (GSLM). Following the findings of this analysis, we propose practical improvements to the discrete units used for GSLM. First, we examine these units along three axes: interpretation, visualization, and resynthesis. Our analysis finds a high correlation between the speech units and phonemes and phoneme families, while their correlation with speaker or gender is weaker. Additionally, we find redundancies in the extracted units and argue that one reason may be the units' context. Following this analysis, we propose a new, unsupervised metric to measure unit redundancies. Finally, we use this metric to develop new methods that improve the robustness of units' clustering and show significant improvement considering…
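
As a rough illustration of what a unit-to-phoneme correlation can mean in practice, the sketch below computes cluster purity: the fraction of frames whose phoneme matches the most frequent phoneme of their unit. This is a generic clustering measure chosen for illustration, not necessarily the measure used in the paper, and it is not the paper's redundancy metric. The function name `unit_phoneme_purity` and the toy frame-aligned sequences are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def unit_phoneme_purity(units, phonemes):
    """Fraction of frames whose phoneme equals the majority phoneme of their unit.

    `units` and `phonemes` are assumed to be frame-aligned sequences of equal
    length (hypothetical inputs, not taken from the paper).
    """
    if len(units) != len(phonemes):
        raise ValueError("units and phonemes must be frame-aligned")
    if not units:
        raise ValueError("empty input")

    # Count how often each phoneme co-occurs with each unit.
    counts = defaultdict(Counter)
    for unit, phoneme in zip(units, phonemes):
        counts[unit][phoneme] += 1

    # For every unit, keep only the frames that carry its most frequent phoneme.
    majority_hits = sum(c.most_common(1)[0][1] for c in counts.values())
    return majority_hits / len(units)


# Toy, made-up data: unit IDs and phoneme labels for eight frames.
units = [3, 3, 3, 7, 7, 12, 12, 12]
phonemes = ["AH", "AH", "AH", "S", "S", "T", "T", "AH"]
purity = unit_phoneme_purity(units, phonemes)
print(f"purity = {purity:.3f}")
```

A purity close to 1 means each unit maps almost exclusively to one phoneme. Running the same function with speaker or gender labels in place of phonemes would give a comparable number for those attributes.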

Code & Models

Repositories

slp-rl/slm-discrete-representations
pytorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems