HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model

Takashi Maekaku; Jiatong Shi; Xuankai Chang; Yuya Fujita; Shinji Watanabe

arXiv:2310.03975 · cs.SD · October 9, 2023

Open Access

TL;DR

This paper introduces HuBERTopic, a method that enhances HuBERT's semantic understanding by integrating topic modeling into self-supervised learning, achieving comparable or better performance than the baseline across speech tasks.

Contribution

It proposes a novel approach that combines topic models with HuBERT to incorporate global semantic information in an unsupervised manner.
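The Abstract describes the unsupervised step as applying a topic model to HuBERT's pseudo-labels to obtain one topic label per utterance. A minimal sketch of that idea follows, under loudly stated assumptions: the topic model is taken to be Latent Dirichlet Allocation (LDA) fitted by collapsed Gibbs sampling, each utterance's sequence of discrete pseudo-label IDs is treated as a "document" whose "words" are cluster indices, and the utterance's label is its dominant topic. The function name `lda_topic_labels`, the hyperparameters, and the argmax labeling rule are hypothetical illustrations, not details confirmed by the paper.

```python
import random

def lda_topic_labels(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                     iters=200, seed=0):
    """Give each utterance one topic label via collapsed Gibbs sampling LDA.

    docs: list of utterances, each a list of discrete pseudo-label IDs
    (the topic model's "words") in range(vocab_size).
    Returns one label per utterance: its dominant topic after sampling.
    Hypothetical helper: LDA, the hyperparameters, and the argmax rule
    are assumptions made for illustration.
    """
    rng = random.Random(seed)
    topics = range(num_topics)
    n_dk = [[0] * num_topics for _ in docs]    # utterance-topic counts
    n_kw = [[0] * vocab_size for _ in topics]  # topic-unit counts
    n_k = [0] * num_topics                     # tokens assigned to each topic
    z = []                                     # current topic of every token

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other tokens).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Utterance-level label: the topic holding most of the utterance's tokens.
    return [max(topics, key=lambda t: n_dk[d][t]) for d in range(len(docs))]
```

As a toy usage, utterances built from two disjoint sets of units (for example `[0, 1, 2] * 10` versus `[3, 4, 5] * 10`) should end up with different labels. In a real pipeline the vocabulary would be the k-means cluster IDs that HuBERT already uses as pseudo-labels, and a library LDA implementation would replace this hand-written sampler.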

Findings

01

Achieves comparable or better performance than the baseline on speech tasks.

02

Captures diverse semantic information, such as speaker and theme.

03

Improves understanding of global context in speech representations.

Abstract

Recently, the usefulness of self-supervised representation learning (SSRL) methods has been confirmed in various downstream tasks. Many of these models, as exemplified by HuBERT and WavLM, use pseudo-labels generated from spectral features or from the model's own representations. Previous studies have shown that these pseudo-labels contain semantic information. However, the masked prediction task, HuBERT's learning criterion, focuses on local contextual information and may not make effective use of global semantic information such as the speaker or the theme of the speech. In this paper, we propose a new approach to enrich the semantic representation of HuBERT. We apply a topic model to the pseudo-labels to generate a topic label for each utterance. An auxiliary topic classification task is then added to HuBERT, using the topic labels as teachers. This allows additional global semantic…
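The auxiliary topic classification task described above can be pictured as an extra cross-entropy term added to HuBERT's masked-prediction loss. The sketch below is a minimal, framework-free illustration under stated assumptions: the utterance-level `topic_logits` are assumed to come from some classifier head over HuBERT's outputs (not shown), and the additive combination with a hypothetical `topic_weight` of 0.1 is an illustrative choice, not the paper's reported setting. The names `cross_entropy` and `hubert_with_topic_loss` are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax of the target class, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def hubert_with_topic_loss(frame_logits, frame_targets, mask,
                           topic_logits, topic_label, topic_weight=0.1):
    """Masked-prediction loss plus a weighted utterance-level topic loss.

    frame_logits: per-frame scores over pseudo-label clusters.
    frame_targets: per-frame pseudo-label IDs.
    mask: True where a frame was masked; only those frames are scored.
    topic_logits: scores from a hypothetical utterance-level topic head.
    topic_label: the utterance's topic label from the topic model (teacher).
    topic_weight: hypothetical weight of the auxiliary term.
    """
    masked = [cross_entropy(lg, tg)
              for lg, tg, m in zip(frame_logits, frame_targets, mask) if m]
    if not masked:
        raise ValueError("at least one frame must be masked")
    mp_loss = sum(masked) / len(masked)                    # local, frame-level
    topic_loss = cross_entropy(topic_logits, topic_label)  # global, utterance-level
    return mp_loss + topic_weight * topic_loss
```

Because the topic term is computed once per utterance while the masked-prediction term averages over masked frames, the weight controls how strongly the global, utterance-level signal competes with the local, frame-level one.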

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems