Discrete Speech Unit Extraction via Independent Component Analysis

Tomohiko Nakamura; Kwanghee Choi; Keigo Hojo; Yoshiaki Bando; Satoru Fukayama; Shinji Watanabe

arXiv:2501.06562·eess.AS·January 14, 2025

TL;DR

This paper explores linear preprocessing techniques, especially ICA, to improve the extraction of discrete speech units from self-supervised speech model representations, enhancing clustering quality for speech recognition tasks.

Contribution

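The paper introduces independent component analysis (ICA) as a preprocessing step applied before clustering self-supervised speech model (S3M) representations, and analyzes in detail how it affects the quality and interpretability of the resulting discrete speech units (DSUs).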

Findings

01 Applying ICA before clustering improves the quality of the resulting DSUs.

02 Linear preprocessing, ICA in particular, improves accuracy on DSU-based ASR benchmarks.

03 ICA components show orthogonality and interpretability.

Abstract

Self-supervised speech models (S3Ms) have become a common tool for the speech processing community, leveraging representations for downstream tasks. Clustering S3M representations yields discrete speech units (DSUs), which serve as compact representations for speech signals. DSUs are typically obtained by k-means clustering. Using DSUs often leads to strong performance in various tasks, including automatic speech recognition (ASR). However, even with the high dimensionality and redundancy of S3M representations, preprocessing S3M representations for better clustering remains unexplored, even though it can affect the quality of DSUs. In this paper, we investigate the potential of linear preprocessing methods for extracting DSUs. We evaluate standardization, principal component analysis, whitening, and independent component analysis (ICA) on DSU-based ASR benchmarks and demonstrate their…
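
The pipeline the abstract describes (linear preprocessing of frame-level features, then k-means to obtain unit indices) can be sketched as follows. This is a minimal illustration using NumPy only, not the authors' implementation: the function names (`fit_whitening`, `fit_ica_rotation`, `kmeans`, `extract_dsus`), the choice of symmetric FastICA with a tanh nonlinearity, the synthetic stand-in for S3M features, and every hyperparameter are assumptions made for this example.

```python
import numpy as np

def sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W

def fit_whitening(X, n_components):
    """PCA whitening: (X - mean) @ W has (near-)identity covariance."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]  # largest variance first
    W = eigvecs[:, order] / np.sqrt(eigvals[order] + 1e-12)
    return mean, W

def fit_ica_rotation(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA on whitened data Z; returns an orthogonal matrix R.

    Estimated independent components are the columns of Z @ R.T.
    """
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    R = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        G = np.tanh(Z @ R.T)                 # nonlinearity g(y)
        G_prime = 1.0 - G ** 2               # its derivative g'(y)
        # Fixed-point update per row: E[z g(w^T z)] - E[g'(w^T z)] w
        R_new = sym_decorrelate((G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * R)
        delta = np.max(np.abs(np.abs(np.sum(R_new * R, axis=1)) - 1.0))
        R = R_new
        if delta < tol:
            break
    return R

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:             # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers

def extract_dsus(features, n_components, n_units, seed=0):
    """Whiten, rotate with ICA, then cluster; one unit index per frame."""
    mean, W = fit_whitening(features, n_components)
    Z = (features - mean) @ W
    R = fit_ica_rotation(Z, seed=seed)
    labels, _ = kmeans(Z @ R.T, n_units, seed=seed)
    return labels, Z, R

# Synthetic stand-in for frame-level S3M features (assumption: 2000 frames, 16 dims).
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2000, 4))        # non-Gaussian latent sources
features = sources @ rng.standard_normal((4, 16)) + 0.01 * rng.standard_normal((2000, 16))

units, Z, R = extract_dsus(features, n_components=4, n_units=8)
print(units[:10])
```

Because `R` is orthogonal, the rotation in this sketch preserves Euclidean distances between whitened frames; what it changes is the coordinate axes, which are pushed toward statistical independence.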

Code & Models

Repositories

tomohikonakamura/ica_dsu_espnet
PyTorch · Official

Taxonomy

Methods: Independent Component Analysis