Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

Siyuan Feng; Piotr Żelasko; Laureano Moro-Velázquez; Odette Scharenborg

arXiv:2104.00994·eess.AS·June 8, 2021

TL;DR

This paper introduces a two-stage, language-independent approach for unsupervised acoustic unit discovery using a multilingual subword-discriminative feature representation, outperforming previous methods on low-resource speech data.

Contribution

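The paper replaces the monolingual out-of-domain ASR system used in an earlier unsupervised subword modeling method with a multilingual one, so that the learned subword-discriminative representation is more language-independent. It also compares two ways of representing variable-length speech segments as fixed-dimension vectors for segment-level k-means clustering.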

Findings

01 Outperforms state-of-the-art AUD methods in NMI and F-score

02 Multilingual ASR improves phone boundary estimation

03 A significant performance gap remains relative to using ground-truth phone boundaries
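
For readers unfamiliar with the NMI metric named above, the following is a minimal sketch of normalized mutual information between reference phone labels and discovered unit labels. The normalization used here (twice the mutual information divided by the sum of the two entropies) is one common convention and is an assumption; this page does not state the paper's exact definition, and the function names are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(ref, hyp):
    """Mutual information (in nats) between two aligned label sequences."""
    n = len(ref)
    joint = Counter(zip(ref, hyp))
    p_ref, p_hyp = Counter(ref), Counter(hyp)
    return sum(
        (c / n) * math.log(c * n / (p_ref[a] * p_hyp[b]))
        for (a, b), c in joint.items()
    )


def nmi(ref, hyp):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    h_ref, h_hyp = entropy(ref), entropy(hyp)
    if h_ref + h_hyp == 0:
        return 1.0  # both sequences are constant, so they trivially agree
    return 2 * mutual_information(ref, hyp) / (h_ref + h_hyp)


# Toy frame-level labels: reference phones vs. discovered unit IDs.
reference = ["a", "a", "b", "b"]
perfect_units = [1, 1, 2, 2]      # same partition, different names
unrelated_units = [1, 2, 1, 2]    # statistically independent of the reference
```

A score near 1 means the discovered units partition the speech the same way the reference phones do, regardless of how the units are named; a score near 0 means the two labelings are independent.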

Abstract

This paper tackles acoustic unit discovery (AUD), the task of automatically discovering phone-like acoustic units from unlabeled speech data. Past studies usually proposed single-step approaches. We propose a two-stage approach: the first stage learns a subword-discriminative feature representation, and the second stage applies clustering to the learned representation and obtains phone-like clusters as the discovered acoustic units. In the first stage, a recently proposed method for unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one, creating a subword-discriminative representation that is more language-independent. In the second stage, segment-level k-means is adopted, and two methods to represent the variable-length speech segments as fixed-dimension feature vectors are compared. Experiments on a very low-resource Mboshi language corpus…
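
The second stage described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the two segment representation methods, so mean pooling and evenly spaced frame sampling with concatenation are assumptions chosen for illustration, as are all function names and the deterministic farthest-first initialization.

```python
def mean_pool(segment):
    """Average all frames of a segment into one vector (assumed method 1)."""
    n = len(segment)
    return [sum(col) / n for col in zip(*segment)]


def sample_concat(segment, num_points=3):
    """Concatenate num_points evenly spaced frames (assumed method 2)."""
    n = len(segment)
    out = []
    for i in range(num_points):
        idx = min(n - 1, int((i + 0.5) * n / num_points))
        out.extend(segment[idx])
    return out


def sqdist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each segment vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sqdist(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy variable-length segments of 2-dimensional frame features
# (stand-ins for the learned subword-discriminative representation).
segments = [
    [[0.0, 0.0], [0.2, 0.0]],
    [[0.1, 0.1], [0.1, 0.3], [0.1, 0.2]],
    [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0], [5.0, 5.0]],
    [[5.2, 4.8], [4.8, 5.2]],
]
pooled = [mean_pool(s) for s in segments]
labels, centroids = kmeans(pooled, k=2)
```

Either representation maps a segment of any length to a fixed-size vector, which is what lets ordinary k-means operate at the segment level; swapping `mean_pool` for `sample_concat` changes only the vector dimension (frame dimension times `num_points`).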

Code & Models

Repositories

syfengcuhk/mboshi (Official)
