The effectiveness of unsupervised subword modeling with autoregressive and cross-lingual phone-aware networks

Siyuan Feng; Odette Scharenborg

arXiv:2012.09544·eess.AS·June 8, 2021

TL;DR

This paper introduces a two-stage unsupervised subword modeling framework that combines autoregressive predictive coding (APC) with a cross-lingual DNN. On the ABX subword discriminability task, it is competitive with or superior to state-of-the-art approaches.

Contribution

It proposes a two-stage learning approach, with APC as a self-supervised front-end and a cross-lingual DNN as the back-end, and analyzes the learned representations at the phoneme and articulatory feature (AF) levels.

Findings

01

Outperforms or matches state-of-the-art on ABX subword discriminability tasks.

02

Captures diphthong vowel information better than monophthong vowel information.

03

Shows positive correlation between cross-lingual label quality and phoneme information capture.

Abstract

This study addresses unsupervised subword modeling, i.e., learning acoustic feature representations that can distinguish between subword units of a language. We propose a two-stage learning framework that combines self-supervised learning and cross-lingual knowledge transfer. The framework consists of autoregressive predictive coding (APC) as the front-end and a cross-lingual deep neural network (DNN) as the back-end. Experiments on the ABX subword discriminability task conducted with the Libri-light and ZeroSpeech 2017 databases showed that our approach is competitive with or superior to state-of-the-art studies. Comprehensive and systematic analyses at the phoneme- and articulatory feature (AF)-level showed that our approach was better at capturing diphthong than monophthong vowel information, while also differences in the amount of information captured for different types of consonants…
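
The abstract names APC as the front-end but does not state its training objective. As a rough illustration only, the sketch below assumes the commonly used APC formulation, in which the output produced at frame t is compared with the input frame n steps ahead using an L1 distance. The function name `apc_l1_loss`, the parameter `n_shift`, and the averaging over frames and dimensions are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def apc_l1_loss(features, predictions, n_shift):
    """Mean L1 distance between predictions[t] and features[t + n_shift].

    features:    array of shape (T, D), the input acoustic frames.
    predictions: array of shape (T, D), the model output at each frame.
    n_shift:     how many frames ahead the model is asked to predict
                 (hypothetical name for this sketch).
    """
    features = np.asarray(features, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if features.shape != predictions.shape:
        raise ValueError("features and predictions must have the same shape")
    if n_shift < 1 or n_shift >= features.shape[0]:
        raise ValueError("n_shift must be in [1, T - 1]")
    # The prediction made at frame t is scored against the frame at t + n_shift,
    # so the last n_shift predictions have no target and are dropped.
    targets = features[n_shift:]
    usable_predictions = predictions[:-n_shift]
    return float(np.mean(np.abs(usable_predictions - targets)))


# Toy input: 6 frames of 2-dimensional features, each frame 2.0 larger
# than the previous one in every dimension.
toy_features = np.arange(12, dtype=float).reshape(6, 2)

# A trivial "copy" predictor that outputs the current frame unchanged.
copy_loss = apc_l1_loss(toy_features, toy_features, n_shift=1)
print(copy_loss)
```

In this toy case the copy predictor is off by exactly the frame-to-frame increment, so the loss grows with `n_shift`. Larger shifts are generally understood to push the model away from exploiting local smoothness in the signal.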