Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling
Siyuan Feng, Odette Scharenborg

TL;DR
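This paper presents an unsupervised subword modeling approach that combines autoregressive predictive coding (APC) pretraining with a neural network trained on cross-lingual phone labels. The system outperforms the state of the art on Libri-light and ZeroSpeech 2017, and APC pretraining reduces the amount of training data needed.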
Contribution
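It introduces a two-stage bottleneck feature (BNF) learning framework: an APC front-end followed by a DNN-BNF back-end trained on cross-lingual phone labels from a language-mismatched ASR system. It also investigates how robust the approach is to different amounts of training data.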
Findings
APC is effective for front-end feature pretraining.
The full system outperforms the state of the art on Libri-light and ZeroSpeech 2017.
Less training data is needed when APC pretraining is used.
Abstract
This study addresses unsupervised subword modeling, i.e., learning feature representations that can distinguish subword units of a language. The proposed approach adopts a two-stage bottleneck feature (BNF) learning framework, consisting of autoregressive predictive coding (APC) as a front-end and a DNN-BNF model as a back-end. APC pretrained features are set as input features to a DNN-BNF model. A language-mismatched ASR system is used to provide cross-lingual phone labels for DNN-BNF model training. Finally, BNFs are extracted as the subword-discriminative feature representation. A second aim of this work is to investigate the robustness of our approach's effectiveness to different amounts of training data. The results on Libri-light and the ZeroSpeech 2017 databases show that APC is effective in front-end feature pretraining. Our whole system outperforms the state of the art on both…
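The two-stage flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tiny untrained RNN stands in for an APC model, the L1 loss on frames n steps ahead follows the usual APC formulation, and every name, layer size, the linear bottleneck, and the random weights are hypothetical choices made for illustration only. No training is performed; the sketch only shows how data moves from input frames to APC features to bottleneck features.

```python
import numpy as np


def rnn_hidden_states(frames, w_in, w_rec, bias):
    """Run a unidirectional tanh RNN and return one hidden state per frame."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x in frames:
        h = np.tanh(x @ w_in + h @ w_rec + bias)
        states.append(h)
    return np.stack(states)


def apc_l1_loss(frames, predictions, n_shift):
    """Mean L1 distance between the prediction at time t and the frame at t + n_shift."""
    if n_shift < 1:
        raise ValueError("n_shift must be at least 1")
    return float(np.abs(predictions[:-n_shift] - frames[n_shift:]).mean())


def extract_bnf(inputs, layers, bottleneck_index):
    """Forward inputs through feed-forward layers and return the bottleneck activations."""
    h = inputs
    for i, (w, b) in enumerate(layers):
        h = h @ w + b
        if i == bottleneck_index:
            return h  # assumption: linear bottleneck, no activation applied
        h = np.maximum(h, 0.0)  # ReLU on the other hidden layers
    raise IndexError("bottleneck_index is outside the layer list")


rng = np.random.default_rng(0)
T, D, H, B = 50, 13, 32, 8  # hypothetical: frames, input dim, RNN size, bottleneck dim

# Stand-in for acoustic frames (e.g. MFCCs); random values, not real speech.
frames = rng.normal(size=(T, D))

# Stage 1 (front-end): APC-style model. Hidden states act as the pretrained features;
# a linear projection predicts the frame n_shift steps ahead.
w_in = rng.normal(scale=0.1, size=(D, H))
w_rec = rng.normal(scale=0.1, size=(H, H))
w_out = rng.normal(scale=0.1, size=(H, D))
apc_features = rnn_hidden_states(frames, w_in, w_rec, np.zeros(H))
loss = apc_l1_loss(frames, apc_features @ w_out, n_shift=3)

# Stage 2 (back-end): DNN-BNF model taking APC features as input. In the described
# approach it would be trained on cross-lingual phone labels; here it is untrained.
layer_sizes = [H, 64, B, 64, 40]  # hypothetical sizes; 40 stands in for the phone classes
layers = [
    (rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
    for m, n in zip(layer_sizes[:-1], layer_sizes[1:])
]
bnf = extract_bnf(apc_features, layers, bottleneck_index=1)
print(bnf.shape)  # (50, 8)
```

In a real system both stages would be trained (the front-end on unlabeled speech, the back-end on frame-level phone labels produced by the language-mismatched ASR system), and the bottleneck activations would then serve as the subword-discriminative representation.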
