Exploring SSL Discrete Tokens for Multilingual ASR
Mingyu Cui, Daxin Tan, Yifan Yang, Dingdong Wang, Huimeng Wang, Xiao Chen, Xie Chen, Xunying Liu

TL;DR
This paper evaluates SSL-generated discrete tokens for multilingual ASR and reports performance comparable to or better than Fbank features across seven language domains, including a 6.82% WER reduction on the Polish test set.
Contribution
It provides a comprehensive comparison of SSL discrete tokens for multilingual ASR, filling a gap in understanding their performance across diverse language domains.
Findings
Discrete tokens achieve results comparable to Fbank features in ASR.
Average WER reduction of 0.31% on the dev sets and 1.76% on the test sets.
WER reduction of 6.82% on the Polish test set.
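For readers less familiar with the metric, the following is a minimal sketch of how a word error rate (WER) and a WER reduction are commonly computed. It is not the authors' evaluation code. The function names and the example sentences and numbers are hypothetical, and the findings above do not state whether their reductions are absolute or relative, so the sketch shows both.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion over 6 words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    # Hypothetical WERs in percent, not taken from the paper.
    print(absolute_reduction(10.0, 8.0), relative_reduction(10.0, 8.0))
```

The distinction matters when reading reported gains: the same improvement can look small as an absolute difference and large as a fraction of a low baseline.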
Abstract
With the advancement of Self-supervised Learning (SSL) in speech-related tasks, there has been growing interest in utilizing discrete tokens generated by SSL for automatic speech recognition (ASR), as they offer faster processing techniques. However, previous studies primarily focused on multilingual ASR with Fbank features or English ASR with discrete tokens, leaving a gap in adapting discrete tokens for multilingual ASR scenarios. This study presents a comprehensive comparison of discrete tokens generated by various leading SSL models across multiple language domains. We aim to explore the performance and efficiency of speech discrete tokens across multiple language domains for both monolingual and multilingual ASR scenarios. Experimental results demonstrate that discrete tokens achieve comparable results against systems trained on Fbank features in ASR tasks across seven language…
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems
