Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu

TL;DR
This paper introduces a cross-lingual embedding clustering method to improve hierarchical Softmax decoding in low-resource multilingual speech recognition, leading to better accuracy across multiple languages.
Contribution
It proposes a novel cross-lingual embedding clustering approach for building the hierarchical Softmax tree. This addresses a limitation of the earlier Huffman-based H-Softmax, which judged token similarity from shallow features, and it improves multilingual ASR performance.
Findings
Improved accuracy in low-resource multilingual ASR
Effective sharing of token representations across languages
Outperforms Huffman-based H-Softmax in experiments
Abstract
We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. It utilizes a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which enables similar tokens across different languages to share similar decoder representations. It addresses the limitations of the previous Huffman-based H-Softmax method, which relied on shallow features in token similarity assessments. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.
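The abstract does not specify the clustering algorithm or the depth of the H-Softmax tree. The sketch below is only a rough illustration, assuming k-means over token embeddings and a two-level tree. It shows how such a tree factorizes the output distribution as P(token | h) = P(cluster | h) · P(token | cluster, h), so that similar tokens from different languages fall under the same cluster node and share its parameters. The names `kmeans` and `TwoLevelHSoftmax`, the toy vocabulary, and the 2-D embeddings are all hypothetical. Reusing the k-means centroids as the cluster output vectors is also a simplification, because a real decoder would learn them.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid (squared L2).
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class TwoLevelHSoftmax:
    """Hypothetical two-level H-Softmax: root -> cluster node -> token leaf."""

    def __init__(self, token_embs, assign, cluster_embs):
        self.token_embs = token_embs
        self.assign = assign
        self.cluster_embs = cluster_embs
        self.members = [
            [t for t, c in enumerate(assign) if c == ci]
            for ci in range(len(cluster_embs))
        ]

    def prob(self, hidden, token):
        c = self.assign[token]
        # P(cluster | h): softmax over cluster nodes only.
        p_cluster = softmax([dot(hidden, e) for e in self.cluster_embs])[c]
        # P(token | cluster, h): softmax over that cluster's tokens only.
        members = self.members[c]
        p_within = softmax([dot(hidden, self.token_embs[t]) for t in members])
        return p_cluster * p_within[members.index(token)]


# Toy vocabulary: translation-equivalent tokens get nearby (made-up) embeddings.
vocab = ["en:cat", "de:Katze", "fr:chat", "en:run", "de:laufen", "fr:courir"]
embs = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0], [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]]

assign, centroids = kmeans(embs, k=2)
hs = TwoLevelHSoftmax(embs, assign, centroids)

for t, name in enumerate(vocab):
    print(name, "cluster", assign[t], "P =", round(hs.prob([1.0, 0.0], t), 3))
```

In this toy setup, the three "cat" tokens share one cluster and the three "run" tokens share the other, so a gradient on any of them also updates the shared cluster node. By contrast, a Huffman tree groups tokens by frequency alone, regardless of meaning.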
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Softmax · Hierarchical Softmax
