Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
Qianying Liu, Zhuo Gong, Zhengdong Yang, Yuhang Yang, Sheng Li, Chenchen Ding, Nobuaki Minematsu, Hao Huang, Fei Cheng, Chenhui Chu, Sadao Kurohashi

TL;DR
This paper introduces a multilingual hierarchical Softmax approach that leverages neighboring languages' similarities to improve low-resource speech recognition accuracy and efficiency.
Contribution
It proposes a novel hierarchical Softmax decoding method based on linguistic unit similarities across languages for low-resource speech recognition.
Findings
Improved recognition accuracy in low-resource settings
Enhanced decoding efficiency
Effective cross-lingual knowledge sharing
Abstract
Low-resource speech recognition has long suffered from insufficient training data. In this paper, we propose an approach that leverages neighboring languages to improve performance in low-resource scenarios. It is founded on the hypothesis that similar linguistic units in neighboring languages exhibit comparable term frequency distributions, which enables us to construct a Huffman tree for multilingual hierarchical Softmax decoding. This hierarchical structure enables cross-lingual knowledge sharing among similar tokens, thereby enhancing low-resource training outcomes. Empirical analyses demonstrate that our method improves both the accuracy and the efficiency of low-resource speech recognition.
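The abstract does not spell out the tree construction or the model, so the following is only a minimal, generic sketch of Huffman-tree hierarchical softmax, not the authors' implementation. It builds a Huffman tree from token frequencies, then scores a token as the product of binary (sigmoid) decisions along its root-to-leaf path. The token inventory, the frequencies, the function names (build_huffman_codes, token_prob), and the node vectors are all hypothetical and chosen for illustration.

```python
import heapq
import itertools
import math


def build_huffman_codes(freqs):
    """Map each token to its root-to-leaf path [(inner_node_id, bit), ...]."""
    tie = itertools.count()  # tie-breaker so the heap never compares node tuples
    heap = [(f, next(tie), ("leaf", tok)) for tok, f in freqs.items()]
    heapq.heapify(heap)
    next_id = 0
    while len(heap) > 1:
        # Merge the two least frequent subtrees under a new inner node.
        f0, _, n0 = heapq.heappop(heap)
        f1, _, n1 = heapq.heappop(heap)
        heapq.heappush(heap, (f0 + f1, next(tie), ("inner", next_id, n0, n1)))
        next_id += 1
    paths = {}

    def walk(node, path):
        if node[0] == "leaf":
            paths[node[1]] = path
        else:
            _, nid, left, right = node
            walk(left, path + [(nid, 0)])
            walk(right, path + [(nid, 1)])

    walk(heap[0][2], [])
    return paths


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def token_prob(path, hidden, node_vecs):
    """Product of left/right decision probabilities along the token's path."""
    p = 1.0
    for nid, bit in path:
        score = sum(h * w for h, w in zip(hidden, node_vecs[nid]))
        p_right = sigmoid(score)
        p *= p_right if bit == 1 else 1.0 - p_right
    return p


# Hypothetical unit frequencies (assumed values, not taken from the paper).
freqs = {"a": 50, "b": 30, "k": 15, "ng": 5}
codes = build_huffman_codes(freqs)

# Hypothetical decoder state and one weight vector per inner node.
hidden = [0.3, -1.2]
node_vecs = {0: [0.1, -0.2], 1: [0.5, 0.4], 2: [-0.7, 0.9]}

# The leaf probabilities form a valid distribution by construction.
total = sum(token_prob(path, hidden, node_vecs) for path in codes.values())
```

Frequent tokens end up on short paths, so scoring one token takes a number of binary decisions proportional to its path length rather than a normalization over the whole vocabulary. Tokens that share a path prefix also share the parameters of the inner nodes on that prefix, which is one way to picture the cross-lingual sharing the abstract describes.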
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Softmax · Hierarchical Softmax
