BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition
Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

TL;DR
This paper introduces BA-MoE, a boundary-aware mixture-of-experts model with language-specific adapters and boundary prediction, significantly improving code-switching speech recognition accuracy by better modeling language boundaries.
Contribution
The paper proposes a boundary-aware MoE model that combines language-specific adapters with language boundary prediction to improve code-switching speech recognition.
Findings
Achieves a 16.55% reduction in mixture error rate (MER) on a Mandarin-English dataset.
Effectively models language boundaries and improves language-specific representation learning.
Outperforms baseline models in code-switching speech recognition tasks.
Abstract
Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). Specifically, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute language adaptation loss of the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Besides,…
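The adapter-and-gate idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the bottleneck adapter shape, the softmax gate, the classifier used for the language adaptation loss, and every name here (`Adapter`, `fuse`, `language_adaptation_loss`, `w_gate`, `w_cls`) are assumptions made for illustration. It shows one encoder layer's worth of per-language adapters, a single gating layer that fuses their outputs frame by frame, and a cross-entropy loss on the time-averaged output of each adapter.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project, residual."""

    def __init__(self, dim, bottleneck):
        self.w_down = rng.normal(0.0, 0.1, (dim, bottleneck))
        self.w_up = rng.normal(0.0, 0.1, (bottleneck, dim))

    def __call__(self, h):
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up


def fuse(h, adapters, w_gate):
    """Run every language adapter on h (T, D) and mix them with a shared gate."""
    expert_out = np.stack([a(h) for a in adapters], axis=1)  # (T, E, D)
    gate = softmax(h @ w_gate, axis=-1)                      # (T, E), per-frame weights
    fused = (gate[..., None] * expert_out).sum(axis=1)       # (T, D)
    return fused, gate, expert_out


def language_adaptation_loss(expert_out, w_cls):
    """Assumed form of the loss: classify each adapter's mean output as its own language."""
    mean_out = expert_out.mean(axis=0)            # (E, D), averaged over time
    probs = softmax(mean_out @ w_cls, axis=-1)    # (E, E), one class per language
    idx = np.arange(expert_out.shape[1])
    return float(-np.log(probs[idx, idx]).mean())


# Toy example: 5 frames, 8-dim features, 2 languages (e.g. Mandarin and English).
T, D, E = 5, 8, 2
h = rng.normal(size=(T, D))
adapters = [Adapter(D, bottleneck=4) for _ in range(E)]
w_gate = rng.normal(0.0, 0.1, (D, E))
w_cls = rng.normal(0.0, 0.1, (D, E))

fused, gate, expert_out = fuse(h, adapters, w_gate)
loss = language_adaptation_loss(expert_out, w_cls)
```

In this sketch the per-frame gate weights are the only place where language identity enters the fused representation, which is why an accurate estimate of where one language ends and the next begins matters for the mixture.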
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing
