BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

Peikun Chen; Fan Yu; Yuhao Lian; Hongfei Xue; Xucheng Wan; Naijun Zheng; Huan Zhou; Lei Xie

arXiv:2310.02629 · cs.SD · October 10, 2023

TL;DR

This paper introduces BA-MoE, a boundary-aware mixture-of-experts model with language-specific adapters and boundary prediction, significantly improving code-switching speech recognition accuracy by better modeling language boundaries.

Contribution

It proposes a novel boundary-aware MoE model with language adapters and boundary prediction to enhance multi-language speech recognition performance.

Findings

1. Achieves a 16.55% reduction in mixture error rate (MER) on a Mandarin-English dataset.

2. Models language boundaries effectively and improves language-specific representation learning.

3. Outperforms baseline models on code-switching speech recognition tasks.
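
The mixture error rate named in the first finding can be illustrated with a minimal sketch. It assumes the common convention for Mandarin-English data in which each Mandarin character and each English word counts as one token; the page itself does not spell out the scoring rules, so the tokenizer and the function names below are hypothetical, not the authors' scoring script.

```python
import re


def tokenize_mixed(text):
    """Split text into Mandarin characters and English words (assumed convention)."""
    # Each CJK ideograph is one token; each run of Latin letters is one token.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def mixture_error_rate(ref_text, hyp_text):
    """Token-level error rate over mixed Mandarin-character and English-word tokens."""
    ref = tokenize_mixed(ref_text)
    hyp = tokenize_mixed(hyp_text)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# One substituted English word out of five reference tokens.
print(mixture_error_rate("我想去 shopping mall", "我想去 shop mall"))  # prints 0.2
```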

Abstract

Mixture-of-experts (MoE) models, which use language experts to extract language-specific representations, have been widely applied to code-switching automatic speech recognition. However, there is still substantial room for improvement, because similar pronunciations across languages can lead to ineffective multi-language modeling and inaccurate language boundary estimation. To address these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). First, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Besides,…
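
The idea of fusing language-specific adapter outputs through a gating layer can be sketched for a single frame as follows. This is only an illustration under stated assumptions: each adapter is reduced to one linear map, the gate scores are computed from the adapter input, and all names (`fuse_frame`, `adapter_weights`, `gate_weights`) are hypothetical. The abstract does not give the adapter internals, the gate input, or the exact form of the language adaptation loss, so none of those are reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def fuse_frame(x, adapter_weights, gate_weights):
    """Fuse per-language adapter outputs for one frame (simplified, assumed design)."""
    # One output vector per language-specific adapter (here: a single linear map each).
    expert_outs = [linear(w, x) for w in adapter_weights]
    # One gate weight per language, computed from the frame itself (an assumption).
    gates = softmax(linear(gate_weights, x))
    dim = len(expert_outs[0])
    fused = [sum(g * out[d] for g, out in zip(gates, expert_outs)) for d in range(dim)]
    return fused, gates


# Toy setup: two "languages", two-dimensional frames.
identity = [[1.0, 0.0], [0.0, 1.0]]
doubled = [[2.0, 0.0], [0.0, 2.0]]
zero_gate = [[0.0, 0.0], [0.0, 0.0]]  # equal scores, so both adapters get weight 0.5

fused, gates = fuse_frame([1.0, 0.0], [identity, doubled], zero_gate)
print(gates)  # [0.5, 0.5]
print(fused)  # [1.5, 0.0]
```

With equal gate scores the fused frame is the plain average of the two adapter outputs; a trained gate would shift weight toward the adapter for the language actually being spoken in that frame.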

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing