Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition
Mengze Hong, Yi Gu, Di Jiang, Hanlin Gu, Chen Jason Zhang, Lu Wang, Zhiyang Su

TL;DR
This paper introduces novel algorithms for optimizing heterogeneous language models in federated ASR systems, improving accuracy and convergence speed while preserving data privacy.
Contribution
It proposes a match-and-merge paradigm, implemented by two algorithms (GMMA and RMMA), for merging heterogeneous n-gram and neural language models in federated speech recognition.
Findings
RMMA achieves lower error rates than baselines
RMMA converges up to seven times faster than GMMA
Experiments on seven datasets validate the effectiveness of the approach
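The error rates in these findings are reported as Character Error Rate (CER). As a minimal sketch of the standard definition (character-level edit distance divided by reference length), and not the paper's evaluation code, the metric could be computed as follows. The function names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.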
Abstract
Training automatic speech recognition (ASR) models increasingly relies on decentralized federated learning to ensure data privacy and accessibility, producing multiple local models that require effective merging. In hybrid ASR systems, while acoustic models can be merged using established methods, the language model (LM) for rescoring the N-best speech recognition list faces challenges due to the heterogeneity of non-neural n-gram models and neural network models. This paper proposes a heterogeneous LM optimization task and introduces a match-and-merge paradigm with two algorithms: the Genetic Match-and-Merge Algorithm (GMMA), using genetic operations to evolve and pair LMs, and the Reinforced Match-and-Merge Algorithm (RMMA), leveraging reinforcement learning for efficient convergence. Experiments on seven OpenSLR datasets show RMMA achieves the lowest average Character Error Rate and…
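To make the rescoring setting concrete, here is a minimal sketch of N-best rescoring with a weighted combination of heterogeneous LM scores. It assumes a simple log-linear interpolation, which is an illustrative stand-in and not the paper's match-and-merge procedure. The names `rescore_nbest`, `ngram_like`, and `neural_like`, the weights, and the toy scores are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Hypothesis = Tuple[str, float]     # (text, first-pass log-score)
LMScorer = Callable[[str], float]  # returns a log-probability for the text


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    scorers: Sequence[LMScorer],
    weights: Sequence[float],
    lm_scale: float = 1.0,
) -> str:
    """Return the hypothesis with the best combined score.

    Assumption: LM scores are combined by a weighted sum of log-probabilities.
    """
    if len(scorers) != len(weights):
        raise ValueError("one weight is required per scorer")
    if not nbest:
        raise ValueError("the N-best list must be non-empty")
    best_text, best_score = nbest[0][0], float("-inf")
    for text, first_pass in nbest:
        lm_score = sum(w * score(text) for score, w in zip(scorers, weights))
        total = first_pass + lm_scale * lm_score
        if total > best_score:
            best_text, best_score = text, total
    return best_text


# Toy stand-ins for an n-gram LM and a neural LM (hypothetical scores).
def ngram_like(text: str) -> float:
    return -float(len(text.split()))


def neural_like(text: str) -> float:
    return -2.0 if "recognize speech" in text else -6.0


nbest: List[Hypothesis] = [
    ("wreck a nice beach", -10.0),
    ("recognize speech", -10.5),
]
best = rescore_nbest(nbest, [ngram_like, neural_like], [0.5, 0.5])
```

With equal weights the toy LMs overturn the first-pass ranking; with all weights set to zero the first-pass best hypothesis is returned unchanged.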
Taxonomy
Topics: Speech Recognition and Synthesis · Machine Learning and Data Classification · Face recognition and analysis
