Federated Heterogeneous Language Model Optimization for Hybrid Automatic Speech Recognition

Mengze Hong; Yi Gu; Di Jiang; Hanlin Gu; Chen Jason Zhang; Lu Wang; Zhiyang Su

arXiv:2603.04945 · cs.CL · March 6, 2026

TL;DR

This paper introduces two match-and-merge algorithms, GMMA and RMMA, for merging heterogeneous (n-gram and neural) language models in federated hybrid ASR systems, improving accuracy and convergence speed while keeping training data decentralized.

Contribution

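The paper defines a heterogeneous language model optimization task and proposes a match-and-merge paradigm with two algorithms: the Genetic Match-and-Merge Algorithm (GMMA) and the Reinforced Match-and-Merge Algorithm (RMMA), which merge the n-gram and neural language models produced by federated training of hybrid speech recognizers.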

Findings

01 RMMA achieves lower error rates than the baselines.

02 RMMA converges up to seven times faster than GMMA.

03 Experiments on seven OpenSLR datasets validate the effectiveness of the approach.

Abstract

Training automatic speech recognition (ASR) models increasingly relies on decentralized federated learning to ensure data privacy and accessibility, producing multiple local models that require effective merging. In hybrid ASR systems, while acoustic models can be merged using established methods, the language model (LM) for rescoring the N-best speech recognition list faces challenges due to the heterogeneity of non-neural n-gram models and neural network models. This paper proposes a heterogeneous LM optimization task and introduces a match-and-merge paradigm with two algorithms: the Genetic Match-and-Merge Algorithm (GMMA), using genetic operations to evolve and pair LMs, and the Reinforced Match-and-Merge Algorithm (RMMA), leveraging reinforcement learning for efficient convergence. Experiments on seven OpenSLR datasets show RMMA achieves the lowest average Character Error Rate and…
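
The rescoring step the abstract refers to can be illustrated with a minimal sketch. It is not the paper's GMMA or RMMA; it only shows how an N-best list might be re-ranked by combining an acoustic score with weighted log-probabilities from several language models. The function `rescore_nbest`, the toy scorers, and all scores and weights are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a hypothesis string to a log-probability.
# Both an n-gram LM and a neural LM can be wrapped behind this interface.
Scorer = Callable[[str], float]


def rescore_nbest(
    nbest: Sequence[Tuple[str, float]],
    scorers: Sequence[Scorer],
    weights: Sequence[float],
) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, acoustic_log_score) pairs, best first.

    The combined score is the acoustic score plus a weighted sum of the
    language model log-probabilities (a simple log-linear combination).
    """
    if len(scorers) != len(weights):
        raise ValueError("need exactly one weight per scorer")

    def combined(item: Tuple[str, float]) -> float:
        text, acoustic = item
        return acoustic + sum(w * s(text) for s, w in zip(scorers, weights))

    return sorted(nbest, key=combined, reverse=True)


# Toy stand-ins for an n-gram LM and a neural LM (made-up log-probabilities).
NGRAM_SCORES = {"recognize speech": -4.0, "wreck a nice beach": -9.0}
NEURAL_SCORES = {"recognize speech": -3.0, "wreck a nice beach": -8.0}


def ngram_lm(text: str) -> float:
    return NGRAM_SCORES[text]


def neural_lm(text: str) -> float:
    return NEURAL_SCORES[text]


# The acoustic model alone slightly prefers the wrong hypothesis.
nbest = [("wreck a nice beach", -10.0), ("recognize speech", -11.0)]
ranked = rescore_nbest(nbest, [ngram_lm, neural_lm], [0.5, 0.5])
print(ranked[0][0])
```

In this toy setup the two language models outvote the acoustic score, so the second hypothesis moves to the top. Merging heterogeneous models, as the paper proposes, addresses a harder problem than choosing fixed interpolation weights like the ones assumed here.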

Taxonomy

Topics: Speech Recognition and Synthesis · Machine Learning and Data Classification · Face recognition and analysis