NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model

Yen-Ting Lin; Zhehuai Chen; Piotr Zelasko; Zhen Wan; Xuesong Yang; Zih-Ching Chen; Krishna C Puvvada; Szu-Wei Fu; Ke Hu; Jun Wei Chiu; Jagadeesh Balam; Boris Ginsburg; Yu-Chiang Frank Wang; Chao-Han Huck Yang

arXiv:2411.05945 · cs.CL · December 2, 2025

TL;DR

NeKo is a multi-task Mixture-of-Experts model for cross-modality post-recognition error correction. It reports state-of-the-art results on speech, language, and vision datasets while using fewer parameters than maintaining a separate correction language model per domain.

Contribution

The paper presents a Multi-Task Correction MoE that learns dataset-specific features by routing each dataset's tokens to a mapped expert, improving error correction performance within a single model.

Findings

01 Achieves an average 5.0% relative WER reduction on the Open ASR Leaderboard

02 Outperforms GPT-3.5 and Claude-Opus in zero-shot WER reduction

03 Performs well on grammar and post-OCR correction tasks
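The findings above are stated in terms of word error rate (WER) and relative WER reduction. As a minimal sketch of how those two quantities are commonly computed (the function names and the example numbers are illustrative assumptions, not taken from the paper or the Open ASR Leaderboard tooling):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, corrected_wer: float) -> float:
    """Fraction of the baseline WER removed by the corrector."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - corrected_wer) / baseline_wer


# Illustrative numbers only: a drop from 10.0% to 9.5% WER is a 5% relative reduction.
example = relative_wer_reduction(0.10, 0.095)
```

A relative reduction is measured against the baseline WER, so a 5.0% relative reduction is a much smaller absolute change than five WER points.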

Abstract

Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and digesting their knowledge in a single model. Previous methods achieve this by having separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text and vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that we set a new state of the art, achieving an average relative 5.0% WER reduction…
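The routing idea in the abstract can be illustrated with a small sketch. The abstract says only that each dataset's tokens are routed to a mapped expert; everything else here is an assumption for illustration: the task-to-expert mapping, the choice to pair the mapped expert with the router's next-best experts during training, and plain top-k routing when no task label is available. The names `TASK_TO_EXPERT`, `guided_experts`, and `top_k_experts` are hypothetical.

```python
# Hypothetical mapping from task to expert index (an assumption, not from the paper).
TASK_TO_EXPERT = {"speech-to-text": 0, "language-to-text": 1, "vision-to-text": 2}


def top_k_experts(router_logits, k=2):
    """Indices of the k highest-scoring experts, best first (no task label needed)."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    return order[:k]


def guided_experts(router_logits, task, k=2):
    """Task-guided selection: the task's mapped expert first, then the
    router's best-scoring remaining experts to fill the other k - 1 slots."""
    mapped = TASK_TO_EXPERT[task]
    ranked = top_k_experts(router_logits, len(router_logits))
    others = [i for i in ranked if i != mapped]
    return [mapped] + others[:k - 1]


# One token's router scores over four experts (made-up values).
logits = [0.1, 2.0, 0.5, 1.0]
train_choice = guided_experts(logits, "speech-to-text")  # mapped expert 0 is forced in
infer_choice = top_k_experts(logits)                     # router decides on its own
```

In this sketch the task label only affects which experts receive a token; the experts themselves and the way their outputs are combined are left out.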

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Layer Normalization · Adam · Attention Dropout · Mixture of Experts · Multi-Head Attention