MGSC: A Multi-granularity Consistency Framework for Robust End-to-end ASR
Xuwen Yang

TL;DR
This paper introduces MGSC, a framework that enhances end-to-end ASR robustness by enforcing internal semantic and token alignment consistency, significantly reducing errors in noisy environments.
Contribution
It proposes a novel, model-agnostic consistency regularization framework that leverages multi-granularity internal self-consistency to improve ASR robustness against noise.
Findings
Reduces average Character Error Rate by a relative 8.7% on noisy data
Prevents severe semantic mistakes in noisy environments
Uncovers synergy between macro and micro-level consistency
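The 8.7% figure is a relative reduction, not an absolute drop in error rate. As a minimal sketch of what that means, the Python below computes Character Error Rate with the standard character-level Levenshtein distance and then the relative reduction between two CER values. The function names and the example numbers in the comments are illustrative assumptions, not values or code from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction, e.g. a hypothetical baseline CER of 0.20
    improved to 0.10 is a 50% relative (10 point absolute) reduction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline
```

Under this definition, a relative 8.7% reduction means the new CER is 0.913 times the baseline CER, whatever the baseline happens to be.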
Abstract
End-to-end ASR models, despite their success on benchmarks, often produce catastrophic semantic errors in noisy environments. We attribute this fragility to the prevailing 'direct mapping' objective, which solely penalizes final output errors while leaving the model's internal computational process unconstrained. To address this, we introduce the Multi-Granularity Soft Consistency (MGSC) framework, a model-agnostic, plug-and-play module that enforces internal self-consistency by simultaneously regularizing macro-level sentence semantics and micro-level token alignment. Crucially, our work is the first to uncover a powerful synergy between these two consistency granularities: their joint optimization yields robustness gains that significantly surpass the sum of their individual contributions. On a public dataset, MGSC reduces the average Character Error Rate by a relative 8.7% across…
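The abstract describes adding two consistency regularizers, one at sentence level and one at token level, on top of the usual ASR objective. It does not specify which internal representations are compared or which distance functions are used, so the following is only a sketch under stated assumptions: two views `a` and `b` of the same utterance (for example a clean and a noisy pass), a macro term defined as cosine distance between mean-pooled hidden states, a micro term defined as symmetric KL divergence between per-token output distributions, and hypothetical weights `lam_macro` and `lam_micro`. None of these choices is confirmed by the source.

```python
import numpy as np

EPS = 1e-8  # numerical guard for division and log


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def macro_consistency(h_a: np.ndarray, h_b: np.ndarray) -> float:
    """Assumed macro term: 1 - cosine similarity of mean-pooled hidden states.

    h_a, h_b: arrays of shape (frames, hidden_dim); frame counts may differ.
    """
    s_a, s_b = h_a.mean(axis=0), h_b.mean(axis=0)
    cos = float(s_a @ s_b) / (np.linalg.norm(s_a) * np.linalg.norm(s_b) + EPS)
    return 1.0 - cos


def micro_consistency(logits_a: np.ndarray, logits_b: np.ndarray) -> float:
    """Assumed micro term: symmetric KL between per-token distributions.

    logits_a, logits_b: arrays of shape (tokens, vocab) with matching shapes.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    log_ratio = np.log(p + EPS) - np.log(q + EPS)
    kl_pq = (p * log_ratio).sum(axis=-1)
    kl_qp = (q * -log_ratio).sum(axis=-1)
    return float(0.5 * (kl_pq + kl_qp).mean())


def combined_loss(task_loss, h_a, h_b, logits_a, logits_b,
                  lam_macro=1.0, lam_micro=1.0) -> float:
    """Task loss plus weighted macro and micro consistency terms (hypothetical weighting)."""
    return (task_loss
            + lam_macro * macro_consistency(h_a, h_b)
            + lam_micro * micro_consistency(logits_a, logits_b))
```

Both terms are zero when the two views agree exactly and grow as they diverge, which is the sense in which such regularizers constrain the internal computation rather than only the final transcript.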
