M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
Jiaming Zhou, Shiwan Zhao, Jiabei He, Hui Wang, Wenjia Zeng, Yong Chen, Haoqin Sun, Aobo Kong, Yong Qin

TL;DR
M2R-Whisper introduces a multi-stage, multi-scale retrieval augmentation method that enhances multilingual ASR accuracy, especially for low-resource dialects, by combining sentence-level and token-level retrieval without model retraining.
Contribution
The paper presents a novel retrieval-augmented approach that improves Whisper's ASR performance in low-resource settings through multi-stage, multi-scale retrieval strategies.
Findings
Significant accuracy improvements on Mandarin and subdialect datasets.
Effective error mitigation through combined sentence-level and token-level retrieval.
Achieved improvements without additional parameter updates.
Abstract
State-of-the-art models like OpenAI's Whisper exhibit strong performance in multilingual automatic speech recognition (ASR), but they still face challenges in accurately recognizing diverse subdialects. In this paper, we propose M2R-Whisper, a novel multi-stage and multi-scale retrieval augmentation approach designed to enhance ASR performance in low-resource settings. Building on the principles of in-context learning (ICL) and retrieval-augmented techniques, our method employs sentence-level ICL in the pre-processing stage to harness contextual information, while integrating token-level k-Nearest Neighbors (kNN) retrieval as a post-processing step to further refine the final output distribution. By synergistically combining sentence-level and token-level retrieval strategies, M2R-Whisper effectively mitigates various types of recognition errors. Experiments conducted on Mandarin and…
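The token-level kNN step described in the abstract can be illustrated with a small sketch. The excerpt does not give the paper's exact formulation, so this assumes a kNN-LM-style scheme: retrieve the k nearest stored hidden states, turn their distances into a distribution over the tokens attached to them, and interpolate that with the model's own output distribution. The names `knn_distribution` and `interpolate`, the squared Euclidean distance, and the parameters `k`, `temperature`, and `lam` are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=2, temperature=1.0):
    """Build a token distribution from the k nearest datastore entries.

    datastore is a list of (key_vector, token) pairs (hypothetical layout).
    Distance is squared Euclidean, an assumption made for this sketch.
    """
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    ]
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]

    # Softmax over negative distances; subtracting the smallest distance
    # keeps exp() from underflowing when all neighbours are far away.
    d_min = nearest[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in nearest]
    total = sum(weights)

    probs = defaultdict(float)
    for weight, (_, token) in zip(weights, nearest):
        probs[token] += weight / total  # neighbours sharing a token add up
    return dict(probs)


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the model distribution with the retrieved one using weight lam."""
    tokens = set(model_probs) | set(knn_probs)
    return {
        t: (1 - lam) * model_probs.get(t, 0.0) + lam * knn_probs.get(t, 0.0)
        for t in tokens
    }


# Toy usage: two-dimensional "hidden states" with their target tokens.
datastore = [([0.0, 0.0], "a"), ([1.0, 0.0], "b"), ([5.0, 5.0], "c")]
knn_probs = knn_distribution([0.0, 0.0], datastore, k=2)
final_probs = interpolate({"a": 0.2, "b": 0.8}, knn_probs, lam=0.5)
```

In this toy case the retrieved neighbours favour "a", so the interpolated probability of "a" rises above what the model alone assigned, and no model parameters are updated. A real system would use decoder hidden states as keys and an approximate nearest-neighbour index rather than a linear scan.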
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
