Lightweight Prompt Biasing for Contextualized End-to-End ASR Systems
Bo Ren, Yu Shi, Jinyu Li

TL;DR
This paper presents a lightweight prompt biasing method for end-to-end ASR that improves recognition of rare and domain-specific entities by using a simple, efficient, multitask framework without structural changes.
Contribution
It introduces a novel prompt biasing approach with entity filtering for contextualized ASR, significantly improving accuracy while maintaining efficiency and simplicity.
Findings
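Achieved 30.7% and 18.0% relative reductions in Entity WER over a baseline model with shallow fusion, on an in-house domain dataset with small and large entity lists, respectively.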
Enhanced recognition accuracy for domain-specific entities.
Method is lightweight and does not require structural modifications.
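As a reading aid for the percentages above, here is a minimal sketch of how a relative reduction in error rate is usually computed. The function name and the input values are illustrative assumptions; the absolute Entity WER values are not given in this summary.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative reduction of new_wer versus baseline_wer, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, NOT taken from the paper: a baseline Entity WER of
# 10.0% and a biased-model Entity WER of 6.93%.
reduction = relative_reduction(10.0, 6.93)
print(f"{reduction:.1f}% relative reduction")
```

A relative figure says nothing about the absolute error rates: the same percentage can come from a large or a small baseline Entity WER.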
Abstract
End-to-End Automatic Speech Recognition (ASR) has advanced significantly yet still struggles with rare and domain-specific entities. This paper introduces a simple yet efficient prompt-based biasing technique for contextualized ASR, enhancing recognition accuracy by leveraging a unified multitask learning framework. The approach comprises two key components: a prompt biasing model, which is trained to determine when to focus on entities in the prompt, and an entity filtering mechanism, which efficiently filters out irrelevant entities. Our method significantly enhances ASR accuracy on entities, achieving relative reductions of 30.7% and 18.0% in Entity Word Error Rate compared to the baseline model with shallow fusion on an in-house domain dataset with small and large entity lists, respectively. The primary advantage of this method lies in its efficiency and simplicity without any structure change,…
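The abstract names an entity filtering step followed by a prompt that carries the surviving entities, but it does not say how relevance is scored or how the prompt is formatted. The sketch below is therefore only an illustration under assumptions: the per-entity relevance scores, the threshold, the `max_entities` cap, and the comma-separated prompt format are all hypothetical and are not the authors' method.

```python
from typing import Dict, List


def filter_entities(
    scores: Dict[str, float],
    threshold: float = 0.5,   # hypothetical cutoff, not from the paper
    max_entities: int = 10,   # hypothetical cap on prompt length
) -> List[str]:
    """Keep entities whose (assumed) relevance score passes the threshold,
    ordered from most to least relevant and capped at max_entities."""
    kept = [(entity, score) for entity, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [entity for entity, _ in kept[:max_entities]]


def build_prompt(entities: List[str], separator: str = ", ") -> str:
    """Join the filtered entities into a single text prompt (assumed format)."""
    return separator.join(entities)


# Hypothetical entity list with made-up relevance scores.
candidate_scores = {"Contoso": 0.9, "Fabrikam": 0.2, "Northwind": 0.7}
prompt = build_prompt(filter_entities(candidate_scores))
print(prompt)
```

The point of filtering before building the prompt is that a large entity list would otherwise put many irrelevant names in front of the model; how the paper actually selects entities may differ from this threshold-and-cap scheme.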
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
Methods: Focus
