H-PRM: A Pluggable Hotword Pre-Retrieval Module for Various Speech Recognition Systems
Huangyu Dai, Lingtao Mao, Ben Chen, Zihan Wang, Zihan Liang, Ying Han, Chenyi Lei, Han Li

TL;DR
This paper presents H-PRM, a pluggable hotword pre-retrieval module that selects hotword candidates by their acoustic similarity to the speech segment. It improves hotword recognition across speech recognition systems, particularly when the hotword set is large.
Contribution
The paper introduces H-PRM, a novel, plug-and-play hotword pre-retrieval module that can be integrated into traditional ASR models and Audio LLMs for improved hotword recognition.
Findings
H-PRM significantly increases the hotword post-recall rate (PRR) when integrated into traditional models such as SeACo-Paraformer.
H-PRM outperforms existing hotword customization methods.
The approach is effective for large-scale hotword sets.
Abstract
Hotword customization is crucial in automatic speech recognition (ASR) for improving the accuracy of domain-specific terms. It has been driven primarily by advances in traditional models and Audio large language models (LLMs). However, existing models often struggle with large-scale hotword sets: the recognition rate drops dramatically as the number of hotwords increases. In this paper, we introduce a novel hotword customization system that uses a hotword pre-retrieval module (H-PRM) to identify the most relevant hotword candidates by measuring the acoustic similarity between the hotwords and the speech segment. This plug-and-play solution can be easily integrated into traditional models such as SeACo-Paraformer, significantly enhancing the hotword post-recall rate (PRR). Additionally, we incorporate H-PRM into Audio LLMs through a prompt-based approach, enabling seamless customization of hotwords. Extensive…
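The pre-retrieval idea described in the abstract can be illustrated with a minimal sketch. It assumes that the speech segment and each hotword have already been mapped to fixed-size acoustic embeddings, and it uses cosine similarity as a stand-in for the acoustic similarity measure; the abstract does not specify either. The function names (`retrieve_hotwords`, `build_prompt`), the prompt wording, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_hotwords(
    segment_embedding: Sequence[float],
    hotword_embeddings: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k hotwords ranked by similarity to the speech segment.

    Assumption: the embeddings come from some acoustic encoder that is
    outside the scope of this sketch.
    """
    scored = [
        (word, cosine_similarity(segment_embedding, emb))
        for word, emb in hotword_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_prompt(candidates: List[Tuple[str, float]]) -> str:
    """Hypothetical prompt template for passing retrieved hotwords to an Audio LLM."""
    words = ", ".join(word for word, _ in candidates)
    return f"Transcribe the audio. Possible hotwords: {words}."


# Toy data: three hotwords with made-up 3-dimensional embeddings.
hotwords = {
    "metformin": [1.0, 0.0, 0.0],
    "metoprolol": [0.0, 1.0, 0.0],
    "ibuprofen": [0.0, 0.0, 1.0],
}
segment = [0.9, 0.1, 0.0]  # made-up embedding of a speech segment

candidates = retrieve_hotwords(segment, hotwords, top_k=2)
prompt = build_prompt(candidates)
print(prompt)
```

The point of the design is that only the shortlisted candidates reach the recognizer or the prompt, so the downstream model sees a small hotword list even when the full set is large.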
