Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-Text System

Yanfan Du; Jun Zhang; Bin Wang; Jin Qiu; Lu Huang; Yuan Ge; Xiaoqian Liu; Tong Xiao; Jingbo Zhu

arXiv:2508.18701·cs.CL·August 27, 2025

TL;DR

Attention2Probability is a lightweight method that converts cross-attention weights between speech and candidate terms into presence probabilities, improving the recognition of domain-specific terminology and neologisms in speech-to-text systems.

Contribution

It introduces an attention-driven probability estimation approach combined with curriculum learning and provides a new dataset for terminology in speech recognition.

Findings

01 Outperforms VectorDB with up to 92.57% recall in Chinese

02 Achieves low latency of 8.71 ms per query

03 Improves terminology accuracy by 6-17% in recognition tasks

Abstract

Recent advances in speech large language models (SLMs) have improved speech recognition and translation in general domains, but accurately generating domain-specific terms or neologisms remains challenging. To address this, we propose Attention2Probability: attention-driven terminology probability estimation for robust speech-to-text system, which is lightweight, flexible, and accurate. Attention2Probability converts cross-attention weights between speech and terminology into presence probabilities, and it further employs curriculum learning to enhance retrieval accuracy. Furthermore, to tackle the lack of data for speech-to-text tasks with terminology intervention, we create and release a new speech dataset with terminology to support future research in this area. Experimental results show that Attention2Probability significantly outperforms the VectorDB method on our test set.…
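
The idea of turning speech-to-term attention into presence probabilities can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the max-over-frames pooling, the sigmoid mapping, and the `temperature` parameter are all assumptions chosen for illustration, and the paper's actual formulation may differ.

```python
import numpy as np


def term_presence_probabilities(speech_feats, term_embs, temperature=1.0):
    """Hypothetical sketch: score each candidate term against an utterance.

    speech_feats: array of shape (T, d), one feature vector per speech frame.
    term_embs:    array of shape (N, d), one embedding per candidate term.
    Returns an array of shape (N,) with values in (0, 1).
    """
    d = speech_feats.shape[-1]
    # Scaled dot-product attention scores: terms act as queries, frames as keys.
    scores = term_embs @ speech_feats.T / np.sqrt(d)  # shape (N, T)
    # Assumption: a term is "present" if at least one frame matches it strongly,
    # so pool each term's scores with a max over frames.
    pooled = scores.max(axis=1)
    # Assumption: squash the pooled score into a probability with a sigmoid.
    return 1.0 / (1.0 + np.exp(-pooled / temperature))


def top_k_terms(probs, terms, k=2):
    """Return the k terms with the highest estimated presence probability."""
    order = np.argsort(-probs)[:k]
    return [terms[i] for i in order]


# Toy example: the first term aligns with a speech frame, the second does not.
speech_feats = np.array([[4.0, 0.0, 0.0, 0.0],
                         [0.0, 4.0, 0.0, 0.0]])
term_embs = np.array([[4.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 4.0, 0.0]])
probs = term_presence_probabilities(speech_feats, term_embs)
selected = top_k_terms(probs, ["term_a", "term_b"], k=1)
```

In a pipeline of this kind, the highest-probability terms would typically be passed to the speech model as hints, which is one way to read the "terminology intervention" mentioned in the abstract.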

Code & Models

Models

ByteDance/Attention2Probability
model · ♡ 3

Datasets

ByteDance/Attention2Probability
dataset · 78 downloads
