Context-aware RNNLM Rescoring for Conversational Speech Recognition
Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie

TL;DR
This paper introduces a context-aware RNNLM rescoring method for conversational speech recognition that leverages long-term context and topic information, significantly reducing error rates.
Contribution
It extends RNNLM rescoring by incorporating sentence-level context and tag-based concatenation, improving recognition accuracy over prior methods.
Findings
Achieved up to 13.1% CER reduction over first-pass decoding.
Achieved up to 6% CER reduction over standard lattice rescoring.
Effective use of contextual and topic information in conversational speech recognition.
Abstract
Conversational speech recognition is regarded as a challenging task due to its free-style speaking and long-term contextual dependencies. Prior work has explored modeling long-range context through RNNLM rescoring, with improved performance. To take further advantage of the persistent nature of a conversation, such as its topics or speaker turns, we extend the rescoring procedure to be context-aware. For RNNLM training, we capture the contextual dependencies by concatenating adjacent sentences with various tag words, such as speaker or intention information. For lattice rescoring, the lattices of adjacent sentences are also connected with the first-pass decoded result by tag words. We also adopt a selective concatenation strategy based on tf-idf, making the best use of contextual similarity to improve transcription performance. Results on four different…
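The tag-based concatenation and tf-idf selection described in the abstract can be illustrated on plain text. The following is a minimal sketch, not the authors' implementation: the function names, the tag tokens (`<spk_a>`, `<spk_b>`), the smoothed idf formula, the cosine-similarity criterion, and the threshold value are all assumptions made for illustration, and the sketch only compares each sentence with the one immediately before it. It operates on transcripts for RNNLM training text and does not touch lattices.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse tf-idf vector (a dict) per tokenized sentence."""
    n = len(sentences)
    # Document frequency: in how many sentences each token appears.
    df = Counter(tok for sent in sentences for tok in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({
            # Smoothed idf (an assumed choice, not taken from the paper).
            tok: (count / len(sent)) * (math.log((1 + n) / (1 + df[tok])) + 1.0)
            for tok, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def concat_with_context(turns, threshold=0.1):
    """Prefix each sentence with its tag word, and prepend the previous
    tagged sentence only when the two are similar enough under tf-idf.

    turns: list of (tag_word, tokens) pairs in conversation order.
    threshold: hypothetical similarity cutoff, chosen for illustration.
    """
    vectors = tfidf_vectors([tokens for _, tokens in turns])
    training_lines = []
    for i, (tag, tokens) in enumerate(turns):
        line = [tag] + tokens
        if i > 0 and cosine(vectors[i - 1], vectors[i]) >= threshold:
            prev_tag, prev_tokens = turns[i - 1]
            line = [prev_tag] + prev_tokens + line
        training_lines.append(line)
    return training_lines


# Hypothetical three-turn conversation with speaker tags.
example_turns = [
    ("<spk_a>", ["book", "a", "flight"]),
    ("<spk_b>", ["which", "flight"]),
    ("<spk_a>", ["thanks"]),
]
example_lines = concat_with_context(example_turns)
```

In this example the second turn shares the word "flight" with the first, so the two are joined into one training line, while the third turn has no overlap with its predecessor and is kept on its own. A real system would need to tune the similarity criterion and decide how many preceding sentences to consider.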
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems
