Contextual Biasing of Language Models for Speech Recognition in Goal-Oriented Conversational Agents

Ashish Shenoy; Sravan Bodapati; Katrin Kirchhoff

arXiv:2103.10325·cs.CL·June 8, 2021

TL;DR

This paper improves speech recognition in goal-oriented conversational agents by incorporating multi-turn context, dialog cues, and BERT-derived context embeddings into the neural language model, yielding a 7% relative reduction in word error rate.

Contribution

It introduces methods for integrating context into LSTM-based neural language models, including a new architecture that uses BERT-derived context embeddings to improve speech recognition.

Findings

1. Achieved 7% relative WER reduction with contextual models

2. Demonstrated effectiveness of multi-turn and lexical context incorporation

3. Validated approach on goal-oriented audio datasets
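
The findings report a relative, not absolute, WER reduction. A minimal sketch of that arithmetic follows; the function name and the baseline and contextual WER values are hypothetical, chosen only to illustrate the formula, and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: (baseline - new) / baseline.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers (NOT from the paper): a baseline WER of 10.0%
# lowered to 9.3% is an absolute drop of 0.7 points, which corresponds
# to a relative reduction of about 7%.
reduction = relative_wer_reduction(10.0, 9.3)
print(f"relative reduction: {reduction:.1%}")
```

Under this definition, the same relative figure corresponds to a smaller absolute gain when the baseline WER is already low.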

Abstract

Goal-oriented conversational interfaces are designed to accomplish specific tasks, and their interactions typically span multiple turns that adhere to a pre-defined structure and a goal. However, conventional neural language models (NLMs) in Automatic Speech Recognition (ASR) systems are mostly trained sentence-wise with limited context. In this paper, we explore different ways to incorporate context into an LSTM-based NLM in order to model long-range dependencies and improve speech recognition. Specifically, we use context carry-over across multiple turns and use lexical contextual cues such as the system dialog act from Natural Language Understanding (NLU) models and the user-provided structure of the chatbot. We also propose a new architecture that utilizes context embeddings derived from BERT on sample utterances provided during inference time. Our experiments show a word error…
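
One plausible way to picture the context carry-over and dialog-act cues described in the abstract is to build the language model input from earlier turns plus a dialog-act marker. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function name, the `<da:...>` and `<turn>` marker tokens, the whitespace tokenization, and the example dialog are all hypothetical. The BERT-based context embedding is not covered here.

```python
from typing import List, Optional


def build_contextual_input(
    previous_turns: List[str],
    current_utterance: str,
    system_dialog_act: Optional[str] = None,
    max_context_turns: int = 2,
) -> List[str]:
    """Assemble LM input tokens from dialog context (hypothetical scheme).

    Layout: [dialog-act marker] + last N previous turns, each followed by
    a <turn> separator, + the current utterance.
    """
    tokens: List[str] = []
    if system_dialog_act is not None:
        # Lexical cue for the system dialog act (marker format is assumed).
        tokens.append(f"<da:{system_dialog_act}>")
    # Guard against max_context_turns == 0, since list[-0:] is the whole list.
    context = previous_turns[-max_context_turns:] if max_context_turns > 0 else []
    for turn in context:
        tokens.extend(turn.lower().split())
        tokens.append("<turn>")
    tokens.extend(current_utterance.lower().split())
    return tokens


# Hypothetical two-turn history for a flight-booking bot.
history = ["I want to book a flight", "Where are you flying to"]
print(build_contextual_input(history, "to Boston", system_dialog_act="ElicitSlot"))
```

With the context prepended, an LSTM language model scoring "to Boston" can condition on the preceding question about a destination rather than on the current sentence alone.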

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Linear Layer · Adam · Attention Is All You Need · Attention Dropout · Layer Normalization · WordPiece · Sigmoid Activation · Residual Connection · Tanh Activation · Linear Warmup With Linear Decay