Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition

Guangzhi Sun; Chao Zhang; Philip C. Woodland

arXiv:2207.00857·cs.SD·July 5, 2022

TL;DR

This paper introduces graph neural network (GNN) encodings for a tree-constrained pointer generator in end-to-end contextual speech recognition, improving the recognition of biasing words with minimal additional computational cost.

Contribution

It presents a novel GNN-based encoding method within the tree-constrained pointer generator (TCPGen) for more accurate biasing word prediction in end-to-end ASR systems.

Findings

1. Achieved about 15% relative WER reduction on biasing words.
2. Demonstrated effectiveness on Librispeech and AMI datasets.
3. Maintained low additional computational cost.

Abstract

Incorporating biasing words obtained as contextual knowledge is critical for many automatic speech recognition (ASR) applications. This paper proposes the use of graph neural network (GNN) encodings in a tree-constrained pointer generator (TCPGen) component for end-to-end contextual ASR. By encoding the biasing words in the prefix-tree with a tree-based GNN, lookahead for future wordpieces in end-to-end ASR decoding is achieved at each tree node by incorporating information about all wordpieces on the tree branches rooted from it, which allows a more accurate prediction of the generation probability of the biasing words. Systems were evaluated on the Librispeech corpus using simulated biasing tasks, and on the AMI corpus by proposing a novel visual-grounded contextual ASR pipeline that extracts biasing words from slides alongside each meeting. Results showed that TCPGen with GNN…
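The abstract's central mechanism, encoding each prefix-tree node so that it summarises every wordpiece on the branches below it, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the wordpiece splits ("▁pl", "ate", ...), the hidden size `DIM`, the random stand-in parameters from the hypothetical `make_params`, and the update h(n) = tanh(W_self·e(n) + W_child·Σ h(child)). This is a simple bottom-up tree-recursive aggregation, one possible tree-based GNN, and not necessarily the layer used in the paper. The hypothetical `valid_next_wordpieces` shows the tree constraint, which restricts the pointer to the children of the current node.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DIM = 4  # hypothetical hidden size for this toy example

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class TrieNode:
    """One node of the prefix tree; the root carries no wordpiece."""
    wordpiece: Optional[str]
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    is_word_end: bool = False
    encoding: Vector = field(default_factory=list)


def build_prefix_tree(biasing_words: List[List[str]]) -> TrieNode:
    """Insert each biasing word (a wordpiece sequence) into a prefix tree."""
    root = TrieNode(None)
    for pieces in biasing_words:
        node = root
        for wp in pieces:
            if wp not in node.children:
                node.children[wp] = TrieNode(wp)
            node = node.children[wp]
        node.is_word_end = True
    return root


def valid_next_wordpieces(node: TrieNode) -> List[str]:
    """Tree constraint: only children of the current node may be pointed to next."""
    return sorted(node.children)


def make_params(vocab: List[str], seed: int = 0):
    """Hypothetical random embeddings and weights standing in for trained ones."""
    rng = random.Random(seed)
    emb = {wp: [rng.uniform(-1, 1) for _ in range(DIM)] for wp in sorted(vocab)}
    w_self = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]
    w_child = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]
    return emb, w_self, w_child


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(node: TrieNode, emb, w_self: Matrix, w_child: Matrix) -> Vector:
    """Bottom-up pass: a node's encoding depends on every wordpiece below it."""
    child_sum = [0.0] * DIM
    for child in node.children.values():
        h = encode(child, emb, w_self, w_child)
        child_sum = [a + b for a, b in zip(child_sum, h)]
    own = emb[node.wordpiece] if node.wordpiece is not None else [0.0] * DIM
    pre = [a + b for a, b in zip(matvec(w_self, own), matvec(w_child, child_sum))]
    node.encoding = [math.tanh(x) for x in pre]
    return node.encoding


# Hypothetical biasing list: "plateau", "plasma", "cat" as wordpiece sequences.
words = [["▁pl", "ate", "au"], ["▁pl", "asma"], ["▁cat"]]
root = build_prefix_tree(words)
emb, w_self, w_child = make_params(["▁pl", "ate", "au", "asma", "▁cat"])
encode(root, emb, w_self, w_child)
print(valid_next_wordpieces(root))  # ['▁cat', '▁pl']
```

Because each node's vector is computed from its children, the encoding at "▁pl" changes when a word deeper in that branch changes, while the separate "▁cat" branch is unaffected; this is the kind of lookahead the abstract describes.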

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Graph Neural Network