Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator
Guangzhi Sun, Chao Zhang, Phil Woodland

TL;DR
This paper introduces a GNN-based method for end-to-end contextual ASR that improves recognition accuracy for rare and unseen words by adding graph neural network encodings to the tree-constrained pointer generator.
Contribution
The paper proposes GNN encodings of the biasing tree for contextual ASR. These give a more precise prediction of the generation probability of biasing words and reduce word error rate (WER) on standard datasets.
Findings
Over 60% WER reduction is achieved for rare and unseen words.
GNN encodings provide lookahead over future word pieces, improving prediction accuracy during ASR decoding.
Combining the complementary GCN and GraphSAGE structures enhances performance.
Abstract
The incorporation of biasing words obtained through contextual knowledge is of paramount importance in automatic speech recognition (ASR) applications. This paper proposes an innovative method for achieving end-to-end contextual ASR using graph neural network (GNN) encodings based on the tree-constrained pointer generator method. GNN node encodings facilitate lookahead for future word pieces in the process of ASR decoding at each tree node by incorporating information about all word pieces on the tree branches rooted from it. This results in a more precise prediction of the generation probability of the biasing words. The study explores three GNN encoding techniques, namely tree recursive neural networks, graph convolutional network (GCN), and GraphSAGE, along with different combinations of the complementary GCN and GraphSAGE structures. The performance of the systems was evaluated…
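The lookahead idea in the abstract can be illustrated with a small, non-authoritative sketch. It is not the authors' model: the learned GNN is swapped for a parameter-free bottom-up aggregation, so each node's vector is its own one-hot word-piece vector plus a decayed mean of its children's vectors. The names `Node`, `build_prefix_tree`, `encode`, the `decay` parameter, and the biasing list are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    piece: str
    children: dict = field(default_factory=dict)
    is_word_end: bool = False


def build_prefix_tree(words):
    """Insert each biasing word (a list of word pieces) into a prefix tree."""
    root = Node("<root>")
    for pieces in words:
        node = root
        for piece in pieces:
            if piece not in node.children:
                node.children[piece] = Node(piece)
            node = node.children[piece]
        node.is_word_end = True
    return root


def encode(node, vocab, decay=0.5):
    """Toy stand-in for a learned tree encoder (assumption, not the paper's GNN).

    Returns the node's one-hot piece vector plus the decayed mean of its
    children's encodings, so the vector summarises the whole subtree.
    """
    vec = [0.0] * len(vocab)
    if node.piece in vocab:
        vec[vocab[node.piece]] = 1.0
    if node.children:
        child_vecs = [encode(c, vocab, decay) for c in node.children.values()]
        for i in range(len(vocab)):
            vec[i] += decay * sum(v[i] for v in child_vecs) / len(child_vecs)
    return vec


# Hypothetical biasing list, already split into word pieces.
biasing_words = [["tur", "ner"], ["tur", "key"], ["vi", "ola"]]
pieces = sorted({p for w in biasing_words for p in w})
vocab = {p: i for i, p in enumerate(pieces)}

root = build_prefix_tree(biasing_words)
tur_encoding = encode(root.children["tur"], vocab)
root_encoding = encode(root, vocab)
```

In this toy setup the vector for the "tur" node carries weight on "ner" and "key", the pieces that can follow it, and none on pieces from the other branch. A learned encoder such as a tree recursive neural network, GCN, or GraphSAGE would replace the fixed one-hot vectors and the mean with trained embeddings and weights.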
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Graph Neural Network · Graph Convolutional Network · GraphSAGE
