Cross-utterance Reranking Models with BERT and Graph Convolutional Networks for Conversational Speech Recognition
Shih-Hsuan Chiu, Tien-Hong Lo, Fu-An Chao, Berlin Chen

TL;DR
This paper introduces a novel approach for conversational speech recognition that combines graph convolutional networks and BERT to incorporate cross-utterance context, improving reranking accuracy.
Contribution
It proposes a new graph-structured representation of historical context and integrates GCN with BERT for enhanced cross-utterance modeling in ASR reranking.
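The following is a hypothetical sketch of how a reranker could combine the two signals when rescoring an N-best list. It is not the authors' implementation. It assumes each hypothesis already has a first-pass ASR score and a fixed-size vector standing in for its BERT sentence embedding. It also assumes the GCN embeddings of historical words are mean-pooled into a single context vector, and that a linear scorer (`weights`) plus an interpolation weight (`alpha`), both hypothetical, produce the final score.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def rerank(nbest, history_embs, weights, alpha=0.5):
    """Rescore an N-best list and return (score, text) pairs, best first.

    nbest: (text, asr_score, hyp_vec) tuples; hyp_vec stands in for a BERT
        sentence embedding of the hypothesis (assumption).
    history_embs: stand-ins for GCN embeddings of historical words.
    weights: hypothetical linear scorer over [hyp_vec ; pooled history].
    alpha: hypothetical interpolation weight for the contextual score.
    """
    context = mean_pool(history_embs)  # one cross-utterance context vector
    scored = []
    for text, asr_score, hyp_vec in nbest:
        ctx_score = dot(weights, list(hyp_vec) + context)
        scored.append(((1 - alpha) * asr_score + alpha * ctx_score, text))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy inputs: all numbers are made up for illustration.
history_embs = [[1.0, 0.0], [0.0, 1.0]]
weights = [1.0, 0.0, 0.5, 0.5]
nbest = [("the remote controller", -9.5, [0.1, 0.9]),
         ("the remote control", -10.0, [0.9, 0.1])]
ranked = rerank(nbest, history_embs, weights)
```

In this toy input the contextual score promotes "the remote control" above the hypothesis that had the better first-pass score. In a real system the scorer would be trained, and the vectors would come from BERT and the GCN rather than being hand-picked.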
Findings
GCN effectively captures global relational information among words.
The combined GCN and BERT model outperforms current state-of-the-art methods on the AMI meeting corpus.
The approach demonstrates significant improvements in conversational speech recognition accuracy.
Abstract
How to effectively incorporate cross-utterance information cues into a neural language model (LM) has emerged as an intriguing issue for automatic speech recognition (ASR). Existing efforts to improve the contextualization of an LM typically treat previous utterances as a sequence of additional input and may fail to capture complex global structural dependencies among these utterances. In view of this, in this paper we seek to represent the historical context information of an utterance as graph-structured data so as to distill cross-utterance, global word interaction relationships. To this end, we apply a graph convolutional network (GCN) to the resulting graph to obtain the corresponding GCN embeddings of historical words. GCN has recently found versatile applications in social-network analysis, text summarization, and other areas, due mainly to its ability of…
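A minimal sketch of the graph step described in the abstract, assuming a simple construction that may differ from the paper's. Nodes are word types from previous utterances, edges join words that co-occur within a small window (`window`, a hypothetical parameter), and one layer of the standard GCN propagation rule of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), is applied. Random vectors stand in for learned initial embeddings and layer weights.

```python
import math
import random

def build_cooccurrence_graph(history, window=2):
    """Build an undirected word graph from previous utterances.

    Hypothetical construction: nodes are unique word types, and an edge joins
    two words that occur within `window` positions of each other in one utterance.
    """
    vocab = sorted({w for utt in history for w in utt})
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    for utt in history:
        for i, w in enumerate(utt):
            for j in range(i + 1, min(i + 1 + window, len(utt))):
                a, b = index[w], index[utt[j]]
                if a != b:  # self-loops are added during normalization
                    adj[a][b] = adj[b][a] = 1.0
    return vocab, index, adj

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm H W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]

# Toy history of two previous utterances (made-up example).
history = [["the", "meeting", "starts", "now"],
           ["the", "remote", "control", "design"]]
vocab, index, adj = build_cooccurrence_graph(history)
a_norm = normalize_adjacency(adj)

rng = random.Random(0)
dim_in, dim_out = 4, 3
h0 = [[rng.uniform(-1, 1) for _ in range(dim_in)] for _ in vocab]  # stand-in word embeddings
w = [[rng.uniform(-1, 1) for _ in range(dim_out)] for _ in range(dim_in)]  # stand-in weights
h1 = gcn_layer(a_norm, h0, w)  # one GCN embedding per historical word type
```

Because "the" occurs in both utterances, it becomes a single node connected to words from each of them, so one convolution already mixes information across utterance boundaries; stacking layers would spread it further through the graph.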
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Graph Convolutional Network
