Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition
Xianrui Zheng, Chao Zhang, Philip C. Woodland

TL;DR
This paper adapts the pre-trained language models GPT, GPT-2, and BERT for automatic speech recognition (ASR). It proposes a conversion method that turns BERT's bidirectional outputs into a valid language prior probability, and it reports relative word error rate reductions (WERR) on the AMI and Switchboard tasks.
Contribution
The paper introduces a conversion method that computes a valid language prior probability from BERT's bidirectional outputs, and it combines multiple fine-tuned pre-trained models to improve speech recognition accuracy.
Findings
The combination of fine-tuned GPT and GPT-2 outperforms a combination of neural LMs trained from scratch by up to 12% relative word error rate reduction (WERR).
The conversion method enables BERT to achieve a 3% WERR.
Combining BERT, GPT, and GPT-2 yields further WERR improvements.
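The findings above are stated as WERR, a relative measure: the drop in word error rate divided by the baseline word error rate. A minimal sketch of that arithmetic follows. The function name and the WER values are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative word error rate reduction (WERR) as a fraction."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent, NOT taken from the paper.
werr = relative_wer_reduction(baseline_wer=20.0, new_wer=17.6)
print(f"WERR = {werr:.1%}")
```

Under these assumed numbers, an absolute drop of 2.4 WER points corresponds to a 12% relative reduction, which is why relative and absolute figures should not be compared directly.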
Abstract
Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT-2, have become a key technology for many natural language processing tasks. In this paper, we present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR). Unlike the unidirectional LMs GPT and GPT-2, BERT is bidirectional, so the direct product of its output probabilities is no longer a valid language prior probability. A conversion method is proposed to compute the correct language prior probability based on bidirectional LM outputs in a mathematically exact way. Experimental results on the widely used AMI and Switchboard ASR tasks showed that the combination of the fine-tuned GPT and GPT-2 outperformed the combination of three neural LMs with different architectures…
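The abstract's point that a direct product of bidirectional output probabilities is not a valid prior can be checked on a toy case. The sketch below assumes a made-up joint distribution over two-word sentences and compares the chain-rule (unidirectional) product with the product of each word conditioned on the other. It does not reproduce the paper's conversion method; all names and numbers are hypothetical.

```python
from itertools import product

# Toy joint distribution over two-word "sentences" on the vocabulary {a, b}.
# These probabilities are invented for illustration only.
vocab = ["a", "b"]
joint = {("a", "a"): 0.4, ("a", "b"): 0.1, ("b", "a"): 0.1, ("b", "b"): 0.4}


def p_first(w1):
    """Marginal probability of the first word."""
    return sum(joint[(w1, w2)] for w2 in vocab)


def p_second(w2):
    """Marginal probability of the second word."""
    return sum(joint[(w1, w2)] for w1 in vocab)


def unidirectional_score(w1, w2):
    """Chain rule, left to right: P(w1) * P(w2 | w1)."""
    return p_first(w1) * (joint[(w1, w2)] / p_first(w1))


def bidirectional_product(w1, w2):
    """Each word conditioned on the other: P(w1 | w2) * P(w2 | w1)."""
    return (joint[(w1, w2)] / p_second(w2)) * (joint[(w1, w2)] / p_first(w1))


sentences = list(product(vocab, vocab))
uni_total = sum(unidirectional_score(w1, w2) for w1, w2 in sentences)
bi_total = sum(bidirectional_product(w1, w2) for w1, w2 in sentences)

print(f"unidirectional scores sum to {uni_total:.2f}")
print(f"bidirectional products sum to {bi_total:.2f}")
```

In this toy case the chain-rule scores sum to 1 over all sentences, while the bidirectional products sum to 1.36, so the latter cannot be used directly as sentence probabilities.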
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Softmax · Cosine Annealing · Layer Normalization · GPT-2 · Linear Warmup With Linear Decay
