Adapting GPT, GPT-2 and BERT Language Models for Speech Recognition

Xianrui Zheng; Chao Zhang; Philip C. Woodland

arXiv:2108.07789·cs.CL·October 4, 2021·5 cites

Open Access

TL;DR

This paper explores adapting pre-trained language models such as GPT, GPT-2, and BERT for automatic speech recognition (ASR). It proposes a conversion method that turns BERT's bidirectional outputs into a valid language prior probability, and it reports relative word error rate reductions (WERR) on ASR tasks.

Contribution

It introduces a novel conversion method for BERT's bidirectional outputs and combines multiple pre-trained models to enhance speech recognition accuracy.

Findings

01

The combination of fine-tuned GPT and GPT-2 outperforms a combination of three neural LMs with different architectures trained from scratch, by up to a 12% WERR.

02

The proposed conversion method enables BERT to obtain a 3% WERR.

03

Combining BERT, GPT, and GPT-2 yields further WERR improvements.
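
For readers unfamiliar with the metric, WERR is conventionally the drop in word error rate (WER) divided by the baseline WER. The sketch below shows only that arithmetic; the function name `werr` and the sample WER values are illustrative assumptions, not figures from the paper.

```python
def werr(baseline_wer: float, new_wer: float) -> float:
    """Relative word error rate reduction, as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs in percent, chosen only to illustrate the calculation.
example = werr(baseline_wer=20.0, new_wer=17.6)
print(f"{example:.1%}")  # prints 12.0%
```

Because WERR is relative, the same absolute WER drop yields a larger WERR on a task with a lower baseline WER, so values are best compared within a single task.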

Abstract

Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT-2, have become a key technology for many natural language processing tasks. In this paper, we present results using fine-tuned GPT, GPT-2, and their combination for automatic speech recognition (ASR). Unlike the unidirectional LMs GPT and GPT-2, BERT is bidirectional, so the direct product of its output probabilities is no longer a valid language prior probability. A conversion method is proposed to compute the correct language prior probability based on bidirectional LM outputs in a mathematically exact way. Experimental results on the widely used AMI and Switchboard ASR tasks showed that the combination of the fine-tuned GPT and GPT-2 outperformed the combination of three neural LMs with different architectures…
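
The abstract's point that a direct product of bidirectional output probabilities is not a valid prior can be checked on a toy example. The sketch below is not the paper's conversion method; it only illustrates the problem that method addresses. The two-word vocabulary, the `JOINT` table, and all function names are hypothetical.

```python
from itertools import product

VOCAB = ("a", "b")
# Hypothetical joint distribution over two-word sequences (sums to 1).
JOINT = {
    ("a", "a"): 0.4,
    ("a", "b"): 0.1,
    ("b", "a"): 0.1,
    ("b", "b"): 0.4,
}


def bidirectional_conditional(seq, i):
    """P(w_i | all other words): the kind of quantity a masked LM predicts."""
    denom = sum(JOINT[seq[:i] + (w,) + seq[i + 1:]] for w in VOCAB)
    return JOINT[seq] / denom


def pseudo_score(seq):
    """Direct product of the bidirectional conditionals over all positions."""
    score = 1.0
    for i in range(len(seq)):
        score *= bidirectional_conditional(seq, i)
    return score


def chain_rule_score(seq):
    """Left-to-right product P(w_1) * P(w_2 | w_1), as a unidirectional LM uses."""
    p_first = sum(JOINT[(seq[0], w)] for w in VOCAB)
    p_second_given_first = JOINT[seq] / p_first
    return p_first * p_second_given_first


sequences = list(product(VOCAB, repeat=2))
pseudo_total = sum(pseudo_score(s) for s in sequences)
chain_total = sum(chain_rule_score(s) for s in sequences)
print(pseudo_total, chain_total)
```

In this toy case the chain-rule scores sum to 1 over all sequences, while the product of bidirectional conditionals sums to about 1.36, so it cannot be used directly as a language prior.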

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Softmax · Cosine Annealing · Layer Normalization · GPT-2 · Linear Warmup With Linear Decay