Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

Shaoshi Ling; Guoli Ye; Rui Zhao; Yifan Gong

arXiv:2309.07369·eess.AS·September 17, 2024

Open Access

TL;DR

This paper introduces the hybrid attention-based encoder-decoder (HAED) model, which separates the acoustic and language models in speech recognition. The separation enables efficient text-based adaptation and yields a 23% relative word error rate (WER) improvement when out-of-domain text is used for adaptation.

Contribution

The proposed HAED model preserves the modularity of conventional hybrid systems, enabling effective text-based adaptation in AED systems, which joint end-to-end optimization had made difficult.

Findings

01

23% relative WER reduction with out-of-domain text adaptation

02

Minor WER degradation on general test set

03

Preserves modularity of traditional hybrid systems
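
The "relative" in the first finding means the reduction is measured as a fraction of the baseline WER, not in absolute percentage points. The sketch below shows that arithmetic. The function name and the two WER values are hypothetical and chosen only so the result comes out near 23%; they are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WER values (in percent), for illustration only.
# These are NOT results reported in the paper.
baseline = 20.0  # assumed WER before text adaptation
adapted = 15.4   # assumed WER after text adaptation

reduction = relative_wer_reduction(baseline, adapted)
print(f"absolute reduction: {baseline - adapted:.1f} points")
print(f"relative reduction: {reduction:.0%}")
```

With these assumed values the absolute drop is 4.6 points, while the relative drop is 4.6 / 20.0, which is the kind of figure the finding reports.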

Abstract

The attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years. However, the joint optimization of the acoustic model and language model in an end-to-end manner has created challenges for text adaptation. In particular, effective, quick, and inexpensive adaptation with text input has become a primary concern for deploying AED systems in industry. To address this issue, we propose a novel model, the hybrid attention-based encoder-decoder (HAED) speech recognition model, which preserves the modularity of conventional hybrid automatic speech recognition systems. Our HAED model separates the acoustic and language models, allowing for the use of conventional text-based language model adaptation techniques. We demonstrate that the proposed HAED model yields 23% relative Word Error Rate (WER) improvements when out-of-domain text data is used for…

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Music and Audio Processing