Adaptable Multi-Domain Language Model for Transformer ASR

Taewoo Lee; Min-Joong Lee; Tae Gyoon Kang; Seokyeoung Jung; Minseok Kwon; Yeona Hong; Jungin Lee; Kyoung-Gu Woo; Ho-Gyeong Kim; Jiseung Jeong; Jihyun Lee; Hosik Lee; Young Sang Choi

arXiv:2008.06208 · eess.AS · February 12, 2021

TL;DR

This paper introduces an adapter-based multi-domain Transformer language model (LM) for speech recognition. The model adapts to new domains by adding only small adapters, and a general LM with an adapter outperforms a dedicated music-domain LM in word error rate (WER).

Contribution

The paper presents an adapter-based approach that enables multi-domain adaptation of a Transformer LM without retraining the entire model, reducing training and model-maintenance costs.

Findings

01

A general LM with an adapter outperforms a dedicated music-domain LM in WER

02

Requires only about 2% additional parameters for the first new domain and about 13% for the second and subsequent domains

03

Eliminates the need for costly pre-training of the common LM, lowering model-maintenance cost
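As a rough illustration of why adapters add only a small fraction of parameters, the sketch below counts the weights of a generic bottleneck adapter (a down-projection from `d_model` to a bottleneck size `r` and an up-projection back) inserted once per Transformer layer, and compares that count with the parameters of a standard Transformer block. The adapter design, the function names and all dimensions are assumptions for illustration only: this page does not specify the paper's adapter architecture, and the sketch is not meant to reproduce the reported 2% and 13% figures.

```python
def adapter_params(d_model: int, bottleneck: int) -> int:
    """Weights and biases of one (assumed) bottleneck adapter: Linear(d_model->r) + Linear(r->d_model)."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up


def transformer_layer_params(d_model: int, d_ff: int) -> int:
    """Parameters of one standard Transformer block: self-attention, FFN and two LayerNorms."""
    attention = 4 * (d_model * d_model + d_model)               # Q, K, V and output projections
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # two position-wise linear layers
    layer_norms = 2 * (2 * d_model)                             # gain and bias for each LayerNorm
    return attention + ffn + layer_norms


def added_fraction(d_model: int, d_ff: int, n_layers: int, bottleneck: int) -> float:
    """Adapter parameters (one adapter per layer) relative to the Transformer blocks alone."""
    base = n_layers * transformer_layer_params(d_model, d_ff)
    extra = n_layers * adapter_params(d_model, bottleneck)
    return extra / base


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration; they are not the paper's configuration.
    frac = added_fraction(d_model=512, d_ff=2048, n_layers=6, bottleneck=64)
    print(f"added parameters: {frac:.2%}")
```

Embedding and output-projection weights are left out of the denominator here; counting them would make the relative overhead smaller still, so the fraction depends heavily on which parts of the model are counted.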

Abstract

We propose an adapter-based, multi-domain Transformer language model (LM) for Transformer ASR. The model consists of a large common LM and small adapters, and it can perform multi-domain adaptation by updating only the small adapters and their related layers. The proposed model can reuse a fully fine-tuned LM, i.e., one fine-tuned using all layers of the original model. The proposed LM can be expanded to new domains by adding about 2% additional parameters for the first domain and about 13% for the second and subsequent domains. The proposed model is also effective in reducing model-maintenance cost, because the costly and time-consuming pre-training of the common LM can be omitted. Using the proposed adapter-based approach, we observed that a general LM with an adapter can outperform a dedicated music-domain LM in terms of word error rate (WER).
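The abstract describes one large common LM shared across domains plus a small adapter per domain. The sketch below shows one common adapter design, a residual bottleneck (down-projection, ReLU, up-projection, added back to the input), and a hypothetical `apply_domain_adapter` helper that routes a common-LM hidden state through the adapter of the requested domain. The bottleneck design, the ReLU, the zero initialisation of the up-projection, and every name here are assumptions; this page does not say where the adapters sit in the network or what the "related layers" are, so the sketch works on a single hidden vector and omits them.

```python
import math
import random
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Return w @ x + b for a row-major weight matrix w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter: h + W_up relu(W_down h + b_down) + b_up."""

    def __init__(self, d_model: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(d_model)
        # Down-projection d_model -> bottleneck with small random initial weights.
        self.w_down = [[rng.uniform(-scale, scale) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        self.b_down = [0.0] * bottleneck
        # Up-projection bottleneck -> d_model, zero-initialised (an assumption) so a newly
        # added adapter starts as the identity and leaves the common LM's output unchanged.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]
        self.b_up = [0.0] * d_model

    def __call__(self, h: Vector) -> Vector:
        z = relu(matvec(self.w_down, h, self.b_down))
        delta = matvec(self.w_up, z, self.b_up)
        return [hi + di for hi, di in zip(h, delta)]  # residual connection


def apply_domain_adapter(h: Vector,
                         adapters: Dict[str, BottleneckAdapter],
                         domain: Optional[str]) -> Vector:
    """Pass a common-LM hidden state through the chosen domain's adapter.

    domain=None is treated (hypothetically) as the general domain: no adapter is applied.
    """
    if domain is None:
        return h
    return adapters[domain](h)


if __name__ == "__main__":
    adapters = {"music": BottleneckAdapter(d_model=8, bottleneck=2)}  # hypothetical domain and sizes
    h = [0.1 * i for i in range(8)]  # stand-in for one common-LM hidden state
    # At initialisation the adapter is an identity map, so this prints True.
    print(apply_domain_adapter(h, adapters, "music") == h)
```

Adding a domain in this sketch only creates a new `BottleneckAdapter`; the common LM's weights are never touched, which is the property that lets one shared model serve several domains.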

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Byte Pair Encoding · Dropout · Label Smoothing · Residual Connection