Adaptable Multi-Domain Language Model for Transformer ASR
Taewoo Lee, Min-Joong Lee, Tae Gyoon Kang, Seokyeoung Jung, Minseok Kwon, Yeona Hong, Jungin Lee, Kyoung-Gu Woo, Ho-Gyeong Kim, Jiseung Jeong, Jihyun Lee, Hosik Lee, Young Sang Choi

TL;DR
This paper introduces an adapter-based multi-domain Transformer language model (LM) for speech recognition. It adapts to new domains by adding only about 2% (first domain) to 13% (later domains) extra parameters, and a general LM with an adapter outperforms a dedicated music domain LM in word error rate (WER).
Contribution
The paper presents an adapter-based approach that enables multi-domain adaptation of Transformer LMs without retraining the entire model, which lowers model maintenance cost.
Findings
A general LM with an adapter outperforms a dedicated music domain LM in WER
Requires only about 2% additional parameters for the first new domain and about 13% for later domains
Eliminates need for costly full LM pre-training
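The WER figure behind the first finding is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the example sentences are hypothetical, and the summary does not state how the authors' scoring pipeline normalizes text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("some" -> "sum") and one deletion ("music")
# against a four-word reference.
print(word_error_rate("play some jazz music", "play sum jazz"))
```

A lower value is better, so "outperforms in WER" means the adapted general LM produced fewer word errors than the dedicated model on the same test set.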
Abstract
We propose an adapter-based multi-domain Transformer language model (LM) for Transformer ASR. The model consists of a large common LM and small adapters. It can perform multi-domain adaptation using only the small adapters and their related layers. The proposed model can reuse a fully fine-tuned LM, that is, one fine-tuned using all layers of an original model. The proposed LM can be expanded to new domains by adding about 2% of parameters for the first domain and about 13% for later domains. The proposed model also reduces model maintenance cost because the costly and time-consuming common LM pre-training process can be omitted. Using the proposed adapter-based approach, we observed that a general LM with an adapter can outperform a dedicated music domain LM in terms of word error rate (WER).
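To make the "large common LM plus small adapters" idea concrete, here is a minimal sketch of a generic bottleneck adapter with a residual connection, plus the arithmetic for the extra weights it adds. This is an assumption-laden illustration, not the authors' implementation: the summary does not give the adapter's internal structure, where it is placed, or which "related layers" are trained. All names and sizes (`make_adapter`, `adapter_forward`, `d_model=512`, `bottleneck=64`, six layers) are hypothetical and are not chosen to reproduce the 2% or 13% figures.

```python
import random


def make_adapter(d_model: int, bottleneck: int, seed: int = 0) -> dict:
    """Create a hypothetical bottleneck adapter (down-projection, ReLU, up-projection)."""
    rng = random.Random(seed)
    down = [[rng.gauss(0.0, 0.02) for _ in range(bottleneck)] for _ in range(d_model)]
    # Zero-initialized up-projection: the untrained adapter is an identity
    # mapping, so the common LM's behavior is unchanged before adaptation.
    up = [[0.0] * d_model for _ in range(bottleneck)]
    return {"down": down, "up": up}


def adapter_forward(adapter: dict, hidden: list) -> list:
    """Apply the adapter to one hidden vector and add the residual."""
    down, up = adapter["down"], adapter["up"]
    z = [
        max(0.0, sum(hidden[i] * down[i][j] for i in range(len(hidden))))
        for j in range(len(up))
    ]
    delta = [sum(z[j] * up[j][k] for j in range(len(z))) for k in range(len(hidden))]
    return [h + d for h, d in zip(hidden, delta)]


def adapter_param_count(d_model: int, bottleneck: int, num_layers: int) -> int:
    """Weights added per domain, assuming one adapter per layer and no biases."""
    return num_layers * 2 * d_model * bottleneck


# The common LM's weights stay frozen; only one small adapter set is
# stored and trained per domain (illustrative domain name).
adapters = {"music": make_adapter(d_model=4, bottleneck=2)}
hidden = [1.0, -2.0, 0.5, 3.0]
print(adapter_forward(adapters["music"], hidden))  # unchanged before training
print(adapter_param_count(d_model=512, bottleneck=64, num_layers=6))
```

Under these assumptions, supporting another domain means adding one more entry to `adapters` rather than pre-training or storing another full LM, which is the maintenance saving the abstract describes.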
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Byte Pair Encoding · Dropout · Label Smoothing · Residual Connection
