Style Attuned Pre-training and Parameter Efficient Fine-tuning for Spoken Language Understanding
Jin Cao, Jun Wang, Wael Hamza, Kelly Vanee, Shang-Wen Li

TL;DR
This paper presents a novel SLU framework that combines conversational language modeling pre-training with a lightweight encoder, enabling efficient domain adaptation and matching state-of-the-art performance with minimal additional parameters.
Contribution
The proposed framework introduces a conversational language modeling pre-training task and a light encoder architecture, improving domain adaptation efficiency in spoken language understanding.
Findings
Matches state-of-the-art results on multiple SLU datasets.
Adds only 4.4% additional parameters for each domain adaptation.
Effectively captures conversational language with ASR errors.
Abstract
Neural models have yielded state-of-the-art results in deciphering spoken language understanding (SLU) problems; however, these models require a significant amount of domain-specific labeled examples for training, which is prohibitively expensive. While pre-trained language models like BERT have been shown to capture a massive amount of knowledge by learning from unlabeled corpora and solve SLU using fewer labeled examples for adaptation, the encoding of knowledge is implicit and agnostic to downstream tasks. Such encoding results in model inefficiencies in parameter usage: an entirely new model is required for every domain. To address these challenges, we introduce a novel SLU framework, comprising a conversational language modeling (CLM) pre-training task and a light encoder architecture. The CLM pre-training enables networks to capture the representation of the language in conversation…
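The parameter-usage argument above can be made concrete with a little arithmetic. The sketch below compares the total parameter count when every domain gets an entirely new model against a setup where one pre-trained model is shared and each domain adds a small increment. It assumes the 4.4% figure is measured relative to the shared model's size; the function name, the base size of 100 million parameters, and the count of 10 domains are illustrative placeholders, not values from the paper.

```python
def parameter_budget(base_params: int, num_domains: int, per_domain_fraction: float) -> dict:
    """Compare total parameters for per-domain full models vs. a shared model
    with a small per-domain addition.

    All argument values are hypothetical; per_domain_fraction is assumed to be
    relative to the size of the shared pre-trained model.
    """
    # Extra parameters introduced for one adapted domain.
    per_domain = round(base_params * per_domain_fraction)
    # One shared model plus one small addition per domain.
    shared_total = base_params + num_domains * per_domain
    # An entirely new full-size model for every domain.
    separate_total = num_domains * base_params
    return {
        "per_domain": per_domain,
        "shared_total": shared_total,
        "separate_total": separate_total,
    }


if __name__ == "__main__":
    # Illustrative numbers only: 100M-parameter base model, 10 domains.
    budget = parameter_budget(100_000_000, 10, 0.044)
    for name, value in budget.items():
        print(f"{name}: {value:,}")
```

Under these assumed numbers the shared setup grows slowly with the number of domains, while the one-model-per-domain setup grows by a full model each time, which is the inefficiency the abstract points to.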
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Linear Layer · Adam · Dense Connections · WordPiece · Multi-Head Attention · Layer Normalization · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Dropout
