DIET: Lightweight Language Understanding for Dialogue Systems

Tanja Bunk; Daksh Varshneya; Vladimir Vlasov; Alan Nichol

arXiv:2004.09936 · cs.CL · May 12, 2020 · 113 cites

TL;DR

DIET is a lightweight transformer architecture for intent and entity prediction in dialogue systems. It advances the state of the art even in a purely supervised setup without pre-trained embeddings, and it trains faster than BERT-based models.

Contribution

The paper presents the DIET architecture, demonstrating its effectiveness and efficiency in dialogue NLU tasks without needing large pre-trained models.

Findings

01. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on simpler datasets.

02. Large pre-trained models show no clear benefit over purely supervised training for intent and entity prediction.

03. DIET is approximately six times faster to train than BERT-based models.

Abstract

Large-scale pre-trained language models have shown impressive results on language understanding benchmarks like GLUE and SuperGLUE, improving considerably over other pre-training methods like distributed representations (GloVe) and purely supervised approaches. We introduce the Dual Intent and Entity Transformer (DIET) architecture, and study the effectiveness of different pre-trained representations on intent and entity prediction, two common dialogue language understanding tasks. DIET advances the state of the art on a complex multi-domain NLU dataset and achieves similarly high performance on other simpler datasets. Surprisingly, we show that there is no clear benefit to using large pre-trained models for this task, and in fact DIET improves upon the current state of the art even in a purely supervised setup without any pre-trained embeddings. Our best performing model outperforms…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections