DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
Shikib Mehri, Mihail Eric, Dilek Hakkani-Tur

TL;DR
DialoGLUE is a comprehensive benchmark for evaluating natural language understanding in task-oriented dialogue, promoting research in domain adaptation, transfer learning, and sample efficiency.
Contribution
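It introduces a public benchmark of 7 task-oriented dialogue datasets covering 4 natural language understanding tasks, together with several strong baseline models and evaluation scripts, to support the development of more adaptable dialogue systems.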
Findings
Baseline models improve over a vanilla BERT architecture and achieve state-of-the-art results on 5 out of 7 tasks.
Pre-training on a large open-domain dialogue corpus, combined with task-adaptive self-supervised training, improves task performance.
The benchmark, baseline methods, and evaluation scripts are intended to facilitate progress in general task-oriented dialogue modeling.
Abstract
A long-standing goal of task-oriented dialogue research is the ability to flexibly adapt dialogue models to new domains. To progress research in this direction, we introduce DialoGLUE (Dialogue Language Understanding Evaluation), a public benchmark consisting of 7 task-oriented dialogue datasets covering 4 distinct natural language understanding tasks, designed to encourage dialogue research in representation-based transfer, domain adaptation, and sample-efficient task learning. We release several strong baseline models, demonstrating performance improvements over a vanilla BERT architecture and state-of-the-art results on 5 out of 7 tasks, by pre-training on a large open-domain dialogue corpus and task-adaptive self-supervised training. Through the DialoGLUE benchmark, the baseline methods, and our evaluation scripts, we hope to facilitate progress towards the goal of developing more…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Linear Layer · Softmax · Dense Connections · Dropout · Linear Warmup With Linear Decay · Layer Normalization · Attention Dropout · WordPiece · Weight Decay
