DLGNet: A Transformer-based Model for Dialogue Response Generation

Oluwatobi Olabiyi; Erik T. Mueller

arXiv:1908.01841 · cs.CL · September 6, 2019 · 5 cites

TL;DR

This paper introduces DLGNet, a transformer-based model for multi-turn dialogue response generation. It targets more relevant, diverse, and coherent responses by combining a long-range transformer architecture with random padding injection.

Contribution

The paper presents DLGNet, a transformer-based model for dialogue generation that outperforms existing models on the Movie Triples and Ubuntu datasets using only maximum likelihood training.

Findings

01 Achieves state-of-the-art results on the Movie Triples and Ubuntu datasets.

02 Produces more relevant, diverse, and coherent responses.

03 Uses a long-range transformer architecture with random padding injection.
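
As a rough illustration of the third finding, random padding injection might look like the sketch below. This summary does not describe the procedure, so every detail is an assumption made for illustration: the helper name `inject_random_padding`, the `pad_pool` and `max_pad` parameters, and the choice to place padding both before and after the dialogue tokens. It is a minimal sketch under those assumptions, not the authors' implementation.

```python
import random


def inject_random_padding(tokens, pad_pool, max_pad=4, rng=None):
    """Surround a token sequence with randomly sampled padding tokens.

    Hypothetical helper: the padding pool, the padding lengths, and the
    before/after placement are illustrative assumptions, not details
    taken from the paper.
    """
    rng = rng or random.Random()
    # Draw an independent padding length for each side of the sequence.
    n_before = rng.randint(0, max_pad)
    n_after = rng.randint(0, max_pad)
    before = [rng.choice(pad_pool) for _ in range(n_before)]
    after = [rng.choice(pad_pool) for _ in range(n_after)]
    # Return the offset too, so callers can locate the original tokens.
    return before + list(tokens) + after, n_before


dialogue = ["hi", "<eot>", "hello", "there", "<eot>"]
padded, offset = inject_random_padding(
    dialogue, pad_pool=["<pad_a>", "<pad_b>"], rng=random.Random(0)
)
# The original dialogue tokens survive unchanged inside the padded sequence.
assert padded[offset:offset + len(dialogue)] == dialogue
```

Returning the offset alongside the padded sequence is a design choice of this sketch; it lets a training loop mask the padding positions when computing the loss, if that is desired.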

Abstract

Neural dialogue models, despite their successes, still suffer from a lack of relevance, diversity, and in many cases coherence in their generated responses. These issues can be attributed to reasons including (1) short-range model architectures that capture limited temporal dependencies, (2) limitations of the maximum likelihood training objective, (3) the concave entropy profile of dialogue datasets, which results in short and generic responses, and (4) the out-of-vocabulary problem, which leads to the generation of a large number of <UNK> tokens. On the other hand, transformer-based models such as GPT-2 have demonstrated an excellent ability to capture long-range structures in language modeling tasks. In this paper, we present DLGNet, a transformer-based model for dialogue modeling. We specifically examine the use of DLGNet for multi-turn dialogue response generation. In our experiments, we evaluate…

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Residual Connection · Attention Dropout · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Byte Pair Encoding