DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Yizhe Zhang; Siqi Sun; Michel Galley; Yen-Chun Chen; Chris Brockett; Xiang Gao; Jianfeng Gao; Jingjing Liu; Bill Dolan

arXiv:1911.00536 · cs.CL · May 5, 2020 · 104 cites

TL;DR

This paper introduces DialoGPT, a large-scale pre-trained transformer for conversational response generation trained on Reddit data. It achieves near-human performance in single-turn dialogue settings and produces more relevant and context-consistent responses than strong baseline systems.

Contribution

The paper presents DialoGPT, a large-scale pre-trained model designed specifically for open-domain dialogue, and publicly releases both the model and its training pipeline.

Findings

01

DialoGPT achieves near-human performance in single-turn dialogue.

02

Generated responses are more relevant, contentful, and context-consistent than those of strong baseline systems.

03

The model outperforms strong baseline systems in automatic and human evaluations.

Abstract

We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.
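DialoGPT frames response generation as language modeling: the turns of a dialogue session are concatenated into one token sequence x_1, ..., x_N, each turn closed by GPT-2's end-of-text token, and a response T = x_{m+1}, ..., x_N is scored given the source context S = x_1, ..., x_m as log p(T | S) = Σ_{n=m+1}^{N} log p(x_n | x_1, ..., x_{n-1}). The Python below is a minimal sketch of that scoring step under stated assumptions, not the released training pipeline: it assumes text is already tokenized, and `next_token_logprob(prefix, token)` is a hypothetical stand-in for a trained transformer's softmax output.

```python
import math
from typing import Callable, List, Sequence

# GPT-2's end-of-text token, used here as the turn separator.
EOS = "<|endoftext|>"


def flatten_dialogue(turns: Sequence[List[str]], eos: str = EOS) -> List[str]:
    """Concatenate tokenized turns into one sequence, closing each turn with `eos`."""
    seq: List[str] = []
    for turn in turns:
        seq.extend(turn)
        seq.append(eos)
    return seq


def response_log_prob(
    source: Sequence[List[str]],
    response: List[str],
    next_token_logprob: Callable[[List[str], str], float],
) -> float:
    """Return log p(T | S) = sum over n = m+1..N of log p(x_n | x_1..x_{n-1}).

    `next_token_logprob` is a hypothetical model interface (an assumption of
    this sketch). Only the response tokens and the response's closing EOS are
    scored; the source turns act purely as conditioning context.
    """
    prefix = flatten_dialogue(source)  # fresh list: x_1 .. x_m
    total = 0.0
    for token in response + [EOS]:
        # Score the next token given everything generated so far.
        total += next_token_logprob(prefix, token)
        prefix.append(token)
    return total


# Toy model: uniform over a 4-token vocabulary, ignoring the context.
uniform = lambda prefix, tok: math.log(0.25)
score = response_log_prob([["hi"]], ["hello", "there"], uniform)
```

Here the two-token reply is scored over three positions (its two tokens plus the closing end-of-text token), so `score` equals 3 · log(1/4). Summing log-probabilities instead of multiplying raw probabilities keeps long responses from underflowing.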

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax