Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models
Qingyang Wu, Yichi Zhang, Yu Li, Zhou Yu

TL;DR
This paper introduces ARDM, a dialog model that builds on large pre-trained language models and models each speaker separately. It requires no human annotations and performs well on task-oriented and persuasion dialog tasks.
Contribution
The paper presents ARDM, a novel framework that effectively utilizes large pre-trained language models for dialog generation without supervision from human annotations.
Findings
ARDM outperforms or matches state-of-the-art on CamRest676 and MultiWOZ datasets.
ARDM generalizes to persuasion tasks, generating human-like persuasive responses.
ARDM needs no supervision from human annotations, such as belief states or dialog acts, to model dialog effectively.
Abstract
Existing dialog system models require extensive human annotations and are difficult to generalize to different tasks. The recent success of large pre-trained language models such as BERT and GPT-2 (Devlin et al., 2019; Radford et al., 2019) has suggested the effectiveness of incorporating language priors in down-stream NLP tasks. However, how much pre-trained language models can help dialog response generation is still under exploration. In this paper, we propose a simple, general, and effective framework: Alternating Roles Dialog Model (ARDM). ARDM models each speaker separately and takes advantage of the large pre-trained language model. It requires no supervision from human annotations such as belief states or dialog acts to achieve effective conversations. ARDM outperforms or is on par with state-of-the-art methods on two popular task-oriented dialog datasets: CamRest676 and…
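The idea of modeling each speaker separately can be illustrated with a small data-preparation sketch. This is not the authors' implementation: the function name `split_by_role`, the `"speaker: utterance"` history format, and the sample dialog are all assumptions made for illustration. It shows only how raw turns, with no belief states or dialog acts, could be grouped into per-speaker (history, target) pairs, each of which might then train that speaker's own language model.

```python
from typing import Dict, List, Tuple

Turn = Tuple[str, str]               # (speaker, utterance)
Example = Tuple[List[str], str]      # (dialog history, target utterance)


def split_by_role(dialog: List[Turn]) -> Dict[str, List[Example]]:
    """Group (history, target) pairs by the speaker who produces the target.

    Hypothetical helper: only raw turns and speaker labels are used,
    no belief states or dialog acts.
    """
    examples: Dict[str, List[Example]] = {}
    history: List[str] = []
    for speaker, utterance in dialog:
        # The current speaker's model would learn to produce `utterance`
        # given everything said so far by both speakers.
        examples.setdefault(speaker, []).append((list(history), utterance))
        history.append(f"{speaker}: {utterance}")
    return examples


# Made-up dialog in the style of a restaurant-booking task.
dialog = [
    ("user", "I need a cheap restaurant."),
    ("system", "What area do you prefer?"),
    ("user", "The north, please."),
]
examples = split_by_role(dialog)
```

Here `examples["user"]` and `examples["system"]` hold the training pairs for the two roles; both see the full shared history, but each set contains only targets spoken by its own role.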
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Cosine Annealing · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Byte Pair Encoding · Dense Connections
