Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models

Qingyang Wu; Yichi Zhang; Yu Li; Zhou Yu

arXiv:1910.03756 · cs.CL · April 28, 2021 · 32 cites

TL;DR

This paper introduces ARDM, a dialog model built on large pre-trained language models. It models each speaker separately, requires no human annotations such as belief states or dialog acts, and performs well on task-oriented and persuasion dialog tasks.

Contribution

The paper presents ARDM, a novel framework that effectively utilizes large pre-trained language models for dialog generation without supervision from human annotations.

Findings

01

ARDM outperforms or matches state-of-the-art methods on the CamRest676 and MultiWOZ datasets.

02

ARDM generalizes to persuasion tasks, generating human-like persuasive responses.

03

ARDM needs no supervision from human annotations, such as belief states or dialog acts, to model dialogs effectively.

Abstract

Existing dialog system models require extensive human annotations and are difficult to generalize to different tasks. The recent success of large pre-trained language models such as BERT and GPT-2 (Devlin et al., 2019; Radford et al., 2019) has suggested the effectiveness of incorporating language priors in downstream NLP tasks. However, how much pre-trained language models can help dialog response generation is still under exploration. In this paper, we propose a simple, general, and effective framework: Alternating Roles Dialog Model (ARDM). ARDM models each speaker separately and takes advantage of the large pre-trained language model. It requires no supervision from human annotations such as belief states or dialog acts to achieve effective conversations. ARDM outperforms or is on par with state-of-the-art methods on two popular task-oriented dialog datasets: CamRest676 and…
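
The idea of modeling each speaker separately can be illustrated with a minimal sketch. It is not the authors' implementation: the `SpeakerModel` interface, the `dialog_log_prob` function, and the `uniform_lm` toy scorer are all hypothetical names introduced here, and the toy scorer stands in for a large pre-trained language model. The sketch assumes only what the abstract states, namely that each speaker has its own model and that no belief states or dialog acts are used; it further assumes both models condition on the shared dialog history.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (an assumption of this sketch): a speaker model
# maps (dialog history, utterance) to log p(utterance | history).
SpeakerModel = Callable[[Sequence[str], str], float]


def dialog_log_prob(
    turns: List[Tuple[str, str]],
    user_lm: SpeakerModel,
    system_lm: SpeakerModel,
) -> float:
    """Sum per-turn log-probabilities, scoring each turn with its speaker's model."""
    history: List[str] = []
    total = 0.0
    for speaker, utterance in turns:
        # Pick the model by role; no belief states or dialog acts are involved.
        model = user_lm if speaker == "user" else system_lm
        total += model(history, utterance)
        # Assumption: both models see the same growing history of utterances.
        history.append(utterance)
    return total


def uniform_lm(vocab_size: int) -> SpeakerModel:
    """Toy stand-in for a pre-trained LM: each whitespace token has probability 1/vocab_size."""
    def score(history: Sequence[str], utterance: str) -> float:
        return -len(utterance.split()) * math.log(vocab_size)
    return score


if __name__ == "__main__":
    dialog = [("user", "book a table"), ("system", "sure which day")]
    print(dialog_log_prob(dialog, uniform_lm(10), uniform_lm(100)))
```

In a real system, each toy scorer would be replaced by a fine-tuned pre-trained language model, and training would maximize this summed log-probability over a dialog corpus.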

Code & Models

Repositories

budzianowski/multiwoz
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Cosine Annealing · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Byte Pair Encoding · Dense Connections