TL;DR
This paper reformulates task-oriented dialogue as a pure natural language generation task built on GPT-2, and introduces GPT-Adapter-CopyNet to improve transfer learning and entity generation, outperforming baselines on the DSTC8 Track 1 and MultiWOZ benchmarks.
Contribution
It introduces GPT-Adapter-CopyNet, a new model combining adapters and CopyNet with GPT-2, to address dialogue entity inconsistency and catastrophic forgetting in dialogue systems.
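To make the adapter idea concrete, here is a minimal sketch of a bottleneck adapter with a residual connection, written in plain Python. It assumes the common down-project, nonlinearity, up-project design; the summary above does not specify the paper's exact layout, so the class name `Adapter`, the sizes, and the zero initialization of the up-projection are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class Adapter:
    """Hypothetical bottleneck adapter: hidden -> bottleneck -> hidden, plus residual."""

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Down-projection gets small random weights.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
                     for _ in range(bottleneck_size)]
        # Assumption: zero-initialized up-projection, so the adapter starts as
        # an identity map and the pre-trained model's output is unchanged.
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def num_parameters(self):
        return sum(len(row) for row in self.down) + sum(len(row) for row in self.up)

    def __call__(self, hidden):
        bottleneck = [gelu(a) for a in matvec(self.down, hidden)]
        delta = matvec(self.up, bottleneck)
        # Residual connection: the adapter only adds a small correction.
        return [h + d for h, d in zip(hidden, delta)]


adapter = Adapter(hidden_size=4, bottleneck_size=2)
hidden_state = [0.1, -0.2, 0.3, 0.4]
adapted_state = adapter(hidden_state)
```

In a setup like this, the pre-trained weights would typically stay frozen while only the small adapter matrices are trained, which is one way adapters are used to limit catastrophic forgetting.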
Findings
Significantly outperforms baseline models on DSTC8 and MultiWOZ datasets.
Achieves higher automatic and human evaluation scores.
Effectively handles dialogue entity generation and transfer learning.
Abstract
In this paper, we propose to formulate the task-oriented dialogue system as a purely natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify complicated delexicalization preprocessing. However, directly applying this method suffers heavily from the dialogue entity inconsistency caused by the removal of delexicalized tokens, as well as the catastrophic forgetting problem of the pre-trained model during fine-tuning, leading to unsatisfactory performance. To alleviate these problems, we design a novel GPT-Adapter-CopyNet network, which incorporates the lightweight adapter and CopyNet modules into GPT-2 to achieve better performance on transfer learning and dialogue entity generation. Experimental results on the DSTC8 Track 1 benchmark and MultiWOZ dataset demonstrate that our proposed approach significantly…
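The copy mechanism mentioned in the abstract can be illustrated with a small sketch. The version below uses a gated mixture of a generation distribution and a copy distribution over source tokens, which is a common simplification in the pointer-generator style; it is not necessarily the formulation used in the paper, and the function name `mix_copy_distribution` and the gate `p_gen` are hypothetical names introduced for illustration.

```python
def mix_copy_distribution(vocab_probs, attention, source_token_ids, p_gen):
    """Blend a generation distribution with a copy distribution.

    vocab_probs:      probabilities over the vocabulary from the language model
    attention:        normalized weights over the source (dialogue history) positions
    source_token_ids: vocabulary id of the token at each source position
    p_gen:            gate in [0, 1]; 1.0 means generate only, 0.0 means copy only
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_token_ids):
        raise ValueError("attention and source_token_ids must have the same length")

    # Start from the scaled generation distribution.
    final_probs = [p_gen * p for p in vocab_probs]
    # Add copy mass to whichever vocabulary entries appear in the source.
    for weight, token_id in zip(attention, source_token_ids):
        final_probs[token_id] += (1.0 - p_gen) * weight
    return final_probs


# Toy example: token 3 stands in for an entity name that the language model
# assigns zero probability, but that appears in the dialogue history.
vocab_probs = [0.5, 0.3, 0.2, 0.0]
attention = [0.75, 0.25]
source_token_ids = [3, 1]
final_probs = mix_copy_distribution(vocab_probs, attention, source_token_ids, p_gen=0.5)
```

In this toy case the entity token ends up with the highest probability even though the generator alone would never produce it, which is the kind of behavior that helps keep entities consistent once delexicalized placeholders are removed.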
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Residual Connection · Layer Normalization · Dense Connections · Attention Dropout · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning
