Multi-Task Learning for Situated Multi-Domain End-to-End Dialogue Systems
Po-Nien Kung, Chung-Cheng Chang, Tse-Hsuan Yang, Hsin-Kai Hsu, Yu-Jia Liou, Yun-Nung Chen

TL;DR
This paper introduces a multi-task learning approach that trains a single GPT-2 model for multi-domain, multi-modal dialogue systems. The model outperforms task- and domain-specific models, and ablation studies show that each of the proposed strategies is effective.
Contribution
It presents a multi-task training framework in which a single GPT-2 model handles complex multi-domain, multi-modal dialogue sub-tasks with improved results. The framework is backed by comprehensive ablation studies of the proposed strategies.
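The summary does not say how examples from the different sub-tasks are mixed during training. One common option is temperature-scaled task sampling, where each sub-task is drawn with probability proportional to its dataset size raised to 1/T. The sketch below illustrates that option only; it is an assumption and not the authors' procedure. The sub-task names and sizes are hypothetical.

```python
import random
from typing import Dict, List


def task_sampling_probs(sizes: Dict[str, int], temperature: float = 1.0) -> Dict[str, float]:
    """Temperature-scaled task mixing: p_t is proportional to n_t ** (1 / T).

    T = 1 samples tasks in proportion to their dataset sizes; larger T flattens
    the distribution toward uniform, so small sub-tasks are seen more often.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    weights = {task: n ** (1.0 / temperature) for task, n in sizes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


def sample_batch_tasks(sizes: Dict[str, int], batch_size: int,
                       temperature: float = 1.0, seed: int = 0) -> List[str]:
    """Draw the sub-task for each example in one batch (with replacement)."""
    probs = task_sampling_probs(sizes, temperature)
    tasks = list(probs)
    rng = random.Random(seed)
    return rng.choices(tasks, weights=[probs[t] for t in tasks], k=batch_size)


# Hypothetical sub-task names and sizes, for illustration only.
sizes = {"belief_state": 300, "response": 100}
print(task_sampling_probs(sizes, temperature=1.0))  # size-proportional mixing
print(task_sampling_probs(sizes, temperature=2.0))  # flatter mixing
print(sample_batch_tasks(sizes, batch_size=8, temperature=2.0))
```

With T = 1 the larger sub-task dominates in proportion to its size. Raising T gives smaller sub-tasks more weight, which is one way a single shared model can avoid neglecting them.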
Findings
A single GPT-2 model outperforms task- and domain-specific models on all sub-tasks
Multi-task learning improves dialogue system performance
The proposed strategies further improve GPT-2-based dialogue systems
Abstract
Task-oriented dialogue systems have been a promising area in the NLP field. Previous work showed the effectiveness of using a single GPT-2-based model to predict belief states and responses via causal language modeling. In this paper, we leverage multi-task learning techniques to train a GPT-2-based model on a more challenging dataset with multiple domains, multiple modalities, and more diverse output formats. Using only a single model, our method achieves better performance on all sub-tasks, across domains, than task- and domain-specific models. Furthermore, we evaluate several proposed strategies for GPT-2-based dialogue systems with comprehensive ablation studies, showing that each technique further improves performance.
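The abstract frames belief-state prediction and response generation as causal language modeling with one GPT-2 model. The minimal sketch below shows that general framing and is not the authors' exact input format. It flattens one dialogue turn into a single token sequence and masks the context, so that only belief-state and response tokens contribute to the loss. It rests on several assumptions:

- The delimiter tokens (`<context>`, `<belief>`, `<response>`, `<eos>`) are hypothetical.
- The belief-state string format is invented for illustration.
- Whitespace splitting stands in for GPT-2's BPE tokenizer.

```python
from typing import List, Tuple

# Hypothetical delimiter tokens; the paper's actual special tokens may differ.
CTX, BELIEF, RESP, EOS = "<context>", "<belief>", "<response>", "<eos>"


def build_example(context: str, belief: str, response: str) -> Tuple[List[str], List[int]]:
    """Flatten one dialogue turn into a single causal-LM sequence.

    Returns the token list and a parallel loss mask: 0 for context tokens
    (conditioned on, not predicted), 1 for belief-state and response tokens.
    Whitespace splitting stands in for GPT-2's BPE tokenizer.
    """
    ctx_tokens = [CTX] + context.split()
    target_tokens = [BELIEF] + belief.split() + [RESP] + response.split() + [EOS]
    tokens = ctx_tokens + target_tokens
    mask = [0] * len(ctx_tokens) + [1] * len(target_tokens)
    return tokens, mask


def shifted_targets(tokens: List[str], mask: List[int]) -> List[Tuple[str, str]]:
    """Next-token training pairs: the token at position i predicts token i + 1.

    Only pairs whose target token is unmasked are kept, mirroring a loss that
    ignores the dialogue context.
    """
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1) if mask[i + 1]]


tokens, mask = build_example(
    "user: show me red jackets",
    "INFORM:GET type=jacket color=red",  # invented belief-state format
    "here are some red jackets",
)
for inp, target in shifted_targets(tokens, mask):
    print(f"{inp!r} -> {target!r}")
```

In this framing, every sub-task output is just more tokens in the same sequence. Supporting an additional output format therefore only needs a new delimiter, not a separate model head. That is one plausible reason a single GPT-2 model can cover several sub-tasks.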
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Discriminative Fine-Tuning · Adam · Linear Warmup With Cosine Annealing · Softmax · Dropout
