Different Strokes for Different Folks: Investigating Appropriate Further Pre-training Approaches for Diverse Dialogue Tasks

Yao Qiu; Jinchao Zhang; Jie Zhou

arXiv:2109.06524 · cs.CL · September 15, 2021

TL;DR

This paper explores how customizing further pre-training tasks, beyond conventional domain adaptation, can significantly improve diverse task-oriented dialogue systems by aligning pre-training strategies with specific downstream task needs.

Contribution

It argues that different dialogue tasks call for tailored further pre-training tasks, shows through extensive experiments that such tasks are effective, and offers empirical guidelines for task-specific further pre-training.

Findings

1. Different downstream tasks prefer different further pre-training tasks.
2. Most further pre-training tasks improve specific target tasks rather than all of them.
3. Designing task-specific further pre-training tasks is crucial for improving performance.
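As a purely hypothetical illustration of what a task-aligned further pre-training task could look like, the sketch below turns unlabeled dialogues into response-selection-style examples, so that further pre-training uses a formulation resembling a downstream response-selection task rather than only masked language modeling. It is not presented as one of the tasks studied in the paper; the function name `build_response_selection_pairs`, the one-negative-per-positive default, and random negative sampling from other dialogues are all assumptions.

```python
import random
from typing import List, Optional, Tuple

Dialogue = List[str]  # hypothetical representation: a dialogue is a list of utterances


def build_response_selection_pairs(
    dialogues: List[Dialogue],
    num_negatives: int = 1,
    rng: Optional[random.Random] = None,
) -> List[Tuple[List[str], str, int]]:
    """Build (context, candidate, label) examples from unlabeled dialogues.

    For every turn t >= 1, the true next utterance is a positive (label 1),
    and `num_negatives` utterances drawn from *other* dialogues are
    negatives (label 0). No manual annotation is needed.
    """
    if rng is None:
        rng = random.Random()
    examples = []
    for d_idx, dialogue in enumerate(dialogues):
        # Pool of utterances from every other dialogue, used as negatives.
        pool = [u for j, other in enumerate(dialogues) if j != d_idx for u in other]
        for t in range(1, len(dialogue)):
            context, response = dialogue[:t], dialogue[t]
            examples.append((context, response, 1))
            for _ in range(num_negatives):
                if pool:  # skip negatives if no other dialogue has utterances
                    examples.append((context, rng.choice(pool), 0))
    return examples


# Illustrative toy dialogues (made up for this sketch).
dialogues = [
    ["hi, I need a taxi", "where to?", "the station"],
    ["book a table for two", "what time?", "7 pm"],
]
pairs = build_response_selection_pairs(dialogues, num_negatives=1, rng=random.Random(0))
print(len(pairs))  # → 8
```

Each three-utterance dialogue yields two contexts, each with one positive and one negative, hence eight examples. Sampling negatives from other dialogues keeps the sketch simple; identical utterances across dialogues (e.g. "thanks") could occasionally produce a false negative.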

Abstract

Loading models pre-trained on large-scale general-domain corpora and fine-tuning them on specific downstream tasks is gradually becoming a paradigm in Natural Language Processing. Previous investigations have shown that introducing a further pre-training phase between the pre-training and fine-tuning phases, to adapt the model to domain-specific unlabeled data, can bring positive effects. However, most of these further pre-training works simply keep running the conventional pre-training task, e.g., masked language modeling, which can be regarded as domain adaptation to bridge the data distribution gap. After observing diverse downstream tasks, we suggest that different tasks may also need a further pre-training phase with appropriate training tasks to bridge the task formulation gap. To investigate this, we carry out a study for improving multiple task-oriented dialogue downstream…
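The abstract characterizes most prior further pre-training as continuing the masked language model (MLM) objective on in-domain unlabeled text. A minimal sketch of BERT-style MLM input corruption follows, assuming the common 15% selection rate and the 80/10/10 mask/random/keep split from the original BERT recipe; the function name `mask_tokens`, the token ids, and the vocabulary size are illustrative assumptions, not details taken from this paper.

```python
import random
from typing import FrozenSet, List, Optional, Tuple

IGNORE_INDEX = -100  # label for unselected positions; PyTorch's CrossEntropyLoss ignores -100 by default


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: FrozenSet[int] = frozenset(),
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Corrupt a token sequence for masked-language-model training.

    Each non-special position is selected with probability `mask_prob`.
    A selected token is replaced by `mask_id` 80% of the time, by a random
    vocabulary id 10% of the time, and left unchanged 10% of the time.
    Labels keep the original id at selected positions and IGNORE_INDEX elsewhere.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if tok in special_ids or rng.random() >= mask_prob:
            continue  # not selected: no prediction target at this position
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)
        # otherwise keep the original token in the input


    return inputs, labels


# Illustrative usage with BERT-base-uncased-style special ids ([CLS]=101, [SEP]=102, [MASK]=103).
ids = [101, 7592, 2088, 2003, 1037, 3231, 102]
inp, lab = mask_tokens(ids, mask_id=103, vocab_size=30522,
                       special_ids=frozenset({101, 102}), rng=random.Random(0))
print(inp)
print(lab)
```

In domain-adaptive further pre-training, this same corruption is applied to in-domain text (here, unlabeled dialogues) before fine-tuning; the paper's point is that such an objective bridges the data distribution gap but not necessarily the task formulation gap.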