On Task-Adaptive Pretraining for Dialogue Response Selection

Tzu-Hsiang Lin; Ta-Chung Chi; Anna Rumshisky

arXiv:2210.04073·cs.CL·October 11, 2022·1 cites

Open Access

TL;DR

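This paper re-examines pretraining choices for dialogue response selection (DRS). It finds that initializing with RoBERTa performs similarly to initializing with BERT, that task-adaptive pretraining with masked language modeling plus next sentence prediction (MLM+NSP) can outperform previously proposed dialogue-specific tasks, and that NSP is particularly crucial.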

Contribution

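The study tests two assumptions in prior work: that BERT is the best initialization and that dialogue-specific pretraining tasks beat MLM+NSP. It shows that RoBERTa achieves performance similar to BERT, that MLM+NSP can outperform all previously proposed TAP tasks, and it reports a new state-of-the-art on the Ubuntu corpus.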

Findings

01. RoBERTa matches BERT in DRS performance

02. MLM+NSP outperforms other TAP tasks

03. NSP is essential for effective dialogue response selection

Abstract

Recent advancements in dialogue response selection (DRS) are based on the task-adaptive pre-training (TAP) approach: models are first initialized with BERT (Devlin et al., 2019) and then adapted to dialogue data with dialogue-specific or fine-grained pre-training tasks. However, it is uncertain whether BERT is the best initialization choice, or whether the proposed dialogue-specific fine-grained learning tasks are actually better than MLM+NSP. This paper aims to verify assumptions made in previous works and understand the source of improvements for DRS. We show that initializing with RoBERTa achieves performance similar to BERT, and that MLM+NSP can outperform all previously proposed TAP tasks; in the process, we also contribute a new state-of-the-art on the Ubuntu corpus. Additional analyses show that the main source of improvements comes from the TAP step, and that the NSP task…
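
To make the MLM+NSP objective named in the abstract more concrete, the following is a minimal sketch of how training examples for the two tasks might be built from raw dialogues. It is an illustration under stated assumptions, not the authors' implementation: the function names `make_nsp_pairs` and `mask_tokens`, whitespace tokenization, one random negative per positive, and the 15% masking rate that always substitutes `[MASK]` are all choices made here for brevity.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol; real tokenizers define their own


def make_nsp_pairs(dialogues, rng):
    """Build (context, response, label) triples for an NSP-style task.

    label 1: the response is the true next utterance of the context.
    label 0: the response is a randomly sampled different utterance.
    (One negative per positive is an assumption of this sketch.)
    """
    all_utterances = [u for d in dialogues for u in d]
    pairs = []
    for dialogue in dialogues:
        for i in range(1, len(dialogue)):
            context, response = dialogue[:i], dialogue[i]
            pairs.append((context, response, 1))
            # Sample a negative that differs from the true response.
            candidates = [u for u in all_utterances if u != response]
            if candidates:
                pairs.append((context, rng.choice(candidates), 0))
    return pairs


def mask_tokens(tokens, rng, mask_prob=0.15):
    """Replace each token with MASK with probability mask_prob.

    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere. This simplifies BERT-style
    masking, which also keeps or randomly replaces some selected tokens.
    """
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    rng = random.Random(0)
    dialogues = [
        ["my wifi driver is missing", "which card do you have", "a broadcom one"],
        ["how do i list packages", "try dpkg -l"],
    ]
    for context, response, label in make_nsp_pairs(dialogues, rng):
        tokens = " ".join(context + [response]).split()  # whitespace tokens
        masked, _ = mask_tokens(tokens, rng)
        print(label, " ".join(masked))
```

In a TAP setup of this kind, each triple would be fed to the encoder once, with the MLM loss computed on masked positions and the NSP loss on the binary label; how the two losses are weighted is not specified in the text above.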

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Residual Connection · Dropout · Weight Decay · Softmax · Linear Warmup With Linear Decay · Attention Dropout