ConvFiT: Conversational Fine-Tuning of Pretrained Language Models

Ivan Vulić; Pei-Hao Su; Sam Coope; Daniela Gerz; Paweł Budzianowski; Iñigo Casanueva; Nikola Mrkšić; Tsung-Hsien Wen

arXiv:2109.10126 · cs.CL · September 22, 2021 · 1 citation

Open Access

TL;DR

ConvFiT is a two-stage method that efficiently transforms pretrained language models into effective conversational and task-specific sentence encoders, achieving state-of-the-art intent detection performance with minimal additional data.

Contribution

The paper introduces ConvFiT, a simple two-stage process that converts pretrained LMs into versatile conversational and task-specific encoders without extensive pretraining.

Findings

01. Achieves state-of-the-art intent detection results.

02. Effective in few-shot learning scenarios.

03. Requires less data than traditional conversational pretraining.

Abstract

Transformer-based language models (LMs) pretrained on large text collections are proven to store a wealth of semantic knowledge. However, 1) they are not effective as sentence encoders when used off-the-shelf, and 2) thus typically lag behind conversationally pretrained (e.g., via response selection) encoders on conversational tasks such as intent detection (ID). In this work, we propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder (after Stage 1 ConvFiT-ing) and task-specialised sentence encoder (after Stage 2). We demonstrate that 1) full-blown conversational pretraining is not required, and that LMs can be quickly transformed into effective conversational encoders with much smaller amounts of unannotated data; 2) pretrained LMs can be fine-tuned into task-specialised sentence encoders, optimised for the…
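As a rough illustration of what using sentence encodings for intent detection can look like, the sketch below averages token vectors into one sentence vector and labels a query with the intent of its most similar labelled example. The toy vectors, the function names, and the choice of mean pooling with cosine nearest-neighbour lookup are all assumptions made for illustration, not details taken from the abstract; in a real setup the token vectors would come from a pretrained LM.

```python
import math

def mean_pool(token_vectors):
    """Average a list of token vectors into one fixed-size sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_intent(query_vec, labelled):
    """Return the intent label of the most similar labelled sentence vector."""
    best_label, _ = max(
        ((label, cosine(query_vec, vec)) for label, vec in labelled),
        key=lambda pair: pair[1],
    )
    return best_label

# Hypothetical token vectors standing in for LM outputs (assumption).
labelled = [
    ("check_balance", mean_pool([[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])),
    ("transfer_money", mean_pool([[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]])),
]
query = mean_pool([[0.85, 0.15, 0.05], [0.7, 0.3, 0.0]])

print(predict_intent(query, labelled))
```

With only a handful of labelled examples per intent, a lookup like this depends entirely on how well the encoder separates intents in vector space, which is why the quality of the sentence encoder matters in few-shot settings.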

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications