Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

Sam Coope; Tyler Farghly; Daniela Gerz; Ivan Vulić; Matthew Henderson

arXiv:2005.08866·cs.CL·July 17, 2020

TL;DR

Span-ConveRT is a lightweight dialog slot-filling model that uses span extraction and pretrained conversational models, excelling in few-shot learning scenarios, and is supported by a new restaurant booking dataset.

Contribution

The paper introduces Span-ConveRT, a novel span extraction approach for dialog slot-filling that effectively leverages pretrained conversational models for few-shot learning.

Findings

01

Span-ConveRT outperforms models trained from scratch and BERT-based span extractors in few-shot settings.

02

Leveraging pretrained conversational knowledge improves slot-filling performance.

03

RESTAURANTS-8K dataset provides a new benchmark for dialog span extraction in restaurant booking.

Abstract

We introduce Span-ConveRT, a light-weight model for dialog slot-filling which frames the task as a turn-based span extraction task. This formulation allows for a simple integration of conversational knowledge coded in large pretrained conversational models such as ConveRT (Henderson et al., 2019). We show that leveraging such knowledge in Span-ConveRT is especially useful for few-shot learning scenarios: we report consistent gains over 1) a span extractor that trains representations from scratch in the target domain, and 2) a BERT-based span extractor. In order to inspire more work on span extraction for the slot-filling task, we also release RESTAURANTS-8K, a new challenging data set of 8,198 utterances, compiled from actual conversations in the restaurant booking domain.
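To make the span extraction framing concrete, the following is a minimal sketch of how a single user turn with annotated slot values can be represented as character spans and converted to per-token tags. It is an illustration only: the `SlotSpan` class, the helper names, the whitespace tokenizer, and the BIO tagging scheme are assumptions made for this example and are not taken from the paper or from the RESTAURANTS-8K release.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SlotSpan:
    """Hypothetical annotation: a slot name plus character offsets into the turn."""
    slot: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def whitespace_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace and keep each token's (start, end) character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def bio_tags(text: str, spans: List[SlotSpan]) -> List[str]:
    """Assign a B-/I-/O tag to every token based on the annotated spans."""
    tags = []
    for _tok, start, end in whitespace_tokens(text):
        tag = "O"
        for span in spans:
            # A token belongs to a span if it lies fully inside the span.
            if start >= span.start and end <= span.end:
                prefix = "B-" if start == span.start else "I-"
                tag = prefix + span.slot
                break
        tags.append(tag)
    return tags


def extract_values(text: str, spans: List[SlotSpan]) -> Dict[str, str]:
    """Recover each slot value as the substring covered by its span."""
    return {span.slot: text[span.start:span.end] for span in spans}


# Made-up example turn; the offsets were computed by hand for this string.
utterance = "a table for 4 people at 7 pm"
spans = [SlotSpan("people", 12, 13), SlotSpan("time", 24, 28)]

print(bio_tags(utterance, spans))
print(extract_values(utterance, spans))
```

Because the slot value is read directly from the turn as a substring, a span extractor does not need a fixed list of possible values for each slot, which is one reason the formulation pairs naturally with pretrained sentence encoders.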

Code & Models

Repositories

PolyAI-LDN/task-specific-datasets (Official)
