BotsTalk: Machine-sourced Framework for Automatic Curation of Large-scale Multi-skill Dialogue Datasets

Minju Kim; Chaehyeong Kim; Yongho Song; Seung-won Hwang; Jinyoung Yeo

arXiv:2210.12687 · cs.CL · October 25, 2022 · 1 citation

TL;DR

BotsTalk introduces a novel multi-agent framework for automatically annotating large-scale multi-skill dialogue datasets, enabling the development of more versatile open-domain chatbots.

Contribution

The paper presents BotsTalk, a new framework for automatically curating multi-skill dialogues, and BSBT, a large-scale dataset built with it, along with extensive experiments showing that the dataset is effective for multi-skill dialogue modeling.

Findings

1. BSBT contains 300K conversations for multi-skill dialogue training.

2. BSBT is effective for training multi-skill dialogue systems that require both skill blending and skill grounding.

3. BotsTalk annotates multi-skill dialogues automatically by having multiple skill-grounded agents take part in a conversation.

Abstract

To build open-domain chatbots that are able to use diverse communicative skills, we propose a novel framework BotsTalk, where multiple agents grounded to the specific target skills participate in a conversation to automatically annotate multi-skill dialogues. We further present Blended Skill BotsTalk (BSBT), a large-scale multi-skill dialogue dataset comprising 300K conversations. Through extensive experiments, we demonstrate that our dataset can be effective for multi-skill dialogue systems which require an understanding of skill blending as well as skill grounding. Our code and data are available at https://github.com/convei-lab/BotsTalk.
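
The abstract describes BotsTalk only at a high level: agents grounded to specific target skills converse, and the resulting dialogue is annotated automatically. The Python sketch below illustrates that idea under explicit assumptions. Everything in it is hypothetical: the `SkillAgent` class, the toy word-overlap `overlap_score`, the repetition filter, and `botstalk_style_annotate` are illustrative stand-ins, not the authors' implementation in the convei-lab/botstalk repository. It further assumes that at each turn every agent proposes an utterance, one candidate is kept, and the kept utterance is labeled with the skill of the agent that proposed it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical stand-in for a skill-grounded agent (e.g. persona, knowledge,
# empathy). In a real system each would be a trained dialogue model.
@dataclass
class SkillAgent:
    skill: str
    respond: Callable[[List[str]], str]


def overlap_score(history: List[str], response: str) -> float:
    """Toy relevance score: fraction of response words found in the last turn.
    Illustrative placeholder only, not the paper's selection criterion."""
    if not history:
        return 0.0
    last = set(history[-1].lower().split())
    words = response.lower().split()
    if not words:
        return 0.0
    return sum(w in last for w in words) / len(words)


def botstalk_style_annotate(
    seed: List[str],
    agents: List[SkillAgent],
    num_turns: int,
    score: Callable[[List[str], str], float] = overlap_score,
) -> List[Tuple[str, str]]:
    """Grow a dialogue from a seed context. Each turn, every agent proposes an
    utterance, the highest-scoring one is kept, and it is labeled with the
    proposing agent's skill (a hypothetical annotation scheme)."""
    history = list(seed)
    annotated: List[Tuple[str, str]] = [("seed", u) for u in seed]
    for _ in range(num_turns):
        candidates = [(agent.skill, agent.respond(history)) for agent in agents]
        # Hypothetical filter: drop utterances already present in the dialogue.
        candidates = [c for c in candidates if c[1] not in history]
        if not candidates:
            break  # no fresh candidates left, so stop early
        # max() returns the first maximal candidate, so agent order breaks ties.
        skill, utterance = max(candidates, key=lambda c: score(history, c[1]))
        history.append(utterance)
        annotated.append((skill, utterance))
    return annotated


# Usage with toy agents that return canned responses.
agents = [
    SkillAgent("persona", lambda h: "i love hiking with my dog"),
    SkillAgent("knowledge", lambda h: "hiking trails in the alps are long"),
    SkillAgent("empathy", lambda h: "that sounds really stressful"),
]
dialogue = botstalk_style_annotate(["i went hiking in the alps"], agents, num_turns=2)
for skill, utt in dialogue:
    print(f"[{skill}] {utt}")
# [seed] i went hiking in the alps
# [knowledge] hiking trails in the alps are long
# [persona] i love hiking with my dog
```

The output is a skill-labeled dialogue, which is the kind of annotation a multi-skill dataset needs. A fuller system would likely replace the toy scorer with a learned relevance or consistency model and use trained agents instead of canned responses.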

Code & Models

Repositories

convei-lab/botstalk
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques