Crowdsourcing Diverse Paraphrases for Training Task-oriented Bots

Jorge Ramírez; Auday Berro; Marcos Baez; Boualem Benatallah; Fabio Casati

arXiv:2109.09420·cs.CL·September 21, 2021

Open Access

TL;DR

This work-in-progress paper proposes a crowdsourcing approach that guides workers toward syntactically diverse paraphrases, with the aim of improving the datasets used to train task-oriented conversational bots.

Contribution

It introduces an approach for guiding crowd-based paraphrasing toward syntactic diversity, an aspect overlooked by existing methods, which either assume the crowd will naturally produce diverse paraphrases or focus only on lexical diversity.

Findings

01 Effective guidance increases syntactic diversity of paraphrases
02 Improved dataset quality for task-oriented bots
03 Potential for broader application in NLP data collection

Abstract

A prominent approach to building datasets for training task-oriented bots is crowd-based paraphrasing. Current approaches, however, either assume that the crowd will naturally provide diverse paraphrases or focus only on lexical diversity. In this work-in-progress (WiP) paper, we address an overlooked aspect of diversity by introducing an approach for guiding the crowdsourcing process toward paraphrases that are syntactically diverse.
