Crowdsourcing Diverse Paraphrases for Training Task-oriented Bots
Jorge Ramírez, Auday Berro, Marcos Baez, Boualem Benatallah, Fabio Casati

TL;DR
This paper proposes a novel crowdsourcing method to generate syntactically diverse paraphrases, enhancing dataset quality for training task-oriented conversational bots.
Contribution
It introduces an approach for guiding crowd-based paraphrasing toward syntactic diversity, an aspect overlooked by existing methods, which either focus on lexical diversity or assume the crowd will produce diverse paraphrases on its own.
Findings
Effective guidance increases syntactic diversity of paraphrases
Improved dataset quality for task-oriented bots
Potential for broader application in NLP data collection
Abstract
A prominent approach to building datasets for training task-oriented bots is crowd-based paraphrasing. Current approaches, however, assume the crowd would naturally provide diverse paraphrases or focus only on lexical diversity. In this work-in-progress (WiP) paper, we address an overlooked aspect of diversity, introducing an approach for guiding the crowdsourcing process towards paraphrases that are syntactically diverse.
Taxonomy
Topics: Topic Modeling · Spam and Phishing Detection · Blood donation and transfusion practices
