# Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

**Authors:** Youxuan Jiang, Jonathan K. Kummerfeld, Walter S. Lasecki

arXiv: 1704.05753 · 2020-06-05

## TL;DR

This paper systematically analyzes how task design choices in crowdsourced paraphrase collection affect the quality and diversity of generated data, offering insights to optimize dataset creation.

## Contribution

The paper presents the first systematic study of task design trade-offs in crowdsourced paraphrase collection, examining variations in instructions, incentives, data domains, and workflows.

## Key findings

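- Task design choices produce trade-offs between accuracy and diversity in crowd responses
- Instructions and incentives influence paraphrase quality, assessed manually for correctness, grammaticality, and linguistic diversity
- The observations yield guidance for designing future paraphrase collection tasks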

## Abstract

Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts. Crowdsourcing the process of paraphrase generation is an effective means of expanding natural language datasets, but there has been limited analysis of the trade-offs that arise when designing tasks. In this paper, we present the first systematic study of the key factors in crowdsourcing paraphrase collection. We consider variations in instructions, incentives, data domains, and workflows. We manually analyzed paraphrases for correctness, grammaticality, and linguistic diversity. Our observations provide new insight into the trade-offs between accuracy and diversity in crowd responses that arise as a result of task design, providing guidance for future paraphrase generation procedures.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.05753/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1704.05753/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1704.05753/full.md

---
Source: https://tomesphere.com/paper/1704.05753