Lessons Learned from a Citizen Science Project for Natural Language Processing
Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho, Iryna Gurevych

TL;DR
This paper explores using Citizen Science for NLP annotation tasks, demonstrating it can produce high-quality data with motivated volunteers, while highlighting challenges like scalability and ethical considerations.
Contribution
It provides an exploratory study on applying Citizen Science to NLP annotation, offering guidelines, lessons learned, and resources for future research.
Findings
High-quality annotations are achievable with motivated volunteers
Citizen Science can complement traditional crowdsourcing in NLP
Challenges include scalability, sustaining participation over time, and legal and ethical issues
Abstract
Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide…
Taxonomy
Topics: Species Distribution and Climate Change · Mobile Crowdsensing and Crowdsourcing · Data-Driven Disease Surveillance
