Privacy-Aware Crowd Labelling for Machine Learning Tasks

Giannis Haralabopoulos; Ioannis Anagnostopoulos

arXiv:2203.01373·cs.HC·March 4, 2022

Open Access

TL;DR

This paper introduces a privacy-preserving text labelling method for crowdsourcing in machine learning, aiming to protect user privacy while maintaining label quality and diversity.

Contribution

It proposes a novel text transformation approach that balances privacy preservation with label correlation and consistency in crowdsourced annotations.

Findings

01 Privacy transformations retain label correlation

02 Transformations preserve annotation diversity

03 Method effectively balances privacy and label quality

Abstract

The extensive use of online social media has highlighted the importance of privacy in the digital space. As more scientists analyse the data created on these platforms, privacy concerns have extended to data usage within academia. Although text analysis is a well-documented topic in academic literature with a multitude of applications, ensuring the privacy of user-generated content has been overlooked. Most sentiment analysis methods require emotion labels, which can be obtained through crowdsourcing, where non-expert individuals contribute to scientific tasks. The text itself has to be exposed to third parties in order to be labelled. In an effort to reduce the exposure of online users' information, we propose a privacy-preserving text labelling method for varying applications, based on crowdsourcing. We transform text with different levels of privacy, and analyse the effectiveness of…
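The abstract is cut off before it describes the transformations themselves, so the following is only an illustrative sketch of what "transforming text with different levels of privacy" could look like before a post is shown to crowd workers. The `transform` function, the three levels, and the placeholder tokens are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical patterns and levels; the abstract does not name the
# transformations actually used in the paper.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
NUMBER = re.compile(r"\d+")


def transform(text: str, level: int) -> str:
    """Return text masked at an illustrative privacy level (0, 1 or 2)."""
    if level not in (0, 1, 2):
        raise ValueError("level must be 0, 1 or 2")
    if level >= 1:
        # Level 1: hide direct identifiers (links first, then user handles).
        text = URL.sub("<URL>", text)
        text = MENTION.sub("<USER>", text)
    if level >= 2:
        # Level 2: additionally hide digit runs such as ages or phone numbers.
        text = NUMBER.sub("<NUM>", text)
    return text


post = "@alice turned 30 today, photos at https://example.com/p/123"
for level in (0, 1, 2):
    print(level, transform(post, level))
```

Under these assumptions, each crowd worker would see only the version of the post that matches the chosen level, and labels collected at different levels could then be compared against labels for the original text.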

Taxonomy

Topics: Privacy, Security, and Data Protection · Mobile Crowdsensing and Crowdsourcing · Hate Speech and Cyberbullying Detection