Improve Learning from Crowds via Generative Augmentation

Zhendong Chu; Hongning Wang

arXiv:2107.10449·cs.LG·July 23, 2021

TL;DR

This paper introduces a generative data augmentation method using GANs to improve learning from sparse crowdsourced annotations, enhancing model quality in low-budget scenarios.

Contribution

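The paper proposes a GAN-based augmentation framework that generates additional annotations under two constraints: they should be realistic (follow the distribution of authentic annotations) and informative (have high mutual information with the ground-truth labels). The augmented annotations address sparsity in crowdsourced data when training a classifier.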

Findings

01. Outperforms state-of-the-art crowdsourcing learning methods

02. Effective in low-budget crowdsourcing scenarios

03. Generates annotations that follow the distribution of authentic annotations

Abstract

Crowdsourcing provides an efficient label collection schema for supervised machine learning. However, to control annotation cost, each instance in the crowdsourced data is typically annotated by a small number of annotators. This creates a sparsity issue and limits the quality of machine learning models trained on such data. In this paper, we study how to handle sparsity in crowdsourced data using data augmentation. Specifically, we propose to directly learn a classifier by augmenting the raw sparse annotations. We implement two principles of high-quality augmentation using Generative Adversarial Networks: 1) the generated annotations should follow the distribution of authentic ones, which is measured by a discriminator; 2) the generated annotations should have high mutual information with the ground-truth labels, which is measured by an auxiliary network. Extensive experiments and…
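The two augmentation principles in the abstract can be read as two terms in a generator objective. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `generator_objective`, the weight `lam`, the non-saturating adversarial loss, and the InfoGAN-style variational bound used as a stand-in for mutual information are all hypothetical choices that the abstract does not confirm. It also assumes each generated annotation was conditioned on a known label, which the excerpt does not specify.

```python
import numpy as np

def generator_objective(disc_prob_real, aux_label_probs, cond_labels, lam=1.0):
    """Hypothetical combined generator loss for one batch of generated annotations.

    disc_prob_real:  shape (n,), discriminator's probability that each
                     generated annotation is authentic (principle 1).
    aux_label_probs: shape (n, k), auxiliary network's predicted label
                     distribution given each generated annotation (principle 2).
    cond_labels:     shape (n,), integer label each annotation was generated for
                     (an assumption about how labels enter the generator).
    lam:             assumed weight trading off the two terms.
    """
    eps = 1e-12  # guards against log(0)
    n = len(cond_labels)

    # Principle 1: generated annotations should look authentic.
    # Non-saturating GAN loss: small when the discriminator is fooled.
    adv_loss = -np.mean(np.log(disc_prob_real + eps))

    # Principle 2: generated annotations should be informative about the label.
    # E[log q(y | annotation)] lower-bounds the mutual information up to the
    # constant label entropy (InfoGAN-style bound, assumed here).
    mi_bound = np.mean(np.log(aux_label_probs[np.arange(n), cond_labels] + eps))

    # Minimizing the total maximizes realism and the mutual-information bound.
    total = adv_loss - lam * mi_bound
    return total, adv_loss, mi_bound

# Toy batch: two generated annotations, two classes.
disc = np.array([0.5, 0.5])
aux = np.array([[0.5, 0.5], [0.25, 0.75]])
labels = np.array([0, 1])
total, adv_loss, mi_bound = generator_objective(disc, aux, labels)
```

In this reading, a generator that fully fools the discriminator and whose annotations let the auxiliary network recover the label with certainty drives both terms toward zero.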
