Learning from Crowds with Sparse and Imbalanced Annotations
Ye Shi, Shao-Yuan Li, Sheng-Jun Huang

TL;DR
This paper introduces Self-Crowd, a self-training method that addresses class imbalance and sparse annotations in crowdsourced labeling, improving learning performance by rebalancing annotations through confidence-based pseudo-labeling.
Contribution
The paper proposes a novel self-training approach with distribution-aware confidence measures to mitigate class imbalance in sparse crowdsourced annotations.
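As a rough illustration of what a distribution-aware selection step might look like, the sketch below fills missing worker annotations with the most confident predictions while giving under-represented classes a larger share of the budget. It is not the paper's algorithm: the function name `select_pseudo_annotations`, the deficit-proportional quota, and the use of the maximum predicted probability as the confidence score are all assumptions made for illustration.

```python
import numpy as np

def select_pseudo_annotations(annotations, probs, budget):
    """Hypothetical rebalancing step (an illustrative assumption, not the paper's rule).

    annotations: (n, w) int array of worker labels, -1 marks a missing annotation.
    probs:       (n, w, c) predicted class probabilities for each worker's annotation.
    budget:      maximum total number of pseudo-annotations to add.
    Returns a copy of `annotations` with some missing entries filled in.
    """
    n, w, c = probs.shape
    observed = annotations >= 0

    # Count how many observed annotations each class currently has.
    counts = np.bincount(annotations[observed], minlength=c).astype(float)

    # Assumed quota: classes further below the largest class get more of the budget.
    deficit = counts.max() - counts
    if deficit.sum() == 0:
        quota = np.full(c, budget // c)
    else:
        quota = np.floor(budget * deficit / deficit.sum()).astype(int)

    pred = probs.argmax(axis=2)   # most likely class per (instance, worker) slot
    conf = probs.max(axis=2)      # assumed confidence: the top probability

    out = annotations.copy()
    for k in range(c):
        # Candidate slots: currently missing and predicted as class k.
        cand = np.argwhere(~observed & (pred == k))
        if len(cand) == 0 or quota[k] == 0:
            continue
        scores = conf[cand[:, 0], cand[:, 1]]
        top = cand[np.argsort(-scores)[:quota[k]]]
        out[top[:, 0], top[:, 1]] = k
    return out

# Toy data: class 0 dominates the observed annotations.
annotations = np.array([[0, 0], [0, -1], [0, -1], [1, -1], [-1, -1], [-1, 0]])
probs = np.tile([0.6, 0.4], (6, 2, 1))
probs[1, 1] = [0.1, 0.9]
probs[3, 1] = [0.2, 0.8]
probs[4, 0] = [0.3, 0.7]
probs[4, 1] = [0.45, 0.55]

filled = select_pseudo_annotations(annotations, probs, budget=3)
print(np.bincount(filled[filled >= 0], minlength=2))  # class counts after filling
```

In a self-training loop, a step like this would alternate with retraining the model on the enlarged annotation set. The quota rule here is only one of many possible ways to favor minority classes.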
Findings
Self-Crowd produces more balanced annotations during training.
It significantly improves classification performance across various sparsity levels.
The method effectively reduces bias caused by skewed annotation distributions.
Abstract
Traditional supervised learning requires ground-truth labels for the training data, which can be difficult to collect in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds. To reduce the effect of labeling errors, a common practice is to distribute each instance to multiple workers, while each worker annotates only a subset of the data, resulting in the sparse annotation phenomenon. In this paper, we note that under class imbalance, i.e., when the ground-truth labels are class-imbalanced, the sparse annotations are prone to be skewed, which can severely bias the learning algorithm. To combat this issue, we propose a self-training based approach named Self-Crowd, which progressively adds confident pseudo-annotations and rebalances the annotation distribution.…
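To make the skew described in the abstract concrete, the toy simulation below generates sparse crowd annotations for an imbalanced binary problem and counts the annotations per class. Every setting in it is an assumption chosen for illustration (a 10% minority class, 20 workers who each label about 5% of the instances independently at random, and a uniform worker accuracy of 0.8), and `simulate_sparse_annotations` is a hypothetical helper, not code from the paper.

```python
import numpy as np

def simulate_sparse_annotations(n_instances=1000, n_workers=20,
                                minority_frac=0.1, density=0.05,
                                accuracy=0.8, seed=0):
    """Toy generator for sparse, noisy binary crowd annotations (assumed setup)."""
    rng = np.random.default_rng(seed)

    # Imbalanced ground truth: class 1 is the minority.
    truth = (rng.random(n_instances) < minority_frac).astype(int)

    # Each (instance, worker) pair is annotated with probability `density`.
    mask = rng.random((n_instances, n_workers)) < density

    # Workers give the true label with probability `accuracy`, else flip it.
    correct = rng.random((n_instances, n_workers)) < accuracy
    labels = np.where(correct, truth[:, None], 1 - truth[:, None])

    # -1 marks a missing annotation.
    annotations = np.where(mask, labels, -1)
    return truth, annotations

truth, annotations = simulate_sparse_annotations()
counts = np.bincount(annotations[annotations >= 0], minlength=2)
print("annotated fraction:", (annotations >= 0).mean())
print("annotations per class:", counts)
```

Under these assumptions the minority class receives far fewer annotations than the majority class, and each worker sees very few minority examples. This is the kind of skewed annotation distribution that a rebalancing method targets.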
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Imbalanced Data Classification Techniques · COVID-19 diagnosis using AI
Methods: Attentive Walk-Aggregating Graph Neural Network
