Identify ambiguous tasks combining crowdsourced labels by weighting Areas Under the Margin
Tanguy Lefort, Benjamin Charlier, Alexis Joly, Joseph Salmon

TL;DR
This paper introduces WAUM, a novel method to identify and discard ambiguous tasks in crowdsourced datasets, improving model generalization by adapting the Area Under the Margin metric.
Contribution
The paper adapts the Area Under the Margin to crowdsourced learning, creating WAUM, which effectively detects ambiguous tasks to enhance training outcomes.
Findings
WAUM outperforms existing strategies in identifying ambiguous tasks.
Discarding ambiguous tasks improves generalization on multiple datasets.
The method is robust on both simulated and real crowdsourced datasets.
Abstract
In supervised learning - for instance in image classification - modern massive datasets are commonly labeled by a crowd of workers. The obtained labels in this crowdsourcing setting are then aggregated for training, generally leveraging a per-worker trust score. Yet, such worker-oriented approaches ignore the tasks' ambiguity. Ambiguous tasks might fool expert workers, which is often harmful for the learning step. In standard supervised learning settings - with one label per task - the Area Under the Margin (AUM) was tailored to identify mislabeled data. We adapt the AUM to identify ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted Areas Under the Margin (WAUM). The WAUM is an average of AUMs weighted according to task-dependent scores. We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization…
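As a rough illustration of the two quantities named in the abstract, the sketch below computes an AUM as the margin between the score of the assigned label and the best other score, averaged over training epochs, and then a WAUM as a weighted average of the AUMs obtained from each worker's label. The function names `aum` and `waum`, the input layout, and the way the weights are supplied are assumptions made for illustration; this is not the authors' implementation, and the paper's exact definition of the scores may differ.

```python
import numpy as np


def aum(scores, label):
    """Average margin of `label` over epochs.

    scores: array of shape (n_epochs, n_classes), the class scores
            recorded for one task at each training epoch (assumed layout).
    label:  index of the label assigned to the task.
    """
    scores = np.asarray(scores, dtype=float)
    assigned = scores[:, label]
    # Largest score among all classes other than the assigned one.
    best_other = np.delete(scores, label, axis=1).max(axis=1)
    return float((assigned - best_other).mean())


def waum(scores, worker_labels, weights):
    """Weighted average of per-worker AUMs for one task.

    worker_labels: the label each worker gave to the task.
    weights:       one non-negative score per worker for this task
                   (hypothetical input; how these scores are obtained
                   is not specified here).
    """
    aums = [aum(scores, y) for y in worker_labels]
    return float(np.average(aums, weights=weights))


# Toy task: 3 epochs, 3 classes, three workers answering 0, 0 and 1.
scores = [[0.5, 0.3, 0.2],
          [0.6, 0.3, 0.1],
          [0.7, 0.2, 0.1]]
print(aum(scores, 0))                     # positive: label 0 is consistently favored
print(waum(scores, [0, 0, 1], [1, 1, 1]))
```

Under these assumptions, a task whose workers disagree gets a lower WAUM than one with unanimous labels, so low values can be read as a sign of ambiguity. Any cutoff used to discard tasks would be a separate choice.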
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Mobile Crowdsensing and Crowdsourcing
