Active Learning from Crowd in Document Screening

Evgeny Krivosheev; Burcu Sayin; Alessandro Bozzon; Zoltán Szlávik

arXiv:2012.02297 · cs.IR · December 7, 2020

Open Access

TL;DR

This paper presents a novel active learning approach that efficiently combines crowdsourcing and machine learning for document screening, optimizing the use of limited budgets to improve classification accuracy.

Contribution

It introduces objective-aware sampling, a new active learning technique that prioritizes minimizing overall classification errors in multi-label document screening tasks.

Findings

01

Objective-aware sampling outperforms existing active learning strategies.

02

The method effectively allocates labeling resources to improve classifier performance.

03

Significant reduction in classification errors achieved with limited budgets.

Abstract

In this paper, we explore how to efficiently combine crowdsourcing and machine intelligence for the problem of document screening, where we need to screen documents with a set of machine-learning filters. Specifically, we focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently. It is a challenging task since the budget is limited and there are countless ways to spend the given budget on the problem. We propose a multi-label, screening-specific active learning sampling technique -- objective-aware sampling -- for querying unlabelled documents for annotation. Our algorithm decides which machine filter needs more training data and how to choose unlabelled items to annotate in order to minimize the risk of overall classification errors rather than minimizing a single filter error. We demonstrate that…
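
The abstract does not give the scoring rule behind objective-aware sampling, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a document is kept only if it passes every filter, that the filters are independent, and that each classifier outputs a pass probability. The names `uncertainty`, `objective_aware_scores`, `select_queries`, and `pass_probs` are hypothetical. Under these assumptions, a (document, filter) pair is worth querying when that filter is unsure about the document and the other filters are unlikely to exclude it anyway.

```python
from math import prod


def uncertainty(p):
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def objective_aware_scores(pass_probs):
    """Score every (document, filter) pair.

    pass_probs[i][f] is the estimated probability that document i passes
    filter f. The score is an illustrative heuristic (an assumption, not the
    paper's formula): the filter's own uncertainty, weighted by the chance
    that all other filters let the document through.
    """
    scores = {}
    for i, probs in enumerate(pass_probs):
        for f, p in enumerate(probs):
            others_pass = prod(q for g, q in enumerate(probs) if g != f)
            scores[(i, f)] = uncertainty(p) * others_pass
    return scores


def select_queries(pass_probs, budget):
    """Return the `budget` highest-scoring (document, filter) pairs."""
    scores = objective_aware_scores(pass_probs)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:budget]


# Toy data: three documents, two filters.
pass_probs = [
    [0.5, 0.05],  # filter 0 is unsure, but filter 1 almost surely excludes the document
    [0.5, 0.95],  # filter 0 is unsure, and filter 1 almost surely passes the document
    [0.9, 0.8],   # both filters are fairly confident
]
queries = select_queries(pass_probs, budget=2)
print(queries)
```

In this toy data, filter 0 is equally uncertain about documents 0 and 1, yet only document 1 is selected: a label for document 0 would rarely change the overall screening decision, because filter 1 is already likely to exclude it. Plain per-filter uncertainty sampling would not distinguish the two cases.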

Taxonomy

Topics: Machine Learning and Algorithms · Mobile Crowdsensing and Crowdsourcing · Imbalanced Data Classification Techniques