Active Learning for Noisy Data Streams Using Weak and Strong Labelers
Taraneh Younesian, Dick Epema, Lydia Y. Chen

TL;DR
This paper introduces an online active learning algorithm for noisy data streams that effectively utilizes weak and strong labelers to reduce labeling costs while maintaining high accuracy in image classification tasks.
Contribution
It proposes a novel active learning method that combines filtering, diversity, informativeness, and labeler selection to handle noisy labels with limited budgets.
Findings
Maintains high accuracy with reduced labeling costs.
Effectively filters noisy samples and selects appropriate labelers.
Performs well on CIFAR10 and CIFAR100 datasets with up to 60% noise.
Abstract
Labeling data correctly is an expensive and challenging task in machine learning, especially for on-line data streams. Deep learning models in particular require a large amount of clean labeled data, which is very difficult to acquire in real-world problems. Choosing useful data samples to label while minimizing the cost of labeling is crucial to maintaining efficiency in the training process. When confronted with multiple labelers with different expertise and respective labeling costs, deciding which labeler to choose is nontrivial. In this paper, we consider a novel weak and strong labeler problem, inspired by humans' natural ability for labeling, in the presence of data streams with noisy labels and constrained by a limited budget. We propose an on-line active learning algorithm that consists of four steps: filtering, adding diversity, informative sample selection, and labeler selection. We…
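The four steps named in the abstract can be sketched as one round over a single stream batch. This is only an illustrative sketch, not the authors' method: the abstract does not give the actual criteria, so every concrete choice here is an assumption. That includes filtering by agreement between the model prediction and the given label, greedy farthest-point selection for diversity, predictive entropy for informativeness, an entropy threshold for choosing the labeler, and all names, costs, and parameters (`process_batch`, `weak_cost`, `strong_cost`, `n_diverse`, `strong_threshold`).

```python
import numpy as np

def entropy(probs):
    """Predictive entropy for each row of an (n, k) probability matrix."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def farthest_point_subset(features, k):
    """Greedy farthest-point selection (assumed stand-in for the diversity step)."""
    n = len(features)
    if n == 0 or k <= 0:
        return np.array([], dtype=int)
    chosen = [0]
    dist = np.linalg.norm(features - features[0], axis=1)
    while len(chosen) < min(k, n):
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(features - features[nxt], axis=1))
    return np.asarray(chosen, dtype=int)

def process_batch(probs, features, noisy_labels, budget,
                  weak_cost=1.0, strong_cost=5.0,
                  n_diverse=4, strong_threshold=0.6):
    """One hypothetical round; returns (clean_idx, queries, spent)."""
    predicted = probs.argmax(axis=1)

    # Step 1, filtering (assumed rule): keep the given label when the model agrees.
    agree = predicted == noisy_labels
    clean_idx = np.flatnonzero(agree).tolist()
    suspect_idx = np.flatnonzero(~agree)

    # Step 2, diversity: spread the candidates out in feature space.
    candidates = suspect_idx[farthest_point_subset(features[suspect_idx], n_diverse)]

    # Step 3, informativeness: query the most uncertain candidates first.
    ent = entropy(probs[candidates])
    order = np.argsort(-ent)

    # Step 4, labeler selection (assumed rule): uncertain samples go to the
    # costly strong labeler, the rest to the cheap weak labeler, within budget.
    max_ent = np.log(probs.shape[1])
    queries, spent = [], 0.0
    for j in order:
        hard = ent[j] / max_ent >= strong_threshold
        labeler, cost = ("strong", strong_cost) if hard else ("weak", weak_cost)
        if labeler == "strong" and spent + cost > budget:
            labeler, cost = "weak", weak_cost  # fall back to the cheaper labeler
        if spent + cost > budget:
            break  # budget exhausted
        queries.append((int(candidates[j]), labeler))
        spent += cost
    return clean_idx, queries, spent

# Toy batch: 5 samples, 3 classes, 2-D features (all values made up).
probs = np.array([[0.90, 0.05, 0.05],
                  [0.10, 0.80, 0.10],
                  [0.34, 0.33, 0.33],
                  [0.70, 0.20, 0.10],
                  [0.05, 0.05, 0.90]])
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 0.0]])
noisy_labels = np.array([0, 1, 1, 2, 0])
clean_idx, queries, spent = process_batch(probs, features, noisy_labels, budget=7.0)
```

In this toy batch the two samples whose given labels match the model are kept as clean, and the three disagreeing samples are queried in order of decreasing entropy. A real system would retrain the model on the accepted and newly labeled samples before the next batch arrives, which this sketch leaves out.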
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Data Stream Mining Techniques
