Active learning for data streams: a survey
Davide Cacciarelli, Murat Kulahci

TL;DR
This survey reviews recent online active learning methods for data streams, covering their techniques, strengths, and limitations, as well as the challenges of annotating data in real time to improve machine learning performance.
Contribution
It provides a comprehensive overview of recent stream-based active learning approaches, comparing their strategies and discussing future research directions.
Findings
Various techniques for online data selection are discussed.
Strengths and limitations of current approaches are analyzed.
Challenges and opportunities in real-time active learning are identified.
Abstract
Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in an unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in the last decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a…
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Algorithms · Machine Learning and Data Classification
