Improving Online Bagging for Complex Imbalanced Data Stream
Bartosz Przybyl, Jerzy Stefanowski

TL;DR
This paper introduces enhanced online bagging methods that better handle complex imbalanced data streams by considering local difficulty factors, leading to improved classifier performance in challenging scenarios.
Contribution
It proposes Neighbourhood Undersampling and Oversampling extensions to online bagging that address local minority class complexities and unsafe examples.
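As a rough illustration of what "local" difficulty means here, the sketch below labels an incoming example by the classes of its nearest neighbours in a window of recent examples. The function names, the window representation, and the k = 5 thresholds (safe, borderline, rare, outlier) are assumptions chosen for illustration; the summary above does not state how the paper computes neighbourhoods.

```python
import math

def count_other_class_neighbours(x, label, window, k=5):
    """Count how many of the k nearest window examples have a class other than `label`.

    `window` is a hypothetical list of (features, label) pairs from the recent stream.
    """
    nearest = sorted(window, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(1 for _, y in nearest if y != label)

def example_type(n_other):
    """Map the other-class neighbour count (assuming k = 5) to a difficulty type."""
    if n_other <= 1:
        return "safe"
    if n_other <= 3:
        return "borderline"
    if n_other == 4:
        return "rare"
    return "outlier"

# Toy window: four majority (0) examples clustered together, two minority (1) examples.
window = [
    ((0.0, 0.0), 0), ((0.1, 0.0), 0), ((0.0, 0.1), 0), ((0.1, 0.1), 0),
    ((1.0, 1.0), 1), ((5.0, 5.0), 1),
]
n_other = count_other_class_neighbours((0.05, 0.05), 1, window)
print(n_other, example_type(n_other))
```

In this toy window, a minority example placed inside the majority cluster has four majority neighbours out of five, so it is labelled as rare, while a majority example at the same position would be labelled as safe.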
Findings
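The proposed Neighbourhood Undersampling and Oversampling Online Bagging variants outperform earlier resampling variants of online bagging.
The methods take better account of local minority class difficulties, in particular unsafe (borderline or rare) minority examples.
The advantage is demonstrated in computational experiments on synthetic complex imbalanced data streams.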
Abstract
Learning classifiers from imbalanced and concept drifting data streams is still a challenge. Most of the current proposals focus on taking into account changes in the global imbalance ratio only and ignore the local difficulty factors, such as the minority class decomposition into sub-concepts and the presence of unsafe types of examples (borderline or rare ones). As the above factors present in the stream may deteriorate the performance of popular online classifiers, we propose extensions of resampling online bagging, namely Neighbourhood Undersampling or Oversampling Online Bagging to take better account of the presence of unsafe minority examples. The performed computational experiments with synthetic complex imbalanced data streams have shown their advantage over earlier variants of online bagging resampling ensembles.
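To make the resampling idea concrete, here is a minimal sketch of one online bagging step in which each base learner is updated a Poisson-distributed number of times, with a higher rate for minority examples that sit in unsafe neighbourhoods. The rate formula, the `psi` parameter, and the `CountingLearner` stub are assumptions made for illustration only; they are not the formula or the base classifier used in the paper.

```python
import math
import random

def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sampling_rate(label, minority_label, n_other, k=5, psi=1.0):
    """Hypothetical rate: 1 for majority examples, raised for unsafe minority ones.

    `n_other` is the number of other-class examples among the k nearest neighbours.
    """
    if label != minority_label:
        return 1.0
    return 1.0 + psi * n_other / k

class CountingLearner:
    """Stand-in for an incremental classifier; it only counts its updates."""
    def __init__(self):
        self.n_updates = 0

    def learn_one(self, x, y):
        self.n_updates += 1

def update_ensemble(learners, x, y, rate, rng):
    """Online bagging step: each learner sees (x, y) Poisson(rate) times."""
    for learner in learners:
        for _ in range(poisson(rate, rng)):
            learner.learn_one(x, y)

rng = random.Random(0)
learners = [CountingLearner() for _ in range(10)]
# A minority example whose five nearest neighbours all belong to the majority class.
rate = sampling_rate(label=1, minority_label=1, n_other=5)
update_ensemble(learners, (0.2, 0.7), 1, rate, rng)
print(rate, [learner.n_updates for learner in learners])
```

With a rate of 1 for every example this reduces to standard online bagging. An undersampling variant would presumably lower the rate for majority examples rather than raise it for minority ones.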
Taxonomy
Topics: Data Stream Mining Techniques · Cloud Computing and Resource Management · Advanced Data Processing Techniques
