SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams
Chun Wai Chiu, Leandro L. Minku

TL;DR
SMOClust introduces a drift-adaptive oversampling method using stream clustering to generate synthetic minority class examples, effectively addressing class imbalance and concept drift in evolving data streams.
Contribution
The paper presents a novel oversampling strategy based on stream clustering that adapts to concept drift and captures data difficulty factors without explicit memory caching.
Findings
Outperforms existing methods in handling concept drift and class imbalance.
Effective in scenarios with high proportions of safe and borderline minority examples.
Works well on both artificial and real-world data streams.
Abstract
Many real-world data stream applications not only suffer from concept drift but also class imbalance. Yet, very few existing studies investigated this joint challenge. Data difficulty factors, which have been shown to be key challenges in class imbalanced data streams, are not taken into account by existing approaches when learning class imbalanced data streams. In this work, we propose a drift adaptable oversampling strategy to synthesise minority class examples based on stream clustering. The motivation is that stream clustering methods continuously update themselves to reflect the characteristics of the current underlying concept, including data difficulty factors. This nature can potentially be used to compress past information without caching data in the memory explicitly. Based on the compressed information, synthetic examples can be created within the region that recently…
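The idea in the abstract, that cluster summaries stand in for cached data and synthetic minority examples are drawn from the regions they describe, can be illustrated with a minimal sketch. This is not the authors' algorithm: the summary does not say which stream clustering method or generation procedure SMOClust uses. The names `MicroCluster` and `MinorityClusterSampler`, and the parameters `radius`, `decay`, and `min_weight`, are all hypothetical. The sketch assumes a simple distance-threshold clustering with exponential fading and Gaussian sampling around each cluster mean.

```python
import math
import random


class MicroCluster:
    """Running summary (weight, mean, spread) of nearby minority examples."""

    def __init__(self, x):
        self.n = 1.0                  # faded example count (cluster weight)
        self.mean = list(x)
        self.m2 = [0.0] * len(x)      # faded sum of squared deviations per dimension

    def fade(self, decay):
        # Down-weight old evidence so the summary tracks the current concept.
        self.n *= decay
        self.m2 = [v * decay for v in self.m2]

    def absorb(self, x):
        # Welford-style incremental update of the mean and spread.
        self.n += 1.0
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def std(self, i):
        return math.sqrt(self.m2[i] / self.n) if self.n > 0 else 0.0


class MinorityClusterSampler:
    """Hypothetical oversampler: stores only cluster summaries, never raw examples."""

    def __init__(self, radius=1.0, decay=0.9, min_weight=0.1, seed=0):
        self.radius = radius
        self.decay = decay
        self.min_weight = min_weight
        self.clusters = []
        self.rng = random.Random(seed)

    def learn_one(self, x):
        """Update the summaries with one real minority-class example."""
        for c in self.clusters:
            c.fade(self.decay)
        # Drop clusters whose weight has faded away (for example, after a drift).
        self.clusters = [c for c in self.clusters if c.n >= self.min_weight]
        nearest = min(self.clusters, key=lambda c: math.dist(c.mean, x), default=None)
        if nearest is not None and math.dist(nearest.mean, x) <= self.radius:
            nearest.absorb(x)
        else:
            self.clusters.append(MicroCluster(x))

    def sample_one(self):
        """Draw one synthetic minority example from a weight-chosen cluster."""
        weights = [c.n for c in self.clusters]
        c = self.rng.choices(self.clusters, weights=weights, k=1)[0]
        return [self.rng.gauss(m, c.std(i)) for i, m in enumerate(c.mean)]


sampler = MinorityClusterSampler()
# Old concept: minority examples near (0, 0).
for i in range(20):
    sampler.learn_one([0.1 * (i % 3), -0.1 * (i % 2)])
# After a drift: minority examples near (10, 10).
for i in range(100):
    sampler.learn_one([10 + 0.1 * (i % 3), 10 - 0.1 * (i % 2)])
synthetic = sampler.sample_one()
```

In this sketch the cluster describing the old concept fades below `min_weight` and is discarded, so later synthetic examples come only from the region the minority class currently occupies. A real stream clustering method would also handle merging, outliers, and the data difficulty factors the abstract mentions.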
Taxonomy
Topics: Data Stream Mining Techniques · Imbalanced Data Classification Techniques · Caching and Content Delivery
