OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams
Yiqun Diao, Yutong Yang, Qinbin Li, Bingsheng He, Mian Lu

TL;DR
This paper introduces OEBench, a benchmark for evaluating incremental learning algorithms on real-world relational data streams, revealing significant challenges posed by open environment factors like drifts and anomalies.
Contribution
The study builds a benchmark from 55 real-world relational data streams, shows that open environment challenges are widespread in them, and evaluates how existing incremental learning algorithms perform under those challenges.
Findings
Open environment scenarios are common in real-world data streams.
Increased data quantity does not always improve model accuracy.
Current techniques are insufficient to handle open environment challenges.
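To make the findings above more concrete, here is a minimal sketch of test-then-train (prequential) evaluation, a common way to score an incremental learner on a stream. It is not OEBench's protocol or code. The stream is a synthetic toy with one abrupt drift, and every name in it (`make_stream`, `OnlineLogReg`, `prequential_accuracy`) is hypothetical and chosen only for illustration.

```python
import math
import random


def make_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs from a toy stream (an assumption, not real data).

    The labeling rule inverts at index `drift_at` to mimic an abrupt drift.
    """
    rng = random.Random(seed)
    for i in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        score = x[0] + x[1]
        if i >= drift_at:
            score = -score  # abrupt concept drift: the decision rule flips
        yield x, (1 if score > 0 else 0)


class OnlineLogReg:
    """Logistic regression updated one sample at a time with plain SGD."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        err = self.predict_proba(x) - y  # gradient of log loss w.r.t. z
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def prequential_accuracy(stream, model, window):
    """Predict each sample before learning from it.

    Returns the accuracy of each consecutive block of `window` samples.
    """
    accs, correct, seen = [], 0, 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # test first
        model.learn_one(x, y)                  # then train
        seen += 1
        if seen == window:
            accs.append(correct / window)
            correct, seen = 0, 0
    return accs


# 4000 samples, drift at the midpoint, accuracy reported per 500 samples.
accs = prequential_accuracy(
    make_stream(4000, drift_at=2000), OnlineLogReg(dim=2), window=500
)
print([round(a, 3) for a in accs])
```

In this toy setup, the window right after the drift should score lower than the one just before it, even though the model has seen more data by then. That is one simple way in which more data can fail to translate into better accuracy on a non-stationary stream.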
Abstract
How to get insights from relational data streams in a timely manner is a hot research topic. Data streams can present unique challenges, such as distribution drifts, outliers, emerging classes, and changing features, which have recently been described as open environment challenges for machine learning. While existing studies have examined incremental learning for data streams, their evaluations are mostly conducted on synthetic datasets. Thus, a natural question is what those open environment challenges look like and how existing incremental learning algorithms perform on real-world relational data streams. To fill this gap, we develop an Open Environment Benchmark named OEBench to evaluate open environment challenges in real-world relational data streams. Specifically, we investigate 55 real-world relational data streams and establish that open environment scenarios are indeed…
Taxonomy
Topics: Data Stream Mining Techniques · Time Series Analysis and Forecasting · Machine Learning and Data Classification
