Evaluating the Efficacy of Instance Incremental vs. Batch Learning in Delayed Label Environments: An Empirical Study on Tabular Data Streaming for Fraud Detection
Kodjo Mawuena Amekoe, Mustapha Lebbah, Gregoire Jaffre, Hanene Azzag, Zaineb Chelly Dagdia

TL;DR
This study empirically compares instance incremental and batch learning methods in delayed label environments for streaming tabular data, revealing that batch methods often outperform incremental ones in predictive accuracy and interpretability.
Contribution
It provides the first comprehensive empirical evaluation of incremental versus batch learning in delayed label streaming scenarios using real-world fraud detection data.
Findings
Batch learning often outperforms instance incremental methods in predictive accuracy.
Batch methods are more interpretable than incremental models.
Incremental learning is not always the best choice in delayed label settings.
Abstract
Real-world tabular learning production scenarios typically involve evolving data streams, where data arrives continuously and its distribution may change over time. In such a setting, most studies in the literature regarding supervised learning favor the use of instance incremental algorithms due to their ability to adapt to changes in the data distribution. Another significant reason for choosing these algorithms is to avoid storing observations in memory, as is commonly done in batch incremental settings. However, the design of instance incremental algorithms often assumes immediate availability of labels, which is an optimistic assumption. In many real-world scenarios, such as fraud detection or credit scoring, labels may be delayed. Consequently, batch incremental algorithms are widely used in many real-world tasks. This raises an important question: "In delayed settings, is…
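To make the two update protocols in the abstract concrete, here is a minimal sketch of a stream with delayed labels. It is not the paper's experimental setup: the toy `MajorityClass` model, the `run` function, and the `delay` and `batch_size` parameters are all hypothetical names chosen for illustration. The sketch assumes each label becomes available a fixed number of steps after its observation; the instance incremental variant updates on every released label, while the batch incremental variant stores released observations in memory and refits once enough have accumulated.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy model (illustrative only): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y):
        # Instance incremental update: one labeled observation at a time.
        self.counts[y] += 1

    def fit(self, batch):
        # Batch update: refit from scratch on a stored batch of (x, y) pairs.
        self.counts = Counter(y for _, y in batch)

    def predict(self, x):
        # Default to class 0 until at least one label has been seen.
        return self.counts.most_common(1)[0][0] if self.counts else 0


def run(stream, delay, batch_size=None):
    """Process a stream whose labels arrive `delay` steps late (assumed fixed).

    batch_size=None -> instance incremental (update on every released label).
    batch_size=k    -> batch incremental (buffer k labeled rows, then refit).
    """
    model = MajorityClass()
    pending = deque()  # (arrival_step, x, y) still waiting for their label
    buffer = []        # labeled rows stored in memory (batch variant only)
    predictions, n_updates = [], 0

    for t, (x, y) in enumerate(stream):
        # Predict first: the label of x is not available yet.
        predictions.append(model.predict(x))
        pending.append((t, x, y))

        # Release every label that is now at least `delay` steps old.
        while pending and t - pending[0][0] >= delay:
            _, x_old, y_old = pending.popleft()
            if batch_size is None:
                model.learn_one(x_old, y_old)
                n_updates += 1
            else:
                buffer.append((x_old, y_old))
                if len(buffer) >= batch_size:
                    model.fit(buffer)
                    buffer = []
                    n_updates += 1

    return predictions, n_updates


if __name__ == "__main__":
    stream = [(i, 1) for i in range(10)]  # synthetic stream, every label is 1
    print(run(stream, delay=3))                # instance incremental
    print(run(stream, delay=3, batch_size=3))  # batch incremental
```

In both variants the model predicts blindly until the first labels are released, which is the point the abstract raises: with delayed labels, an instance incremental learner cannot update immediately either, so its usual adaptation advantage is reduced.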
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
