Fitting Multiple Machine Learning Models with Performance Based Clustering
Mehmet Efe Lorasdagi, Ahmet Berker Koc, Ali Taha Koc, and Suleyman Serdar Kozat

TL;DR
This paper proposes a clustering-based framework that groups data by feature-target relations to fit multiple models, improving performance on complex, real-world datasets, including streaming data scenarios.
Contribution
The paper introduces a novel clustering method that relaxes the single-mechanism assumption, so that multiple models can each capture a different part of a heterogeneous dataset, including in streaming settings.
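To illustrate what grouping by feature-target relation (rather than by feature proximity) can look like, here is a minimal sketch that alternates between refitting one model per group and reassigning each sample to the model that predicts it best. The alternating scheme, the one-feature linear base model, and the names `fit_line`, `sq_error`, and `performance_clustering` are assumptions made for illustration; this summary does not specify the paper's actual procedure.

```python
import random


def fit_line(points):
    """Least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def sq_error(model, point):
    """Squared prediction error of one (slope, intercept) model on one sample."""
    a, b = model
    x, y = point
    return (a * x + b - y) ** 2


def performance_clustering(points, k=2, n_iter=50, seed=0):
    """Hypothetical alternating scheme: refit per group, then reassign by error."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    models = [None] * k
    for _ in range(n_iter):
        # Step 1: refit each model on the samples currently assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if len(members) < 2:
                members = rng.sample(points, 2)  # re-seed a (nearly) empty group
            models[j] = fit_line(members)
        # Step 2: move each sample to the model with the lowest error on it.
        new_labels = [
            min(range(k), key=lambda j: sq_error(models[j], p)) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return models, labels
```

Samples with similar features can land in different groups here, because membership depends only on which model explains the target best. A feature-space method such as k-means would not separate two mechanisms that overlap in feature space.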
Findings
Significant performance improvements over traditional single-model methods.
Effective handling of streaming data with adaptive ensemble weights.
Validated on real-world datasets with diverse data distributions.
Abstract
Traditional machine learning approaches assume that data comes from a single generating mechanism, which may not hold for most real-life data. In these cases, the single-mechanism assumption can result in suboptimal performance. We introduce a clustering framework that eliminates this assumption by grouping the data according to the relations between the features and the target values, and we obtain multiple separate models to learn different parts of the data. We further extend our framework to applications with streaming data, where we produce outcomes using an ensemble of models whose weights are updated based on the incoming data batches. We demonstrate the performance of our approach on widely studied real-life datasets, showing significant improvements over traditional single-model approaches.
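The abstract does not state how the ensemble weights are updated. One common choice, assumed here purely for illustration, is an exponentially weighted (multiplicative) update: after each batch, every model's weight is scaled by the exponential of its negative batch loss and the weights are renormalized. The function names and the learning rate `eta` are hypothetical.

```python
import math


def ensemble_predict(models, weights, x):
    """Weighted average of the individual model predictions."""
    return sum(w * m(x) for m, w in zip(models, weights))


def update_weights(models, weights, batch, eta=1.0):
    """Assumed multiplicative-weights update from one batch of (x, y) pairs."""
    # Mean squared error of each model on the incoming batch.
    losses = [
        sum((m(x) - y) ** 2 for x, y in batch) / len(batch) for m in models
    ]
    # Subtract the smallest loss so the best model's factor is exactly 1,
    # which avoids every factor underflowing to zero.
    best = min(losses)
    scaled = [w * math.exp(-eta * (loss - best)) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


# Usage: two fixed models; the stream currently follows the first one.
models = [lambda x: 2.0 * x, lambda x: -2.0 * x]
weights = [0.5, 0.5]
for batch in [[(0.5, 1.0), (1.0, 2.0)], [(-1.0, -2.0), (0.25, 0.5)]]:
    weights = update_weights(models, weights, batch)
```

With this rule, a model that fits recent batches poorly loses weight quickly but never reaches exactly zero, so it can regain influence if the data distribution shifts back.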
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications
