Investigating the Impact of Balancing, Filtering, and Complexity on Predictive Multiplicity: A Data-Centric Perspective
Mustafa Cavus, Przemyslaw Biecek

TL;DR
This paper explores how data preprocessing techniques such as balancing and filtering influence predictive multiplicity and model stability, emphasizing the importance of data-centric approaches in high-stakes predictive modeling.
Contribution
The paper provides an empirical analysis of how balancing and filtering methods affect predictive multiplicity across diverse real-world datasets, taking data complexity into account.
Findings
Balancing techniques can increase predictive multiplicity in complex datasets.
Filtering methods reduce redundancy and improve model generalization.
Data complexity influences the impact of preprocessing on model stability.
Abstract
The Rashomon effect presents a significant challenge in model selection. It occurs when multiple models achieve similar performance on a dataset but produce different predictions, resulting in predictive multiplicity. This is especially problematic in high-stakes environments, where arbitrary model outcomes can have serious consequences. Traditional model selection methods prioritize accuracy and fail to address this issue. Factors such as class imbalance and irrelevant variables further complicate the situation, making it harder for models to provide trustworthy predictions. Data-centric AI approaches can mitigate these problems by prioritizing data optimization, particularly through preprocessing techniques. However, recent studies suggest preprocessing methods may inadvertently inflate predictive multiplicity. This paper investigates how data preprocessing techniques like balancing…
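The abstract defines predictive multiplicity as near-equally accurate models disagreeing on individual predictions. The sketch below shows one common way to quantify this. It is not taken from the paper, and its assumptions are:

- The "Rashomon set" is approximated from a finite pool of already-trained models, keeping those within a tolerance `epsilon` of the best accuracy.
- Multiplicity is measured with two widely used metrics, ambiguity and discrepancy, relative to a reference model.

The function names, `epsilon`, and the toy predictions are all hypothetical. The paper's exact definitions and settings may differ.

```python
from typing import List, Sequence


def accuracy(pred: Sequence[int], y_true: Sequence[int]) -> float:
    """Fraction of samples where the prediction matches the label."""
    return sum(p == t for p, t in zip(pred, y_true)) / len(y_true)


def rashomon_set(accuracies: Sequence[float], epsilon: float) -> List[int]:
    """Hypothetical helper: indices of models within epsilon of the best accuracy."""
    best = max(accuracies)
    return [i for i, acc in enumerate(accuracies) if acc >= best - epsilon]


def ambiguity(preds: Sequence[Sequence[int]], reference: int, members: Sequence[int]) -> float:
    """Share of samples on which at least one Rashomon-set model disagrees with the reference."""
    ref = preds[reference]
    conflicting = sum(
        1 for j in range(len(ref)) if any(preds[m][j] != ref[j] for m in members)
    )
    return conflicting / len(ref)


def discrepancy(preds: Sequence[Sequence[int]], reference: int, members: Sequence[int]) -> float:
    """Largest share of samples on which a single Rashomon-set model disagrees with the reference."""
    ref = preds[reference]
    return max(sum(p != r for p, r in zip(preds[m], ref)) / len(ref) for m in members)


# Toy example (made-up data): four binary classifiers evaluated on eight samples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
preds = [
    [1, 0, 1, 1, 0, 0, 1, 1],  # model 0: 7/8 correct
    [1, 0, 1, 0, 0, 0, 1, 0],  # model 1: 7/8 correct
    [1, 1, 1, 1, 0, 0, 1, 0],  # model 2: 7/8 correct
    [0, 1, 0, 1, 1, 0, 1, 0],  # model 3: 4/8 correct
]

accs = [accuracy(p, y_true) for p in preds]
members = rashomon_set(accs, epsilon=0.05)
reference = accs.index(max(accs))  # first best-scoring model serves as the reference

print(members)                                   # [0, 1, 2]
print(ambiguity(preds, reference, members))      # 0.375
print(discrepancy(preds, reference, members))    # 0.25
```

In the toy data, models 0, 1, and 2 all reach 87.5% accuracy. They still disagree on three of the eight samples, which is the kind of arbitrariness the abstract warns about. A study like this one could apply a preprocessing step (balancing or filtering) before training the pool and then compare these numbers. How the authors actually build their model sets is not stated in this summary.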
