Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data
Rune D. Kjærsgaard, Manja G. Grønberg, Line K. H. Clemmensen

TL;DR
This paper explores sampling techniques to address data imbalance in production datasets, aiming to enhance model predictions for underrepresented observations, especially in biopharmaceutical manufacturing contexts.
Contribution
It introduces and evaluates three sampling methods to improve predictive accuracy for underrepresented data points in imbalanced datasets.
Findings
Sampling improves predictions for underrepresented observations.
Overall performance drops slightly in exchange for better fairness.
The results highlight the need for balanced model evaluation.
Abstract
Data imbalance is common in production data, where controlled production settings require data to fall within a narrow range of variation and data are collected with quality assessment in mind rather than data-analytic insights. This imbalance negatively impacts the predictive performance of models on underrepresented observations. We propose sampling to adjust for this imbalance, with the goal of improving the performance of models trained on historical production data. We investigate three sampling approaches to adjust for imbalance. The goal is to downsample the covariates in the training data and subsequently fit a regression model. We investigate how the predictive power of the model changes when using either the sampled or the original data for training. We apply our methods to a large biopharmaceutical manufacturing data set from an advanced simulation of penicillin…
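The workflow described in the abstract (downsample the covariates, fit a regression model, then compare against a model trained on the original data) can be illustrated with a minimal sketch. The abstract does not name the three sampling approaches, so the histogram-based capping used here is a stand-in chosen for illustration and should not be read as one of the paper's methods. The function names (`downsample_by_bins`, `fit_linear`, `rmse`), the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np


def downsample_by_bins(x, n_bins=10, max_per_bin=50, rng=None):
    """Return sorted indices keeping at most `max_per_bin` points per bin of x.

    Illustrative stand-in only: equal-width bins over a single covariate.
    """
    rng = np.random.default_rng(rng)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Using only the interior edges yields bin ids in 0..n_bins-1.
    bin_ids = np.digitize(x, edges[1:-1])
    keep = []
    for b in range(n_bins):
        members = np.flatnonzero(bin_ids == b)
        if members.size > max_per_bin:
            # Dense bin: keep a random subset without replacement.
            members = rng.choice(members, size=max_per_bin, replace=False)
        keep.append(members)
    return np.sort(np.concatenate(keep))


def fit_linear(x, y):
    """Ordinary least squares for y ~ intercept + slope * x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def rmse(coef, x, y):
    """Root mean squared error of the fitted line on (x, y)."""
    pred = coef[0] + coef[1] * x
    return float(np.sqrt(np.mean((pred - y) ** 2)))


# Synthetic imbalanced data (assumption): most points sit in a narrow range,
# while a small number are spread over a much wider range.
data_rng = np.random.default_rng(0)
x = np.concatenate([data_rng.normal(0.0, 0.2, 2000), data_rng.uniform(-3.0, 3.0, 60)])
y = np.sin(x) + data_rng.normal(0.0, 0.05, x.size)

idx = downsample_by_bins(x, n_bins=10, max_per_bin=30, rng=1)
coef_original = fit_linear(x, y)
coef_sampled = fit_linear(x[idx], y[idx])

# Evaluate separately on the sparse (underrepresented) region and overall.
rare = np.abs(x) > 1.0
print("rare-region RMSE, original training data:", rmse(coef_original, x[rare], y[rare]))
print("rare-region RMSE, sampled training data: ", rmse(coef_sampled, x[rare], y[rare]))
print("overall RMSE, original training data:    ", rmse(coef_original, x, y))
print("overall RMSE, sampled training data:     ", rmse(coef_sampled, x, y))
```

Reporting the error on the sparse region separately from the overall error mirrors the comparison the abstract describes. For brevity this sketch evaluates on the training points themselves; a held-out test set would be needed for a meaningful comparison.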
Taxonomy
Topics: Imbalanced Data Classification Techniques · Advanced Statistical Process Monitoring · Machine Learning and Data Classification
