RealPatch: A Statistical Matching Framework for Model Patching with Real Samples
Sara Romiti, Christopher Inskip, Viktoriia Sharmanska, Novi Quadrianto

TL;DR
RealPatch is a data augmentation framework that uses real samples and statistical matching to improve subgroup performance and fairness in machine learning classifiers, offering a simpler and more efficient alternative to generative model-based patching.
Contribution
This work introduces RealPatch, a novel data augmentation method based on statistical matching that enhances model patching efficiency and effectiveness without relying on generative adversarial networks.
Findings
Improves worst-case subgroup performance across datasets.
Reduces subgroup performance gap in binary classification.
Effectively eliminates dataset leakage in multi-class settings.
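The first two findings refer to per-subgroup metrics rather than average accuracy. As a rough illustration only, the sketch below computes per-subgroup accuracy, the worst-case subgroup accuracy, and a gap defined here as best minus worst subgroup accuracy. The function names, the toy data, and this particular gap definition are assumptions for illustration and are not taken from the paper.

```python
from collections import defaultdict


def subgroup_accuracies(y_true, y_pred, groups):
    """Return {subgroup: accuracy} over the samples belonging to each subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(acc_by_group):
    """Accuracy of the worst-performing subgroup."""
    return min(acc_by_group.values())


def subgroup_gap(acc_by_group):
    """Assumed gap definition: best subgroup accuracy minus worst subgroup accuracy."""
    return max(acc_by_group.values()) - min(acc_by_group.values())


# Hypothetical toy data: subgroup "a" is classified perfectly, subgroup "b" is not.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

acc = subgroup_accuracies(y_true, y_pred, groups)
average = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(acc, average, worst_group_accuracy(acc), subgroup_gap(acc))
```

In this toy case the average accuracy is 0.75 while the worst subgroup sits at 0.5, which is the kind of discrepancy the findings above are concerned with.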
Abstract
Machine learning classifiers are typically trained to minimise the average error across a dataset. Unfortunately, in practice, this process often exploits spurious correlations caused by subgroup imbalance within the training data, resulting in high average performance but highly variable performance across subgroups. Recent work to address this problem proposes model patching with CAMEL. This previous approach uses generative adversarial networks to perform intra-class inter-subgroup data augmentations, requiring (a) the training of a number of computationally expensive models and (b) sufficient quality of the model's synthetic outputs for the given domain. In this work, we propose RealPatch, a framework for simpler, faster, and more data-efficient data augmentation based on statistical matching. Our framework performs model patching by augmenting a dataset with real samples, mitigating…
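The abstract as shown here does not spell out the matching procedure. Purely as a generic illustration of statistical matching, the sketch below pairs each sample in one subgroup with its nearest same-class neighbour in the other subgroup, using a precomputed scalar score and discarding pairs whose score difference exceeds a caliper. The score, the caliper value, the choice to match with replacement, and all names are hypothetical and should not be read as the authors' implementation.

```python
def match_pairs(samples, caliper=0.1):
    """Nearest-neighbour matching with replacement on a scalar score.

    samples: list of (sample_id, class_label, subgroup, score), subgroup in {0, 1}.
    Returns (id_from_subgroup_0, id_from_subgroup_1) pairs that share a class
    and whose scores differ by at most `caliper`.
    """
    pairs = []
    for c in sorted({cls for _, cls, _, _ in samples}):
        g0 = [(i, s) for i, cls, g, s in samples if cls == c and g == 0]
        g1 = [(i, s) for i, cls, g, s in samples if cls == c and g == 1]
        if not g1:
            continue  # nothing to match against in this class
        for i, s in g0:
            # Closest subgroup-1 sample by absolute score difference.
            j, sj = min(g1, key=lambda item: abs(item[1] - s))
            if abs(sj - s) <= caliper:
                pairs.append((i, j))
    return pairs


# Hypothetical toy data: (id, class, subgroup, score).
samples = [
    ("x1", 0, 0, 0.20), ("x2", 0, 1, 0.25), ("x3", 0, 1, 0.80),
    ("x4", 1, 0, 0.50), ("x5", 1, 1, 0.90), ("x6", 1, 0, 0.88),
]
pairs = match_pairs(samples)
print(pairs)
```

Here `x4` stays unmatched because its closest same-class candidate lies outside the caliper. The matched pairs are real samples, so no generative model is involved at any point.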
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification · Advanced Neural Network Applications
