Close to Reality: Interpretable and Feasible Data Augmentation for Imbalanced Learning

Matheus Camilo da Silva; Gabriel Gustavo Costanzo; Andrea de Lorenzo; Sylvio Barbon Junior

arXiv:2603.13927·cs.LG·March 17, 2026

TL;DR

This paper introduces DPG-da, an interpretable data augmentation framework for imbalanced learning that generates diverse, valid, and explainable samples by leveraging decision predicate graphs derived from trained models.

Contribution

The paper presents a novel, interpretable data augmentation method that uses decision predicate graphs to handle class imbalance.

Findings

1. DPG-da outperforms traditional over-sampling methods on benchmark datasets.

2. DPG-da ensures generated samples are diverse, valid, and interpretable.

3. The framework provides clear explanations for the augmented data.

Abstract

Many machine learning classification tasks involve imbalanced datasets, which are often subject to over-sampling techniques aimed at improving model performance. However, these techniques are prone to generating unrealistic or infeasible samples. Furthermore, they often function as black boxes, lacking interpretability in their procedures. This opacity makes it difficult to track their effectiveness and provide necessary adjustments, and they may ultimately fail to yield significant performance improvements. To bridge this gap, we introduce the Decision Predicate Graphs for Data Augmentation (DPG-da), a framework that extracts interpretable decision predicates from trained models to capture domain rules and enforce them during sample generation. This design ensures that over-sampled data remain diverse, constraint-satisfying, and interpretable. In experiments on synthetic and…
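The idea of enforcing decision predicates during sample generation can be illustrated with a minimal sketch. This is not the authors' DPG-da algorithm: the predicates below are hand-written rather than extracted from a trained model, the generator is plain linear interpolation between minority samples, and every name (`PREDICATES`, `satisfies`, `oversample`) is hypothetical. It only shows how a candidate that violates a predicate can be rejected before it enters the augmented set.

```python
import random

# Hypothetical predicates as (feature_index, operator, threshold).
# Assumption: hand-written here; DPG-da derives its predicates from a trained model.
PREDICATES = [(0, "<=", 5.0), (1, ">", 1.0)]


def satisfies(sample, predicates):
    """Return True if the sample meets every predicate in the conjunction."""
    for idx, op, threshold in predicates:
        value = sample[idx]
        if op == "<=":
            ok = value <= threshold
        elif op == ">":
            ok = value > threshold
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def oversample(minority, predicates, n_new, seed=0, max_tries=10_000):
    """Interpolate between random minority pairs and keep only valid candidates."""
    rng = random.Random(seed)
    generated = []
    tries = 0
    while len(generated) < n_new and tries < max_tries:
        tries += 1
        a, b = rng.sample(minority, 2)
        t = rng.random()
        candidate = [x + t * (y - x) for x, y in zip(a, b)]
        # Rejection step: discard candidates that break any predicate.
        if satisfies(candidate, predicates):
            generated.append(candidate)
    return generated


# Toy minority class; the third point itself violates both predicates.
minority = [[1.0, 2.0], [4.0, 3.0], [6.0, 0.5], [2.0, 1.5]]
synthetic = oversample(minority, PREDICATES, n_new=5)
```

Every sample in `synthetic` satisfies all predicates by construction, and each rejection can be traced to the specific predicate that failed, which is the kind of interpretability the abstract describes.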

Taxonomy

Topics: Imbalanced Data Classification Techniques · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare