A Guide for Practical Use of ADMG Causal Data Augmentation
Audrey Poinsot, Alessandro Leite

TL;DR
This paper evaluates the ADMG causal data augmentation method for tabular data, highlighting its strengths and limitations in small-data regimes and providing insights for effective application.
Contribution
It offers an experimental analysis of ADMG augmentation, clarifying when prior causal knowledge improves data generation and model robustness.
Findings
ADMG augmentation is model-agnostic and does not depend on the underlying data mechanism.
It requires a minimum number of observations, which can be hard to meet in small-data settings.
It propagates outliers, which degrades model performance.
Abstract
Data augmentation is essential when applying Machine Learning in small-data regimes. It generates new samples following the observed data distribution while increasing their diversity and variability to help researchers and practitioners improve their models' robustness and, thus, deploy them in the real world. Nevertheless, its usage in tabular data still needs to be improved, as prior knowledge about the underlying data mechanism is seldom considered, limiting the fidelity and diversity of the generated data. Causal data augmentation strategies have been pointed out as a solution to handle these challenges by relying on conditional independence encoded in a causal graph. In this context, this paper experimentally analyzed the ADMG causal augmentation method considering different settings to support researchers and practitioners in understanding under which conditions prior knowledge…
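To make the idea of "relying on conditional independence encoded in a causal graph" concrete, here is a deliberately simplified Python sketch. It is not the ADMG procedure evaluated in the paper. It assumes a toy graph X <- Z -> Y with a discrete Z, so that X and Y are independent given Z, and it simply recombines X and Y values across observations that share the same Z. The function name `augment_by_recombination` and the data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def augment_by_recombination(rows):
    """Recombine x and y across rows that share the same z.

    Assumption (hypothetical toy setting): the causal graph is X <- Z -> Y,
    so X and Y are conditionally independent given Z.
    Each row is a (z, x, y) tuple with a discrete z.
    """
    # Group the observed (x, y) pairs by their value of z.
    by_z = defaultdict(list)
    for z, x, y in rows:
        by_z[z].append((x, y))

    augmented = []
    for z, pairs in by_z.items():
        xs = [x for x, _ in pairs]
        ys = [y for _, y in pairs]
        # Under X independent of Y given Z, any pairing of an observed x
        # with an observed y from the same z group is a plausible sample.
        augmented.extend((z, x, y) for x, y in product(xs, ys))
    return augmented


observed = [
    (0, 1.0, 10.0),
    (0, 1.2, 11.0),
    (0, 50.0, 10.5),  # x = 50.0 plays the role of an outlier
    (1, 3.0, 20.0),   # z = 1 has only one observation
]

synthetic = augment_by_recombination(observed)
n_outlier_rows = sum(1 for _, x, _ in synthetic if x == 50.0)
n_z1_rows = sum(1 for z, _, _ in synthetic if z == 1)
```

In this toy run the group z = 0 grows from three rows to nine, while the group z = 1 stays at a single row because there is nothing to recombine, which mirrors the need for a minimum number of observations. The single outlying x value now appears in three synthetic rows, paired with y values it was never observed with, which mirrors the outlier propagation noted in the findings.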
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Distributed Sensor Networks and Detection Algorithms · Advanced Causal Inference Techniques
