Data Augmentation via Causal-Residual Bootstrapping

Mateusz Gajewski; Sophia Xiao; Bijan Mazaheri

arXiv:2603.15335·cs.LG·March 17, 2026

TL;DR

This paper introduces causal-residual bootstrapping, a data augmentation method for additive-noise settings that uses known causal structure and permutes model residuals to generate new data points. Predictive models trained on the augmented data show improved accuracy, with theoretical backing in linear Gaussian settings.

Contribution

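The paper proposes an augmentation technique that permutes residuals according to causal structure. Unlike earlier work that integrates causal knowledge only up to conditional independence equivalence, it can incorporate information beyond a Markov equivalence class. The method comes with theoretical backing in linear Gaussian settings and improved predictive performance.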

Findings

01 Enhanced model accuracy with the proposed augmentation method

02 Theoretical validation in linear Gaussian models

03 Empirical results show improved predictive performance

Abstract

Data augmentation integrates domain knowledge into a dataset by making domain-informed modifications to existing data points. For example, image data can be augmented by duplicating images in different tints or orientations, thereby incorporating the knowledge that images may vary in these dimensions. Recent work by Teshima and Sugiyama has explored the integration of causal knowledge (e.g., A causes B causes C) up to conditional independence equivalence. We suggest a related approach for settings with additive noise that can incorporate information beyond a Markov equivalence class. The approach, built on the principle of independent mechanisms, permutes the residuals of models built on marginal probability distributions. Predictive models built on our augmented data demonstrate improved accuracy, for which we provide theoretical backing in linear Gaussian settings.
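
The residual-permutation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a known chain A -> B -> C with linear additive-noise mechanisms, fits each mechanism by ordinary least squares, and shuffles the residuals of each mechanism independently before regenerating the downstream variables. The function names `fit_linear` and `residual_permutation_augment`, the choice to keep the root variable A fixed, and the choice to propagate the regenerated B into C are all assumptions made for illustration.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares fit of y ~ a * x + b; returns (a, b)."""
    a, b = np.polyfit(x, y, deg=1)  # coefficients come highest degree first
    return a, b


def residual_permutation_augment(A, B, C, n_copies=1, rng=None):
    """Illustrative augmentation for an ASSUMED chain A -> B -> C.

    Each mechanism is modelled as child = a * parent + b + noise.
    Under independent mechanisms the noise terms are independent of
    the parents and of each other, so shuffling the fitted residuals
    yields new, plausible samples. Hypothetical helper, not from the paper.
    """
    rng = np.random.default_rng(rng)

    # Fit one model per assumed causal mechanism.
    a1, b1 = fit_linear(A, B)
    a2, b2 = fit_linear(B, C)

    # Residuals stand in for the unobserved additive noise.
    res_B = B - (a1 * A + b1)
    res_C = C - (a2 * B + b2)

    copies = []
    for _ in range(n_copies):
        A_new = A.copy()  # root variable is kept as observed (assumption)
        B_new = a1 * A_new + b1 + rng.permutation(res_B)
        # Downstream variable is regenerated from the new B (assumption).
        C_new = a2 * B_new + b2 + rng.permutation(res_C)
        copies.append(np.column_stack([A_new, B_new, C_new]))
    return np.vstack(copies)


# Small synthetic demonstration with made-up coefficients.
demo_rng = np.random.default_rng(0)
A = demo_rng.normal(size=50)
B = 2.0 * A + demo_rng.normal(size=50)
C = -1.0 * B + demo_rng.normal(size=50)
augmented = residual_permutation_augment(A, B, C, n_copies=3, rng=1)
```

In this sketch the augmented rows would be stacked with the original data before training a predictor. The causal direction matters: fitting B from A and permuting those residuals is only justified if A is assumed to cause B, which is the kind of information that goes beyond a Markov equivalence class.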

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning