Bias Challenges in Counterfactual Data Augmentation

S Chandra Mouli; Yangze Zhou; Bruno Ribeiro

arXiv:2209.05104 · cs.LG · September 15, 2022 · 1 citation

Open Access

TL;DR

This paper examines the limits of counterfactual data augmentation for out-of-distribution (OOD) robustness. It shows that augmentation performed by a context-guessing machine may not achieve the desired counterfactual invariance, and it describes an NLP task where the resulting classifiers are not OOD-robust.

Contribution

It provides a theoretical analysis of the invariance imposed by counterfactual data augmentation and describes an exemplar NLP task where augmentation by a context-guessing machine does not lead to robust OOD classifiers.

Findings

01

Counterfactual data augmentation may not achieve counterfactual invariance when the augmentation is performed by a context-guessing machine.

02

Theoretical analysis characterizes the invariance that such augmentation actually imposes.

03

An exemplar NLP task shows that augmentation by a context-guessing machine does not lead to robust OOD classifiers.

Abstract

Deep learning models tend not to be out-of-distribution robust primarily due to their reliance on spurious features to solve the task. Counterfactual data augmentations provide a general way of (approximately) achieving representations that are counterfactual-invariant to spurious features, a requirement for out-of-distribution (OOD) robustness. In this work, we show that counterfactual data augmentations may not achieve the desired counterfactual-invariance if the augmentation is performed by a context-guessing machine, an abstract machine that guesses the most-likely context of a given input. We theoretically analyze the invariance imposed by such counterfactual data augmentations and describe an exemplar NLP task where counterfactual data augmentation by a context-guessing machine does not lead to robust OOD classifiers.
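
To make the notion of a context-guessing machine concrete, here is a minimal toy sketch in Python. It is not the paper's construction: the sample data, the context labels "A" and "B", and the helper names guess_context and guess_error_rate are all hypothetical, and the machine is modeled simply as picking the context most often seen with a given input. What it illustrates is that a most-likely guess can be wrong for some inputs, so an augmentation built on that guess would start from the wrong context for those examples.

```python
from collections import Counter

# Hypothetical toy data (an assumption, not from the paper):
# each pair is (true_context, observed_input).
SAMPLES = [
    ("A", "x1"), ("A", "x1"), ("A", "x1"), ("B", "x1"),
    ("A", "x2"), ("B", "x2"), ("B", "x2"), ("B", "x2"),
]


def guess_context(observed_input, samples):
    """Return the context seen most often with this input (empirical most-likely guess)."""
    counts = Counter(c for c, x in samples if x == observed_input)
    if not counts:
        raise ValueError(f"input {observed_input!r} never observed")
    return counts.most_common(1)[0][0]


def guess_error_rate(samples):
    """Fraction of samples whose true context differs from the guessed one."""
    wrong = sum(1 for c, x in samples if guess_context(x, samples) != c)
    return wrong / len(samples)


if __name__ == "__main__":
    print(guess_context("x1", SAMPLES))  # A
    print(guess_error_rate(SAMPLES))     # 0.25
```

In this toy data, every example whose true context is the less frequent one for its input is mis-guessed. How such guesses affect the invariance a classifier ends up with is what the paper analyzes theoretically.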

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications