Counterfactual Maximum Likelihood Estimation for Training Deep Networks
Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang

TL;DR
This paper introduces a causality-based training framework called Counterfactual Maximum Likelihood Estimation (CMLE) to reduce spurious correlations in deep networks, improving out-of-domain generalization and causal inference.
Contribution
It proposes a novel CMLE approach based on structural causal models, with algorithms for causal training using observational data, and demonstrates improved robustness in NLP and image captioning tasks.
Findings
CMLE outperforms regular MLE on out-of-domain tests.
CMLE reduces reliance on spurious correlations caused by observed confounders.
Both CMLE variants maintain in-domain performance comparable to regular MLE.
Abstract
Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to spurious correlations that should not be learned as predictive clues. To mitigate this problem, we propose a causality-based training framework to reduce the spurious correlations caused by observed confounders. We give a theoretical analysis of the underlying general Structural Causal Model (SCM) and propose to perform Maximum Likelihood Estimation (MLE) on the interventional distribution instead of the observational distribution, namely Counterfactual Maximum Likelihood Estimation (CMLE). As the interventional distribution is, in general, hidden from the observational data, we then derive two different upper bounds of the expected negative log-likelihood and propose two general algorithms, Implicit CMLE and Explicit CMLE, for causal predictions of deep learning models using…
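The contrast between the two objectives named in the abstract can be sketched as follows. The notation is an assumption made for illustration and does not appear in the text above: X is the observed confounder, T the treatment (the input whose causal effect on the output is of interest), Y the outcome, Y_t the potential outcome under the intervention T = t, and p_θ the model. This is one plausible reading of the abstract, not necessarily the paper's exact formulation.

```latex
% Standard MLE: expectation over the observational distribution P(X, T, Y)
\mathcal{L}_{\mathrm{MLE}}(\theta)
  = \mathbb{E}_{(x,t,y)\sim P(X,T,Y)}\bigl[-\log p_\theta(y \mid x, t)\bigr]

% CMLE (sketch, assuming a discrete treatment): expectation over the
% interventional distribution, so every treatment value t is paired
% with every confounder value x, not only the pairs seen in the data
\mathcal{L}_{\mathrm{CMLE}}(\theta)
  = \mathbb{E}_{x\sim P(X)} \sum_{t}
    \mathbb{E}_{y\sim P(Y_t \mid x)}\bigl[-\log p_\theta(y \mid x, t)\bigr]
```

Only the factual outcome is observed for each x, so the inner expectation for the other treatment values cannot be computed directly from observational data. This is why the abstract turns to upper bounds of the expected negative log-likelihood.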
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
