Counterfactual Maximum Likelihood Estimation for Training Deep Networks

Xinyi Wang; Wenhu Chen; Michael Saxon; William Yang Wang

arXiv:2106.03831 · cs.LG · October 27, 2021

TL;DR

This paper introduces a causality-based training framework called Counterfactual Maximum Likelihood Estimation (CMLE) to reduce spurious correlations in deep networks, improving out-of-domain generalization and causal inference.

Contribution

The paper proposes CMLE, a training approach grounded in structural causal models, derives two algorithms (Implicit CMLE and Explicit CMLE) for causal training from observational data, and demonstrates improved robustness on NLP and image captioning tasks.

Findings

01 CMLE outperforms regular MLE in out-of-domain tests

02 CMLE reduces spurious correlations effectively

03 Methods maintain comparable in-domain performance

Abstract

Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to spurious correlations that should not be learned as predictive clues. To mitigate this problem, we propose a causality-based training framework to reduce the spurious correlations caused by observed confounders. We give theoretical analysis on the underlying general Structural Causal Model (SCM) and propose to perform Maximum Likelihood Estimation (MLE) on the interventional distribution instead of the observational distribution, namely Counterfactual Maximum Likelihood Estimation (CMLE). As the interventional distribution, in general, is hidden from the observational data, we then derive two different upper bounds of the expected negative log-likelihood and propose two general algorithms, Implicit CMLE and Explicit CMLE, for causal predictions of deep learning models using…
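
The contrast the abstract draws between the two objectives can be sketched as follows. The notation is assumed for illustration and is not taken from the text above: X is the observed confounder, T a treatment taking values in a finite set \mathcal{T}, Y the outcome, Y_t the potential outcome under the intervention do(T = t), and p_\theta the model. The uniform weighting over treatments is likewise an assumption.

```latex
% Standard MLE: expected negative log-likelihood under the
% observational distribution P(X, T, Y).
\mathcal{L}_{\mathrm{MLE}}(\theta)
  = \mathbb{E}_{(x,t,y) \sim P(X,T,Y)}
    \left[ -\log p_\theta(y \mid x, t) \right]

% CMLE (sketch): expected negative log-likelihood under the
% interventional distribution, where every treatment t is applied
% to every x regardless of which treatment was actually observed.
\mathcal{L}_{\mathrm{CMLE}}(\theta)
  = \mathbb{E}_{x \sim P(X)} \, \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}}
    \mathbb{E}_{y \sim P(Y_t \mid x)}
    \left[ -\log p_\theta(y \mid x, t) \right]
```

In observational data, each x is paired only with the outcome of the treatment it actually received, so the second objective cannot be evaluated directly. This is consistent with the abstract's move to upper bounds on the expected negative log-likelihood.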

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

WANGXinyiLinda/CMLE · jax · Official

Videos

Counterfactual Maximum Likelihood Estimation for Training Deep Networks · slideslive

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning