A Causal Ordering Prior for Unsupervised Representation Learning

Avinash Kori; Pedro Sanchez; Konstantinos Vilouras; Ben Glocker; Sotirios A. Tsaftaris

arXiv:2307.05704·cs.LG·July 13, 2023

Open Access · 3 Reviews

TL;DR

This paper introduces an unsupervised representation learning method that models causal relationships among latent variables, using a causal ordering prior inspired by functional causal models and enforced through a Hessian-based loss.

Contribution

It proposes a fully unsupervised approach to causal representation learning that relaxes independence assumptions and enforces causal order in latent space without auxiliary data.

Findings

01

Demonstrates causal ordering in latent space using a Hessian-based loss

02

Achieves identifiable causal representations without supervision

03

Extends variational inference to causal latent models

Abstract

Unsupervised representation learning with variational inference relies heavily on independence assumptions over latent variables. Causal representation learning (CRL), however, argues that factors of variation in a dataset are, in fact, causally related. Allowing latent variables to be correlated, as a consequence of causal relationships, is more realistic and generalisable. So far, provably identifiable methods rely on auxiliary information, weak labels, and interventional or even counterfactual data. Inspired by causal discovery with functional causal models, we propose a fully unsupervised representation learning method that considers a data generation process with a latent additive noise model (ANM). We encourage the latent space to follow a causal ordering via a loss function based on the Hessian of the latent distribution.
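The abstract does not spell out the loss itself. As a rough illustration of the score-based causal discovery idea that a Hessian-based ordering can build on, the sketch below uses a hypothetical two-variable latent ANM (z1 ~ N(0, 1), z2 = z1^2 + e with e ~ N(0, 1)), chosen here purely for illustration and not taken from the paper. For an ANM with Gaussian noise, the diagonal entry of the Hessian of the log-density that belongs to a leaf variable is constant across samples, so the variable whose entry has the smallest variance is a candidate leaf. The function names `hessian_diag` and `find_leaf` are assumptions, and the Hessian is derived by hand for this toy model rather than estimated from data.

```python
import numpy as np


def hessian_diag(z1, z2):
    """Diagonal of the Hessian of log p(z1, z2) for the toy ANM
    z1 ~ N(0, 1), z2 = z1**2 + e, e ~ N(0, 1).

    log p(z1, z2) = -z1**2 / 2 - (z2 - z1**2)**2 / 2 + const
    """
    d11 = -1.0 + 2.0 * z2 - 6.0 * z1**2  # depends on the sample (non-leaf)
    d22 = -np.ones_like(z2)              # constant everywhere (leaf)
    return np.stack([d11, d22], axis=1)


def find_leaf(n_samples=1000, seed=0):
    """Sample from the toy ANM and pick the variable whose Hessian
    diagonal has the smallest variance across samples."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n_samples)
    z2 = z1**2 + rng.normal(size=n_samples)
    variances = hessian_diag(z1, z2).var(axis=0)
    return int(np.argmin(variances)), variances


if __name__ == "__main__":
    leaf, variances = find_leaf()
    print("candidate leaf index:", leaf)
    print("variance of Hessian diagonal per variable:", variances)
```

Removing the identified leaf and repeating the test on the remaining variables would yield a full ordering. In practice the Hessian has to be estimated from samples, and a training loss would penalise deviations from the ordering rather than read it off analytically as done here.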

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5: marginally below the acceptance threshold · Confidence 3

Strengths

The paper attempts to provide a new method for causal representation learning, combining different results from the causal discovery and representation learning literature in a novel way.

Weaknesses

1. Theorem 2 is, on its own and in its current formulation, incomplete. Supposedly, it exploits the results in (Kivva et al., 2022). However, the statement says, _"invertible mixing functions"_. For any invertible mixing functions, without further assumptions (none are stated in Thm. 2), it is possible to build counterexamples to identifiability in the i.i.d. setting based on the Darmois construction [1]. It also appears non-rigorous to state that the mixing functions are not _"identically distr

Reviewer 02 · Rating 5: marginally below the acceptance threshold · Confidence 3

Strengths

- The paper studies an important problem, i.e., how to learn latent variables with causal relations.
- It nicely combines ideas from different fields: Identifiability of latent variable model with piecewise linear mixing, Gaussian mixture models, score based causal discovery.
- The main paper is generally easy to follow
- Experimental results look generally promising.

Weaknesses

- There do not seem to be any completely novel ideas in the paper (score based causal discovery, gmm latent prior, identifiability without auxiliary information). (this is a minor point)
- For the identifiability part, there are some questions regarding the combination of assumptions (see questions).
- The proofs in Appendix A could be much clearer (see questions), and, more generally, all math parts should be checked carefully (being slightly imprecise can make it very difficult to follow the d

Reviewer 03 · Rating 3: reject, not good enough · Confidence 4

Strengths

- The research direction is important because causal representation learning is a promising extension of neural methods that aims to leverage additional causal information for robustness and generalization.
- The proposed method coVAE shows improved MCC and R^2 metrics over prior experimental baselines, i.e. improved recovery of meaningful latents.

Weaknesses

- The abstract claims that all provably identifiable methods rely on additional information; however, the main work (Kivva et al.) that the authors cite shows that we do not need additional information, so this claim should be revised.
- It's hard to understand how Assumptions 2, 3, and 5 work together. They seem unrelated assumptions with different motivations -- Assumption 2, 3 on latent SCMs is more directly related to the paper since it's inherently causal; however, Assumption 5 on GMMs i

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare

Methods: Variational Inference