A Multi-Level Causal Intervention Framework for Mechanistic Interpretability in Variational Autoencoders

Dip Roy; Rajiv Misra; Sanjay Kumar Singh; Anisha Roy

arXiv:2505.03530·cs.LG·April 7, 2026


TL;DR

This paper introduces a multilevel causal intervention framework for understanding VAEs, proposes three new metrics (Causal Effect Strength, intervention specificity, and circuit modularity), and reports an empirical study across six architectures and multiple datasets.

Contribution

The paper presents what the authors describe as the first general-purpose causal intervention framework for VAEs, together with three new quantitative metrics and a large empirical study of VAE causal mechanisms.

Findings

01. Causal Effect Strength (CES) negatively correlates with DCI disentanglement within datasets.

02. KL reweighting in beta-VAE causes capacity bottlenecks on complex datasets.

03. No single VAE architecture outperforms the others across all datasets.
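To make the KL reweighting in the second finding concrete, here is a minimal sketch of the standard beta-VAE objective for a diagonal Gaussian posterior and a standard normal prior. The function names, the toy inputs, and the default `beta=4.0` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar, axis=-1)

def beta_vae_loss(recon_error, mu, logvar, beta=4.0):
    """Per-example loss: reconstruction error plus a beta-weighted KL term.

    beta is a hypothetical default chosen for illustration only.
    """
    return recon_error + beta * gaussian_kl(mu, logvar)

# Toy posterior parameters for two examples with two latent dimensions each.
mu = np.array([[0.0, 0.0], [1.0, -1.0]])
logvar = np.zeros((2, 2))
recon = np.array([0.5, 0.5])

loss_beta1 = beta_vae_loss(recon, mu, logvar, beta=1.0)
loss_beta4 = beta_vae_loss(recon, mu, logvar, beta=4.0)
```

With `beta > 1`, any posterior that deviates from the prior is penalized more heavily, which is the usual intuition for why a larger weight can limit how much information the latent code carries.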

Abstract

Understanding how generative models represent and transform data is a foundational problem in deep learning interpretability. While mechanistic interpretability of discriminative architectures has yielded substantial insights, relatively little work has addressed variational autoencoders (VAEs). This paper presents the first general-purpose multilevel causal intervention framework for mechanistic interpretability of VAEs. The framework comprises four manipulation types: input manipulation, latent-space perturbation, activation patching, and causal mediation analysis. We also define three new quantitative metrics capturing properties not measured by existing disentanglement metrics alone: Causal Effect Strength (CES), intervention specificity, and circuit modularity. We conduct the largest empirical study to date of VAE causal mechanisms across six architectures (standard VAE, beta-VAE,…
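Two of the manipulation types named in the abstract, latent-space perturbation and activation patching, can be illustrated on a toy decoder. The sketch below is an assumption-laden illustration only: the two-layer random decoder, the helper names `decode` and `latent_effect`, and the mean-absolute-change score are all hypothetical, and the score is not the paper's definition of CES.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))    # hypothetical weights: latent (4) -> hidden (8)
W2 = rng.normal(size=(8, 16))   # hypothetical weights: hidden (8) -> output (16)

def decode(z, patch=None):
    """Toy decoder. `patch` maps hidden-unit index -> values to overwrite."""
    h = np.tanh(z @ W1)
    if patch is not None:
        h = h.copy()
        for idx, val in patch.items():
            h[:, idx] = val
    return h @ W2, h

def latent_effect(z, dim, delta=1.0):
    """Mean absolute output change when latent `dim` is shifted by `delta`."""
    z_shift = z.copy()
    z_shift[:, dim] += delta
    base, _ = decode(z)
    shifted, _ = decode(z_shift)
    return np.mean(np.abs(shifted - base))

z_clean = rng.normal(size=(32, 4))
z_corrupt = rng.normal(size=(32, 4))

# Latent-space perturbation: one effect score per latent dimension.
effects = [latent_effect(z_clean, d) for d in range(4)]

# Activation patching: run the corrupted latents, but restore hidden
# unit 3 from the clean run and see how much of the clean output returns.
out_clean, h_clean = decode(z_clean)
out_corrupt, _ = decode(z_corrupt)
out_patched, _ = decode(z_corrupt, patch={3: h_clean[:, 3]})
recovered = (np.mean(np.abs(out_corrupt - out_clean))
             - np.mean(np.abs(out_patched - out_clean)))
```

In a real study the decoder would be a trained VAE and the patched units would be chosen systematically; here a positive `recovered` value would simply indicate that restoring the unit moves the output closer to the clean run.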
