From Black-box to Causal-box: Towards Building More Interpretable Models

Inwoo Hwang; Yushu Pan; Elias Bareinboim

arXiv:2510.21998·cs.LG·October 28, 2025

TL;DR

This paper introduces causal interpretability, shows that existing blackbox and concept-based models lack it in general, and proposes a framework for designing models that can answer counterfactual questions while balancing interpretability against accuracy.

Contribution

The paper formalizes causal interpretability, provides a graphical criterion for deciding whether a model design supports counterfactual queries, and characterizes the tradeoff between causal interpretability and predictive power.

Findings

01

Blackbox and concept-based models are not causally interpretable in general.

02

A framework for designing causally interpretable models is developed.

03

Experiments validate the theoretical tradeoff between interpretability and accuracy.

Abstract

Understanding the predictions made by deep learning models remains a central challenge, especially in high-stakes applications. A promising approach is to equip models with the ability to answer counterfactual questions -- hypothetical "what if?" scenarios that go beyond the observed data and provide insight into a model's reasoning. In this work, we introduce the notion of causal interpretability, which formalizes when counterfactual queries can be evaluated from a specific class of models and observational data. We analyze two common model classes -- blackbox and concept-based predictors -- and show that neither is causally interpretable in general. To address this gap, we develop a framework for building models that are causally interpretable by design. Specifically, we derive a complete graphical criterion that determines whether a given model architecture supports a given…
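
To make the idea of a counterfactual "what if?" query concrete, here is a minimal sketch of the standard abduction-action-prediction recipe on a toy structural causal model. Everything in it is an illustrative assumption and not taken from the paper: the linear mechanism Z = 2X + U_z, the threshold predictor, and all function names are hypothetical. The toy also assumes the mechanisms are fully known, whereas the paper studies the harder question of when such queries can be evaluated from a model class and observational data alone.

```python
# Toy structural causal model (all mechanisms are assumptions for illustration):
#   X   := U_x              (a generative factor)
#   Z   := 2 * X + U_z      (a feature generated from X plus exogenous noise)
#   Y^  := predictor(Z)     (the model's prediction)

def abduct(x_obs: float, z_obs: float) -> float:
    """Abduction: recover the noise U_z consistent with the observed (x, z)."""
    return z_obs - 2.0 * x_obs


def predictor(z: float) -> int:
    """Hypothetical predictor: a simple threshold on the feature Z."""
    return 1 if z > 3.0 else 0


def counterfactual_prediction(x_obs: float, z_obs: float, x_new: float) -> int:
    """Answer: what would the model predict had X been x_new, given (x_obs, z_obs)?"""
    u_z = abduct(x_obs, z_obs)    # abduction: fix the unit-specific noise
    z_cf = 2.0 * x_new + u_z      # action: set X := x_new, regenerate Z
    return predictor(z_cf)        # prediction: rerun the predictor


# Observed unit: X = 1.0, Z = 3.5, so the inferred noise is U_z = 1.5.
factual = predictor(3.5)
counterfactual = counterfactual_prediction(1.0, 3.5, 0.0)
print(factual, counterfactual)  # prints "1 0"
```

In this toy, the prediction flips from 1 to 0 under the hypothetical change to X. A blackbox predictor gives no such access to the mechanisms between X and Z, which is one informal way to read why the abstract says such models are not causally interpretable in general.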
