Causal Concept Graphs in LLM Latent Space for Stepwise Reasoning

Md Muntaqim Meherab, Noor Islam S. Mohammad, and Faiza Feroz

arXiv:2603.10377·cs.LG·April 27, 2026

TL;DR

This paper introduces Causal Concept Graphs (CCG), a method for modeling how concepts in a language model's latent space causally depend on one another during multi-step reasoning, with the aim of improving interpretability and making interventions more effective.

Contribution

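The paper combines task-conditioned sparse autoencoders (for concept discovery) with DAGMA-style differentiable structure learning (for graph recovery) to produce sparse, domain-specific causal graphs, and reports that interventions guided by these graphs score higher than the baselines on the Causal Fidelity Score across three reasoning benchmarks.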
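To make the structure-learning half concrete, here is a minimal sketch of the log-determinant acyclicity penalty that DAGMA-style methods use: h(W) = -log det(sI - W∘W) + d log s, which is zero when the weighted adjacency matrix W describes a DAG and positive when it contains a cycle (within the domain where the penalty is defined). The function names, the example matrices, and the dependency-free log-determinant helper are illustrative assumptions, not the authors' code; this summary does not give the paper's actual objective or hyperparameters.

```python
import math


def log_det(matrix):
    """Log-determinant by Gaussian elimination with partial pivoting.

    Dependency-free helper for this sketch; in practice
    numpy.linalg.slogdet would be the usual choice.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]  # work on a copy
    sign, total = 1.0, 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        if m[col][col] < 0:
            sign = -sign
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    if sign < 0:
        raise ValueError("determinant is negative; W is outside the valid domain")
    return total


def dagma_acyclicity(w, s=1.0):
    """h(W) = -log det(s*I - W∘W) + d*log(s), with ∘ the elementwise product."""
    d = len(w)
    shifted = [
        [(s if i == j else 0.0) - w[i][j] ** 2 for j in range(d)]
        for i in range(d)
    ]
    return -log_det(shifted) + d * math.log(s)


# Hypothetical 3-concept chain (0 -> 1 -> 2): acyclic, so the penalty vanishes.
chain = [[0.0, 0.8, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]
# Hypothetical 2-concept loop (0 <-> 1): cyclic, so the penalty is positive.
loop = [[0.0, 0.5], [0.5, 0.0]]

h_chain = dagma_acyclicity(chain)
h_loop = dagma_acyclicity(loop)
```

In a training loop, a penalty of this kind would typically be added to a reconstruction loss and a sparsity term on W, so that gradient descent is pushed toward sparse acyclic graphs.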

Findings

01. CCG achieves higher Causal Fidelity Scores than baselines.

02. Learned graphs are sparse, domain-specific, and stable across seeds.

03. Interventions based on CCG induce larger downstream effects.

Abstract

Sparse autoencoders can localize where concepts live in language models, but not how they interact during multi-step reasoning. We propose Causal Concept Graphs (CCG): a directed acyclic graph over sparse, interpretable latent features, where edges capture learned causal dependencies between concepts. We combine task-conditioned sparse autoencoders for concept discovery with DAGMA-style differentiable structure learning for graph recovery and introduce the Causal Fidelity Score (CFS) to evaluate whether graph-guided interventions induce larger downstream effects than random ones. On ARC-Challenge, StrategyQA, and LogiQA with GPT-2 Medium, across five seeds ($n = 15$ paired runs), CCG achieves $\mathrm{CFS} = 5.654 \pm 0.625$, outperforming ROME-style tracing ($3.382 \pm 0.233$), SAE-only ranking ($2.479 \pm 0.196$), and a random baseline ($1.032 \pm 0.034$), with $p < 0.0001$ after Bonferroni correction.…
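The abstract does not state the formula for CFS. One hypothetical reading, consistent with the random baseline scoring close to 1, is a ratio of the mean downstream effect of graph-guided interventions to the mean effect of random interventions. The sketch below illustrates that reading only; the function name, the ratio form, and the effect values are assumptions made for illustration, not the paper's definition or data.

```python
from statistics import mean


def causal_fidelity_ratio(guided_effects, random_effects):
    """Hypothetical CFS-like score: mean guided effect / mean random effect.

    Each list holds non-negative downstream effect sizes (for example, the
    change in output log-probability after ablating a concept).
    """
    baseline = mean(random_effects)
    if baseline <= 0:
        raise ValueError("random-intervention effects must have a positive mean")
    return mean(guided_effects) / baseline


# Made-up effect sizes, for illustration only.
guided = [0.4, 0.6]
random_picks = [0.1, 0.1]
score = causal_fidelity_ratio(guided, random_picks)
```

Under this reading, a score near 1 means the graph picks intervention targets no better than chance, and larger values mean graph-selected concepts matter more downstream.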
