Discovering Hierarchical Latent Capabilities of Language Models via Causal Representation Learning

Jikai Jin; Vasilis Syrgkanis; Sham Kakade; Hanlin Zhang

arXiv:2506.10378 · cs.LG · June 13, 2025

TL;DR

This paper introduces a causal representation learning framework to uncover the latent hierarchical capabilities of language models, revealing causal relationships among abilities such as general problem-solving, instruction-following, and mathematical reasoning.

Contribution

The study proposes a novel causal modeling approach that identifies a concise hierarchical structure of latent capabilities from the benchmark results of a large collection of models, improving interpretability.

Findings

1. Identified a three-node causal structure that explains variation in benchmark performance
2. Revealed a hierarchy running from general problem-solving through instruction-following to mathematical reasoning
3. Highlighted the importance of controlling for the base model as a confounder
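One way to read the hierarchy is as a linear chain: general problem-solving (z1) → instruction-following (z2) → math reasoning (z3). The sketch below uses that reading to show why improving a parent capability can propagate to its descendants. It assumes a linear SCM z = B z + eps. The edge weights 0.8 and 0.6 are hypothetical placeholders, not estimates from the paper.

```python
import numpy as np

# Hypothetical edge weights for the chain
# problem-solving (z1) -> instruction-following (z2) -> math reasoning (z3).
# B[i, j] is the direct effect of z_j on z_i; these numbers are illustrative only.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.6, 0.0]])

# In a linear SCM z = B z + eps, the total-effect matrix is (I - B)^{-1}:
# entry [i, j] is the change in z_i caused by a unit shift in the exogenous term of z_j.
T = np.linalg.inv(np.eye(3) - B)

names = ["problem-solving", "instruction-following", "math reasoning"]
shift = T @ np.array([1.0, 0.0, 0.0])  # shift only the root capability
for name, delta in zip(names, shift):
    print(f"{name}: {delta:+.2f}")
```

With these placeholder weights, the script prints +1.00, +0.80 and +0.48. The effect on math reasoning is the product of the two edge weights along the chain. Shifting z3 instead leaves z1 and z2 unchanged, because the effects only flow downstream.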

Abstract

Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably…
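The following is a minimal simulation of the kind of model the abstract describes. Benchmark scores are a linear map of a few latent capabilities, and the latents follow a linear causal structure confounded by the base model. This is not the paper's estimation procedure. The sizes, the chain z1 → z2 → z3, the scalar per-base-model offset `c`, the Laplace noise, and the absence of benchmark noise are all illustrative assumptions. To isolate the confounding issue, the sketch regresses directly on the simulated latents. It shows that pooling all models biases the z1 → z2 edge, while centering within each base model removes the shared offset. It also confirms that the six centered benchmark columns still span only a three-dimensional subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's data): 50 hypothetical base
# models, 40 fine-tuned variants of each, 3 latent capabilities, 6 benchmarks.
N_BASE, N_PER_BASE, N_LATENT, N_BENCH = 50, 40, 3, 6

# Hypothetical chain-structured linear SCM z1 -> z2 -> z3.
# B[i, j] is the direct effect of z_j on z_i; strictly lower triangular => a DAG.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.6, 0.0]])
GAMMA = np.ones(N_LATENT)                 # how the base-model factor shifts each latent
A = rng.normal(size=(N_BENCH, N_LATENT))  # hypothetical latent-to-benchmark loadings


def simulate():
    """Return benchmark scores X, latents Z, and the base-model id of each row."""
    Z, groups = [], []
    for b in range(N_BASE):
        c = rng.normal(scale=2.0)  # base-model confounder shared by all its fine-tunes
        eps = rng.laplace(size=(N_PER_BASE, N_LATENT))  # non-Gaussian exogenous noise
        # Structural equations z = B z + c * GAMMA + eps, solved for z.
        rhs = c * GAMMA + eps
        Z.append(np.linalg.solve(np.eye(N_LATENT) - B, rhs.T).T)
        groups.append(np.full(N_PER_BASE, b))
    Z = np.vstack(Z)
    X = Z @ A.T  # noiseless linear map to benchmark scores (a simplification)
    return X, Z, np.concatenate(groups)


def center_within_groups(M, groups):
    """Subtract each base model's mean: a simple fixed-effects style control."""
    out = M.astype(float).copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= M[mask].mean(axis=0)
    return out


def slope(y, x):
    """OLS slope of y on x (both already mean-centered)."""
    return float(x @ y / (x @ x))


X, Z, groups = simulate()

# Pooled regression ignores the base model, so the shared offset c biases the edge.
Zp = Z - Z.mean(axis=0)
pooled = slope(Zp[:, 1], Zp[:, 0])

# Within-base-model regression removes c and recovers roughly the true 0.8.
Zw = center_within_groups(Z, groups)
within = slope(Zw[:, 1], Zw[:, 0])

# After centering, the six benchmark columns still span only a 3-D subspace.
sv = np.linalg.svd(center_within_groups(X, groups), compute_uv=False)

print(f"pooled slope z1->z2: {pooled:.2f}, within-base-model slope: {within:.2f}")
print("relative singular values:", np.round(sv / sv[0], 4))
```

Under these assumptions, the pooled slope overshoots 0.8 by roughly Var(c) / (Var(c) + Var(eps1)). The within-base-model slope stays close to 0.8. In real data, the latents are not observed and would first have to be recovered from the scores.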

Peer Reviews

Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. This work provides a novel causal framing by introducing causal representation learning to interpret model capability hierarchies.
2. The method rests on a well-motivated derivation of HCA with identifiability conditions and theoretical grounding.
3. Reliance on public leaderboard data ensures reproducibility and relevance for community benchmarking.

Weaknesses

1. The hierarchy is inferred from only six benchmarks, so the results may not generalize.
2. There is no robustness study of the hyperparameters.
3. Table 2 is not referenced in the main text.

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. Causal Discovery of Hierarchical Structure: The HCA method moves beyond correlation-based analysis (like PCA) to establish a causally directed hierarchy among capabilities, providing a validated roadmap for improving LLMs.
2. Robustness by Controlling Confounding: The analysis rigorously accounts for performance heterogeneity across different base models, ensuring the discovered invariant causal structure is robust and not merely a result of pre-training confounding effects.
3. Actionable Ins…

Weaknesses

1. The analysis is constrained by the six available benchmarks, yielding only three coarse-grained capabilities. A richer set of benchmarks might reveal a more complex hierarchy.
2. While SFT experiments support the $z_2 \rightarrow z_3$ link, the broader causal claims rely partly on observational correlations and interpretation. The ATE analysis acknowledges unverifiable ignorability assumptions. There is also potential circularity in naming factors based on correlated benchmarks and…

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The paper pioneers the application of causal representation learning using only observational data to understand LLM capabilities, moving beyond correlational scaling laws or factor analysis.
2. The discovered hierarchy (general problem-solving → instruction-following → math reasoning) provides a potentially practical "causal roadmap" for prioritizing post-training efforts, suggesting that interventions on parent capabilities can benefit child capabilities.

Weaknesses

1. The method relies on strong assumptions, such as the linearity of the capability-performance mapping, a DAG structure over capabilities, and non-Gaussian noise for ICA, which might not fully hold in reality. The impact of violations (e.g., non-linearity) is not deeply explored.
2. Concerns were raised about the discovered graph's robustness to the choice of base models and benchmarks. While Appendix H provides some sensitivity analysis, the filtering step and potential finite-sample errors impacti…

Code & Models

Repositories

hlzhang109/causal-eval
pytorch · Official

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)

Methods: Balanced Selection