ReBeCA: Unveiling Interpretable Behavior Hierarchy behind the Iterative Self-Reflection of Language Models with Causal Analysis

Tianqiang Yan; Sihan Shang; Yuheng Li; Song Qiu; Hao Peng; Wenjian Luo; Jue Xie; Lizhen Qu; Yuan Gao

arXiv:2602.06373·cs.CL·February 9, 2026

Open Access

TL;DR

ReBeCA introduces a causal analysis framework to uncover the hierarchical and causal behavioral mechanisms behind language model self-reflection, improving interpretability and generalizability of the process.

Contribution

It presents ReBeCA, a novel causal analysis framework that models self-reflection behaviors as causal graphs, revealing genuine determinants and hierarchical influences.

Findings

01 Semantic behaviors influence self-reflection hierarchically

02 Limited causal behaviors affect generalizability

03 Positive behaviors can impair self-reflection efficacy

Abstract

While self-reflection can enhance language model reliability, its underlying mechanisms remain opaque, with existing analyses often yielding correlation-based insights that fail to generalize. To address this, we introduce ReBeCA (self-Reflection Behavior explained through Causal Analysis), a framework that unveils the interpretable behavioral hierarchy governing the self-reflection outcome. By modeling self-reflection trajectories as causal graphs, ReBeCA isolates genuine determinants of performance through a three-stage Invariant Causal Prediction (ICP) pipeline. We establish three critical findings: (1) Behavioral hierarchy: Semantic behaviors of the model influence final self-reflection results hierarchically: directly or indirectly; (2) Causation matters: Generalizability in…

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks