TL;DR
TriEx introduces a multi-view, evidence-based framework for explaining and analyzing the internal reasoning and belief dynamics of multi-agent LLMs in strategic games.
Contribution
The paper presents TriEx, a novel explainability framework that aligns self-reasoning, belief states, and environment audits to improve understanding of multi-agent LLM behavior.
Findings
TriEx enables scalable analysis of explanation faithfulness, belief dynamics, and evaluator reliability.
It reveals mismatches between agents' beliefs, statements, and actions.
The framework highlights the importance of interaction-dependent explainability.
Abstract
Explainability for Large Language Model (LLM) agents is especially challenging in interactive, partially observable settings, where decisions depend on evolving beliefs and other agents. We present TriEx, a tri-view explainability framework that instruments sequential decision making with aligned artifacts: (i) structured first-person self-reasoning bound to an action, (ii) explicit second-person belief states about opponents updated over time, and (iii) third-person oracle audits grounded in environment-derived reference signals. This design turns explanations from free-form narratives into evidence-anchored objects that can be compared and checked across time and perspectives. Using imperfect-information strategic games as a controlled testbed, we show that TriEx enables scalable analysis of explanation faithfulness, belief dynamics, and evaluator reliability, revealing…
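To make the three aligned views concrete, here is a minimal Python sketch of how one record per agent and time step might be stored and checked. It is an illustration under stated assumptions, not the paper's implementation: every class, field, and function name (`TriViewRecord`, `claimed_win_prob`, `true_win_prob`, `flag_mismatches`, the `tol` threshold) is hypothetical, and the abstract does not specify which reference signals the oracle audit uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SelfReasoning:
    """First-person view: a rationale bound to one concrete action (hypothetical schema)."""
    action: str
    rationale: str
    claimed_win_prob: float  # assumed field: the agent's own stated estimate


@dataclass
class BeliefState:
    """Second-person view: the agent's beliefs about each opponent at this step."""
    opponent_strength: Dict[str, float]  # opponent id -> estimated strength in [0, 1]


@dataclass
class OracleAudit:
    """Third-person view: a reference signal derived from the environment."""
    true_win_prob: float  # assumed signal; the source does not name the actual one


@dataclass
class TriViewRecord:
    """The three views aligned on the same (agent, step) key."""
    agent: str
    step: int
    reasoning: SelfReasoning
    belief: BeliefState
    audit: OracleAudit


def calibration_gap(record: TriViewRecord) -> float:
    """Absolute gap between what the agent claimed and what the audit reports."""
    return abs(record.reasoning.claimed_win_prob - record.audit.true_win_prob)


def flag_mismatches(records: List[TriViewRecord], tol: float = 0.25) -> List[Tuple[str, int]]:
    """Return (agent, step) keys whose self-reported estimate departs from the audit by more than tol."""
    return [(r.agent, r.step) for r in records if calibration_gap(r) > tol]


# Two illustrative records for one agent; the values are made up for the example.
records = [
    TriViewRecord("A", 1, SelfReasoning("call", "pot odds look fine", 0.6),
                  BeliefState({"B": 0.4}), OracleAudit(0.5)),
    TriViewRecord("A", 2, SelfReasoning("raise", "B seems weak", 0.9),
                  BeliefState({"B": 0.2}), OracleAudit(0.4)),
]
flagged = flag_mismatches(records)
```

Keying every view on the same agent and step is what lets a statement, a belief, and an audited outcome be compared directly, rather than read as separate narratives.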
