Interpreting Agentic Systems: Beyond Model Explanations to System-Level Accountability
Judy Zhu, Dhari Gandhi, Himanshu Joshi, Ahmad Rezaie Mianroodi, Sedef Akinli Kocak, Dhanesh Ramachandran

TL;DR
This paper examines the limitations of existing interpretability methods for agentic AI systems, emphasizing the need for new techniques to ensure safety, accountability, and oversight across their complex, goal-directed behaviors.
Contribution
It identifies gaps in current interpretability approaches for agentic systems and proposes future research directions for developing system-level oversight tools.
Findings
Current interpretability methods, developed primarily for static models, show limitations when applied to agentic systems.
Agentic systems require new, tailored interpretability techniques.
Embedding oversight mechanisms is crucial for safe deployment.
Abstract
Agentic systems have transformed how Large Language Models (LLMs) can be leveraged to create autonomous systems with goal-directed behaviors, consisting of multi-step planning and the ability to interact with different environments. These systems differ fundamentally from traditional machine learning models in both architecture and deployment, and they introduce unique AI safety challenges, including goal misalignment, compounding decision errors, and coordination risks among interacting agents. Addressing these challenges requires embedding interpretability and explainability by design to ensure traceability and accountability across the systems' autonomous behaviors. Current interpretability techniques, developed primarily for static models, show limitations when applied to agentic systems. The temporal dynamics, compounding decisions, and context-dependent behaviors of agentic systems demand new analytical…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multi-Agent Systems and Negotiation · AI-based Problem Solving and Planning
