Interpretable by Design: Query-Specific Neural Modules for Explainable Reinforcement Learning
Mehrdad Zakershahrak

TL;DR
This paper introduces Query Conditioned Deterministic Inference Networks (QDIN), a novel RL architecture that explicitly models diverse queries about the environment, improving interpretability and knowledge extraction without sacrificing control performance.
Contribution
The paper proposes a unified neural architecture for RL that treats various environment queries as first-class citizens, enabling better interpretability and knowledge extraction.
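As a rough illustration of what "queries as first-class citizens" could look like in code, the sketch below routes each query type to its own small head on top of a shared state encoder. It is not the paper's implementation: the class name `QueryConditionedNet`, the layer sizes, the NumPy-only untrained weights, and the choice of three heads (the path query is left out for brevity) are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for this sketch.
STATE_DIM, HIDDEN, N_ACTIONS = 4, 16, 4


def linear(in_dim, out_dim):
    """Return randomly initialised (weights, bias) for one dense layer."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)


class QueryConditionedNet:
    """Shared state encoder plus one small head per query type (illustrative)."""

    def __init__(self):
        self.enc = linear(STATE_DIM, HIDDEN)
        self.heads = {
            "policy": linear(HIDDEN, N_ACTIONS),      # state -> action logits
            "reachability": linear(2 * HIDDEN, 1),    # (state, goal) -> logit
            "comparison": linear(2 * HIDDEN, 1),      # (state_a, state_b) -> logit
        }

    def encode(self, state):
        w, b = self.enc
        return np.tanh(np.asarray(state, dtype=float) @ w + b)

    def answer(self, query_type, *states):
        # Encode every state in the query and concatenate the features.
        feats = np.concatenate([self.encode(s) for s in states])
        w, b = self.heads[query_type]
        out = feats @ w + b
        if query_type == "policy":
            e = np.exp(out - out.max())       # numerically stable softmax
            return e / e.sum()
        return 1.0 / (1.0 + np.exp(-out[0]))  # probability for yes/no queries
```

Calling `QueryConditionedNet().answer("reachability", s, g)` returns a probability in (0, 1), while `answer("policy", s)` returns a distribution over actions. In a trained system each head would have its own loss, which is one way the query-specific specialization described above might be realized.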
Findings
Inference accuracy on reachability queries can reach 99% even when control performance is low.
Query-specific architectures outperform unified models and post-hoc methods.
Representations for world knowledge differ from those for control.
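To make the reachability finding concrete, here is a minimal sketch of how reachability accuracy might be scored in a deterministic environment: breadth-first search supplies the ground truth, and a predictor is compared against it over all (start, goal) pairs. The transition-table format, the function names, and the toy environment are assumptions for illustration; the summary does not say how the paper computes this metric.

```python
from collections import deque


def reachable_set(start, transitions):
    """All states reachable from `start` under deterministic transitions.

    `transitions` maps each state to a list of successor states,
    one per available action (a hypothetical format for this sketch).
    """
    seen = {start}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def reachability_accuracy(predict, transitions, states):
    """Fraction of (start, goal) pairs where predict(start, goal) matches BFS."""
    correct = total = 0
    for start in states:
        truth = reachable_set(start, transitions)
        for goal in states:
            correct += predict(start, goal) == (goal in truth)
            total += 1
    return correct / total


# Toy environment: chain 0 -> 1 -> 2 with state 3 isolated.
toy = {0: [1], 1: [2], 2: [2], 3: [3]}
```

A learned reachability module would take the place of `predict`. Because the score depends only on the dynamics and not on reward, it can be high even for an agent whose policy performs poorly, which is consistent with the decoupling reported above.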
Abstract
Reinforcement learning has traditionally focused on a singular objective: learning policies that select actions to maximize reward. We challenge this paradigm by asking: what if we explicitly architected RL systems as inference engines that can answer diverse queries about their environment? In deterministic settings, trained agents implicitly encode rich knowledge about reachability, distances, values, and dynamics, yet current architectures are not designed to expose this information efficiently. We introduce Query Conditioned Deterministic Inference Networks (QDIN), a unified architecture that treats different types of queries (policy, reachability, paths, comparisons) as first-class citizens, with specialized neural modules optimized for each inference pattern. Our key empirical finding reveals a fundamental decoupling: inference accuracy can reach near-perfect levels (99%…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
