Distilling Deep Reinforcement Learning into Interpretable Fuzzy Rules: An Explainable AI Framework
Sanup S. Araballi, Simon Khan, Chilukuri K. Mohan

TL;DR
This paper introduces a fuzzy rule-based framework for interpreting deep reinforcement learning policies. It reports better transparency and fidelity than existing methods and is validated on a lunar lander control task.
Contribution
It presents a hierarchical fuzzy classifier system that distills neural policies into human-readable rules using clustering and regression, with novel metrics for interpretability and fidelity evaluation.
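The clustering-plus-regression idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fit_tsk` and `predict_tsk`, the hand-rolled k-means loop, the Gaussian membership function, and the `width` parameter are all assumptions made for illustration, and the hierarchical structure of the paper's classifier is not reproduced. Each cluster center acts as the antecedent of one rule ("IF state is near center k"), and a ridge-regularized linear model fitted on that cluster's samples acts as the consequent ("THEN action = linear function of state").

```python
import numpy as np

def fit_tsk(states, actions, n_rules=4, ridge_alpha=1.0, n_iter=20, seed=0):
    """Fit one local linear rule per k-means cluster (illustrative only)."""
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    rng = np.random.default_rng(seed)
    # Plain k-means: initialise centers from randomly chosen samples.
    centers = states[rng.choice(len(states), n_rules, replace=False)]
    for _ in range(n_iter):
        dist = ((states[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for k in range(n_rules):
            if np.any(labels == k):
                centers[k] = states[labels == k].mean(0)
    dist = ((states[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = dist.argmin(1)
    # Ridge regression in closed form; the bias column is regularised too,
    # which is a simplification.
    X = np.hstack([states, np.ones((len(states), 1))])
    coefs = []
    for k in range(n_rules):
        Xk, Yk = X[labels == k], actions[labels == k]
        A = Xk.T @ Xk + ridge_alpha * np.eye(X.shape[1])
        coefs.append(np.linalg.solve(A, Xk.T @ Yk))
    return centers, np.stack(coefs)

def predict_tsk(states, centers, coefs, width=1.0):
    """Blend the local linear rules with normalised Gaussian firing strengths."""
    states = np.asarray(states, dtype=float)
    dist = ((states[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    # Subtracting the row minimum keeps exp() stable without changing the
    # normalised weights.
    w = np.exp(-(dist - dist.min(1, keepdims=True)) / (2 * width ** 2))
    w /= w.sum(1, keepdims=True)
    X = np.hstack([states, np.ones((len(states), 1))])
    local = np.einsum("nd,kda->nka", X, coefs)  # each rule's proposed action
    return (w[:, :, None] * local).sum(1)
```

In this sketch, `states` would be observations collected from the trained agent and `actions` the agent's outputs on those observations, so the fitted rules imitate the neural policy rather than learn the task from scratch.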
Findings
Achieved 81.48% fidelity on Lunar Lander (Continuous)
Outperformed decision trees by 21 percentage points
Demonstrated statistically significant interpretability improvements
Abstract
Deep Reinforcement Learning (DRL) agents achieve remarkable performance in continuous control but remain opaque, hindering deployment in safety-critical domains. Existing explainability methods either provide only local insights (SHAP, LIME) or employ over-simplified surrogates failing to capture continuous dynamics (decision trees). This work proposes a Hierarchical Takagi-Sugeno-Kang (TSK) Fuzzy Classifier System (FCS) distilling neural policies into human-readable IF-THEN rules through K-Means clustering for state partitioning and Ridge Regression for local action inference. Three quantifiable metrics are introduced: Fuzzy Rule Activation Density (FRAD) measuring explanation focus, Fuzzy Set Coverage (FSC) validating vocabulary completeness, and Action Space Granularity (ASG) assessing control mode diversity. Dynamic Time Warping (DTW) validates temporal behavioral fidelity.…
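As a rough illustration of how Dynamic Time Warping can compare two behaviour traces, the sketch below computes the classic DTW distance between two one-dimensional sequences, for example one action channel from the neural policy and the same channel from the fuzzy surrogate. The function name `dtw_distance` and the absolute-difference local cost are assumptions; the abstract does not say which DTW variant or cost function the authors use.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW on 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A lower distance means the two trajectories follow similar shapes even when one is locally stretched or delayed relative to the other, which a pointwise error would penalise.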
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications · Reinforcement Learning in Robotics
