Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs
Hector Kohler, Riad Akrour, Philippe Preux

TL;DR
This paper investigates the limits of actor-critic algorithms for learning decision tree policies in iterative bounding MDPs (IBMDPs). It shows that deep RL can fail even on simple toy tasks, and that for supervised classification tasks the search for an optimal tree can instead be posed as a fully observable MDP and solved efficiently.
Contribution
It demonstrates failure modes of deep RL in the partially observable setting that IBMDPs induce, and introduces a new approach that formulates learning an optimal decision tree as solving a fully observable MDP.
Findings
Deep RL can fail on simple toy tasks for DT learning.
Optimal decision trees can be learned efficiently by casting the problem as a fully observable MDP.
New algorithms outperform classical greedy methods in DT learning.
Abstract
Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been proposed to explore the space of DTs using deep RL. This framework augments a decision problem (e.g. a supervised classification task) with additional actions that gather information about the features of an otherwise hidden input. By appropriately penalizing these actions, the agent learns to optimally trade off size and performance of DTs. In practice, a reactive policy for a partially observable Markov decision process (MDP) needs to be learned, which is still an open problem. We show in this…
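The augmentation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name ToyIBMDP, the penalty parameter zeta, the single threshold 0.5, the 0/1 prediction reward, and the assumption that features lie in [0, 1] are all hypothetical choices made for illustration. The agent never sees the input itself, only the bounds that its information-gathering actions have established, so a reactive policy over these bounds reads as a decision tree.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyIBMDP:
    """Classification task augmented with information-gathering actions.

    Hypothetical sketch: features are assumed to lie in [0, 1].
    """
    data: list            # list of (features, label) pairs
    zeta: float = -0.1    # assumed penalty for each information-gathering action
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def reset(self):
        # Draw a hidden input; the agent only observes per-feature bounds.
        self.x, self.y = self.rng.choice(self.data)
        self.bounds = [(0.0, 1.0)] * len(self.x)
        return tuple(self.bounds)

    def step(self, action):
        if action[0] == "test":
            # Information-gathering action: compare feature i to threshold t
            # and tighten the observed bounds accordingly.
            _, i, t = action
            lo, hi = self.bounds[i]
            if self.x[i] <= t:
                hi = min(hi, t)
            else:
                lo = max(lo, t)
            self.bounds[i] = (lo, hi)
            return tuple(self.bounds), self.zeta, False
        # Base action: predict a class label and end the episode.
        _, label = action
        reward = 1.0 if label == self.y else 0.0
        return tuple(self.bounds), reward, True


def stump_policy(obs):
    """Reactive policy over bounds, equivalent to a depth-1 decision tree."""
    lo, hi = obs[0]
    if (lo, hi) == (0.0, 1.0):
        return ("test", 0, 0.5)               # internal node: is x[0] <= 0.5?
    return ("predict", 0 if hi <= 0.5 else 1)  # leaves


def run_episode(env, policy):
    """Roll out one episode and return the undiscounted sum of rewards."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

In this sketch, each feature test costs zeta and a correct prediction earns 1, so larger trees pay more in penalties before predicting. That is the size versus performance trade-off the abstract refers to.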
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics
Methods: fail
