Optimal Decision Tree Policies for Markov Decision Processes
Daniël Vos, Sicco Verwer

TL;DR
This paper introduces OMDTs, a method that uses Mixed-Integer Linear Programming to directly optimize size-limited decision trees for Markov Decision Processes, yielding interpretable policies that are often close to optimal.
Contribution
It proposes OMDTs, the first approach to directly maximize expected return of decision trees in MDPs under size constraints, addressing limitations of imitation learning.
Findings
OMDTs often outperform imitation learning in policy optimality.
OMDTs limited to depth 3 often perform close to optimal.
Imitation learning struggles with complex policies in size-limited trees.
Abstract
Interpretability of reinforcement learning policies is essential for many real-world tasks, but learning such interpretable policies is a hard problem. In particular, rule-based policies such as decision trees and rule lists are difficult to optimize due to their non-differentiability. While existing techniques can learn verifiable decision tree policies, there is no guarantee that the learners generate a decision tree that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MDPs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and MDP formulation, OMDT directly maximizes the expected discounted return for the decision tree using Mixed-Integer Linear Programming. By training optimal decision tree policies for different MDPs we empirically study the optimality gap for existing imitation learning…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
