SPOT: Scalable Policy Optimization with Trees for Markov Decision Processes
Xuyuan Xiong, Pedro Chumpitaz-Flores, Kaixun Hua, Cheng Hua

TL;DR
SPOT introduces a scalable, efficient method for computing interpretable decision tree policies in Markov Decision Processes by formulating the problem as a MILP and employing a reduced-space branch-and-bound approach, significantly improving speed and scalability.
Contribution
The paper presents SPOT, a MILP-based method with a reduced-space branch-and-bound algorithm for computing interpretable decision tree policies in MDPs, which runs faster and scales to larger MDPs than previous approaches.
Findings
Achieves substantial speedup over existing methods.
Scales to larger MDPs with more states.
Produces interpretable, compact decision tree policies.
Abstract
Interpretable reinforcement learning policies are essential for high-stakes decision-making, yet optimizing decision tree policies in Markov Decision Processes (MDPs) remains challenging. We propose SPOT, a novel method for computing decision tree policies, which formulates the optimization problem as a mixed-integer linear program (MILP). To enhance efficiency, we employ a reduced-space branch-and-bound approach that decouples the MDP dynamics from tree-structure constraints, enabling efficient parallel search. This significantly improves runtime and scalability compared to previous methods. Our approach ensures that each iteration yields the optimal decision tree. Experimental results on standard benchmarks demonstrate that SPOT achieves substantial speedup and scales to larger MDPs with a significantly higher number of states. The resulting decision tree policies are interpretable…
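To make "decision tree policy" concrete, here is a minimal Python sketch on a toy problem. It is not SPOT's MILP formulation or its branch-and-bound search: it simply enumerates every depth-1 tree (one threshold, two leaf actions) on a hypothetical six-state chain MDP and scores each by iterative policy evaluation. The chain, the rewards, the discount factor, and all function names are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical toy MDP (not from the paper): a 6-state chain with
# a reward of 1 for pushing against either end of the chain.
N_STATES = 6
GAMMA = 0.9
LEFT, RIGHT = 0, 1


def step(s, a):
    """Deterministic chain dynamics: return (next_state, reward)."""
    if a == LEFT:
        return max(s - 1, 0), 1.0 if s == 0 else 0.0
    return min(s + 1, N_STATES - 1), 1.0 if s == N_STATES - 1 else 0.0


def tree_policy(threshold, a_left, a_right):
    """Depth-1 tree: take a_left if state <= threshold, else a_right."""
    return [a_left if s <= threshold else a_right for s in range(N_STATES)]


def evaluate(policy, sweeps=500):
    """Iterative policy evaluation; returns the mean value over a uniform start state."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        new_v = []
        for s in range(N_STATES):
            nxt, r = step(s, policy[s])
            new_v.append(r + GAMMA * v[nxt])
        v = new_v
    return sum(v) / N_STATES


def best_depth1_tree():
    """Brute-force search over (threshold, left action, right action)."""
    candidates = product(range(N_STATES - 1), (LEFT, RIGHT), (LEFT, RIGHT))
    return max(candidates, key=lambda c: evaluate(tree_policy(*c)))
```

On this toy chain, `best_depth1_tree()` returns `(2, 0, 1)`: go left in states 0 to 2 and right otherwise. Exhaustive enumeration like this grows quickly with tree depth and the number of state features, which is the kind of cost an exact method such as the one described in the abstract is meant to avoid.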
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
