Hierarchical Support Vector State Partitioning for Distilling Black Box Reinforcement Learning Policies

Senne Deproost; Mehrdad Asadi; Ann Nowé

arXiv:2605.04254·cs.LG·May 18, 2026

TL;DR

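This paper presents SVSP, a method for distilling black-box reinforcement learning policies into interpretable subpolicies by partitioning the state space with linear SVM splits. It reports a higher mean return than both Voronoi State Partitioning (VSP) and the original TD3 policy while using 82.1% fewer subpolicies than VSP.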

Contribution

SVSP introduces a structured, hierarchical partitioning of the state space based on linear SVM splits for policy distillation, yielding a more compact set of interpretable subpolicies than earlier critic-driven partitioning methods such as VSP.

Findings

1. Improves mean return by 7.4% over Voronoi State Partitioning (VSP).

2. Improves mean return by 2.8% over the original TD3 policy.

3. Reduces the number of subpolicies by 82.1% compared to VSP.

Abstract

We introduce State Vector Space Partitioning (SVSP), a novel method to mimic a black-box reinforcement learning policy using a set of human-interpretable subpolicies. By partitioning a distillation dataset of state-action pairs with linear support vector machine splits, SVSP constructs a compact and structured representation of the original policy. Our method improves mean return by 7.4% over previous critic-driven state partitioning attempts such as Voronoi State Partitioning (VSP) and by 2.8% over the original TD3 policy, while reducing the number of required subpolicies by 82.1% relative to VSP. Our results pave the way toward a more flexible form of distillation in which both the decision boundary and the surrogate models can be chosen within a margin of the original black-box behavior.
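The general idea of recursively splitting a set of state-action pairs with linear SVM boundaries can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation: the abstract does not say how split labels, stopping criteria, or subpolicy models are chosen, so everything here is an assumption made for illustration. In particular, the split labels (action above or below the node's mean action), the constant-action leaves, the hinge-loss subgradient trainer, and the names `train_linear_svm`, `build`, `predict`, and `black_box` are all hypothetical.

```python
from statistics import mean

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(X, y, epochs=100, lr=0.1, lam=0.01):
    """Linear SVM fitted by hinge-loss subgradient descent; returns (w, b)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            if label * (dot(w, x) + b) < 1:  # inside the margin: hinge step
                w = [wi + lr * (label * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * label
            else:                            # outside: regularization only
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def build(data, depth=0, max_depth=3, tol=0.1):
    """Recursively partition (state, action) pairs with linear SVM splits."""
    actions = [a for _, a in data]
    centre = mean(actions)
    # Assumed stopping rule: one constant action already fits within tol.
    if depth == max_depth or max(abs(a - centre) for a in actions) <= tol:
        return {"action": centre}
    # Assumed split labels: is the action above the node's mean action?
    labels = [1 if a > centre else -1 for a in actions]
    w, b = train_linear_svm([s for s, _ in data], labels)
    left = [(s, a) for s, a in data if dot(w, s) + b < 0]
    right = [(s, a) for s, a in data if dot(w, s) + b >= 0]
    if not left or not right:                # degenerate split: stop here
        return {"action": centre}
    return {"w": w, "b": b,
            "left": build(left, depth + 1, max_depth, tol),
            "right": build(right, depth + 1, max_depth, tol)}

def predict(node, state):
    """Follow the hyperplane tests down to a leaf and return its action."""
    while "action" not in node:
        side = "right" if dot(node["w"], state) + node["b"] >= 0 else "left"
        node = node[side]
    return node["action"]

def black_box(state):
    """Hypothetical stand-in for the policy being distilled."""
    return -1.0 if state[0] < 0 else 1.0

# Distillation dataset: states paired with the black-box policy's actions.
states = [[i / 5] for i in range(-5, 6) if i != 0]
dataset = [(s, black_box(s)) for s in states]
tree = build(dataset)
print(predict(tree, [-0.5]), predict(tree, [0.5]))
```

Each internal node stores a single hyperplane `(w, b)`, so the path to any leaf reads as a short sequence of linear inequalities over the state. The constant leaves could be replaced by any other interpretable surrogate, such as a linear controller fitted to the pairs in that region.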
