PRISM: Policy Reuse via Interpretable Strategy Mapping in Reinforcement Learning
Thomas Pravetz

TL;DR
PRISM grounds reinforcement learning agents' decisions in causally validated concepts and uses those concepts to transfer strategies, enabling zero-shot policy reuse across agents trained with different algorithms.
Contribution
It develops a framework that clusters agent features into concepts, validates their causal role, and aligns them for effective zero-shot transfer of strategic knowledge.
Findings
Causal intervention confirms concepts directly influence agent actions.
Concept alignment enables successful zero-shot transfer in Go.
The approach is effective in domains with naturally discrete strategic states.
Abstract
We present PRISM (Policy Reuse via Interpretable Strategy Mapping), a framework that grounds reinforcement learning agents' decisions in discrete, causally validated concepts and uses those concepts as a zero-shot transfer interface between agents trained with different algorithms. PRISM clusters each agent's encoder features into concepts via K-means. Causal intervention establishes that these concepts directly drive agent behavior rather than merely correlating with it: overriding concept assignments changes the selected action in 69.4% of 2,500 interventions. Concept importance and usage frequency are dissociated: the most-used concept (C47, 33.0% frequency) causes only a 9.4% win-rate drop when ablated, while ablating C16 (15.4% frequency) collapses win rate from 100% to 51.8%. Because concepts causally encode strategy, aligning them via optimal…
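To make the clustering and intervention steps in the abstract concrete, the following is a minimal sketch rather than the paper's implementation. It runs a plain K-means over toy 2-D "encoder features", then overrides each sample's concept assignment and counts how often the chosen action changes. The function names, the toy data, and especially `policy_from_concept` (a stand-in lookup table in place of a real concept-conditioned policy head) are assumptions made for illustration; the abstract does not specify PRISM's architecture or intervention procedure at this level of detail.

```python
import random


def assign_concept(feature, centroids):
    """Return the index of the nearest centroid (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, centroids[k])),
    )


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[assign_concept(f, centroids)].append(f)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def policy_from_concept(concept, n_actions):
    """Hypothetical policy head: a fixed concept-to-action lookup (toy only)."""
    return concept % n_actions


def intervention_change_rate(features, centroids, n_actions, seed=0):
    """Fraction of samples whose action changes when the concept is overridden."""
    rng = random.Random(seed)
    k = len(centroids)
    changed = 0
    for f in features:
        original = assign_concept(f, centroids)
        # Intervention: force a different concept than the one the features imply.
        override = rng.choice([c for c in range(k) if c != original])
        if policy_from_concept(override, n_actions) != policy_from_concept(original, n_actions):
            changed += 1
    return changed / len(features)


# Toy data: four well-separated blobs standing in for encoder features.
data_rng = random.Random(1)
centers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
features = [
    [data_rng.gauss(cx, 0.1), data_rng.gauss(cy, 0.1)]
    for cx, cy in centers
    for _ in range(30)
]

centroids = kmeans(features, k=4)
rate = intervention_change_rate(features, centroids, n_actions=2)
print(f"action changed in {rate:.1%} of interventions")
```

In this toy setup some concepts map to the same action, so not every override changes the action, which is why the change rate is a meaningful quantity to measure. A per-concept ablation study like the one reported for C47 and C16 would follow the same pattern, replacing one concept at a time and measuring win rate instead of action changes.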
