Reward-Free Policy Space Compression for Reinforcement Learning
Mirco Mutti, Stefano Del Col, Marcello Restelli

TL;DR
This paper introduces a method to compress the vast policy space in reinforcement learning into a finite set of representative policies, reducing complexity while maintaining performance, through a game-theoretic approach.
Contribution
It formulates policy space compression as a set cover problem and proposes an efficient game-theoretic solution for reward-free policy compression.
Findings
Effective policy space compression demonstrated in simple domains
Reduces sample and computation inefficiencies in reinforcement learning
Provides a theoretical foundation for policy set approximation
Abstract
In reinforcement learning, we encode the potential behaviors of an agent interacting with an environment into an infinite set of policies, the policy space, typically represented by a family of parametric functions. Dealing with such a policy space is a hefty challenge, which often causes sample and computation inefficiencies. However, we argue that a limited number of policies are actually relevant when we also account for the structure of the environment and of the policy parameterization, as many of them would induce very similar interactions, i.e., state-action distributions. In this paper, we seek a reward-free compression of the policy space into a finite set of representative policies, such that, given any policy π, the minimum Rényi divergence between the state-action distributions of the representative policies and the state-action distribution of π is bounded. We…
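The coverage criterion described in the abstract can be illustrated with a small sketch. It assumes discrete state-action distributions stored as flat probability vectors, and it assumes the divergence is taken from the target policy's distribution to each representative's distribution; the abstract does not fix that direction. The function names, the order `alpha`, and the threshold `sigma` are hypothetical and are not taken from the paper.

```python
import numpy as np

def renyi_divergence(p, q, alpha=2.0):
    """Renyi divergence D_alpha(p || q) between two discrete distributions."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing to the sum
    total = np.sum(p[mask] ** alpha * q[mask] ** (1.0 - alpha))
    return float(np.log(total) / (alpha - 1.0))

def min_divergence_to_set(target, representatives, alpha=2.0):
    """Smallest divergence from the target distribution to any representative."""
    return min(renyi_divergence(target, rep, alpha) for rep in representatives)

def is_covered(target, representatives, sigma, alpha=2.0):
    """True if at least one representative is within sigma of the target."""
    return min_divergence_to_set(target, representatives, alpha) <= sigma

# Toy distributions over two state-action pairs (illustrative values only).
target = [0.5, 0.5]
far = [0.25, 0.75]
print(is_covered(target, [far], sigma=0.1))
print(is_covered(target, [far, target], sigma=0.1))
```

In this toy setting, a compression would be a finite set of representatives for which `is_covered` holds for every policy in the policy space. The sketch only checks the condition for a given set; it does not implement the game-theoretic procedure the paper proposes for finding one.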
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Formal Methods in Verification
