Policy Gradient Methods in the Presence of Symmetries and State Abstractions
Prakash Panangaden, Sahand Rezaei-Shoshtari, Rosie Zhao, David Meger, Doina Precup

TL;DR
This paper extends the concept of MDP homomorphisms to continuous spaces, enabling reinforcement learning algorithms to leverage environment symmetries for improved policy optimization and abstraction in complex control tasks.
Contribution
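The paper introduces a theoretical framework for continuous MDP homomorphisms, derives policy gradient theorems on the abstract MDP for both stochastic and deterministic policies, and proposes a family of actor-critic algorithms that learn the policy and the homomorphism map simultaneously.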
Findings
Effective policy learning in environments with continuous symmetries.
Improved performance on visual control tasks from the DeepMind Control Suite.
Visualization of learned abstractions showing structured latent spaces.
Abstract
Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization. In this paper, we study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces. We derive a policy gradient theorem on the abstract MDP for both stochastic and deterministic policies. Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization. Based on these theorems, we propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. Finally, we introduce a series of environments with continuous symmetries to further demonstrate the ability of our algorithm for action abstraction in the…
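To make the notion of an MDP homomorphism concrete, here is a minimal sketch of the classic finite-state definition, not the continuous construction from the paper. A pair of maps (f, g_s) is a homomorphism if rewards match, R(s, a) = R_abs(f(s), g_s(a)), and the transition distribution pushed forward through f matches the abstract transition distribution. The toy chain environment, the map names `f` and `g`, and the helper `is_homomorphism` are all illustrative assumptions, not taken from the paper or its code.

```python
import numpy as np

# Hypothetical toy MDP: a 5-state chain with a reflection symmetry about the center.
N_STATES, N_ACTIONS = 5, 2   # states 0..4; actions: 0 = left, 1 = right
CENTER = 2

def build_chain():
    """Deterministic chain; reward is minus the distance to the center."""
    P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    R = np.zeros((N_STATES, N_ACTIONS))
    for s in range(N_STATES):
        for a in range(N_ACTIONS):
            step = 1 if a == 1 else -1
            s_next = min(max(s + step, 0), N_STATES - 1)  # clamp at the walls
            P[s, a, s_next] = 1.0
            R[s, a] = -abs(s - CENTER)
    return P, R

def build_abstract():
    """Abstract MDP: states are distances 0..2; actions: 0 = toward, 1 = away."""
    P = np.zeros((3, 2, 3))
    R = np.zeros((3, 2))
    for d in range(3):
        for a in range(2):
            if d == 0:
                d_next = 1              # any move from the center leaves it
            elif a == 0:
                d_next = d - 1          # toward the center
            else:
                d_next = min(d + 1, 2)  # away, clamped at the wall
            P[d, a, d_next] = 1.0
            R[d, a] = -d
    return P, R

def f(s):
    """State map: collapse mirror-image states onto their distance to the center."""
    return abs(s - CENTER)

def g(s, a):
    """State-dependent action map: left/right becomes toward/away."""
    if s == CENTER:
        return a  # both abstract actions behave identically at distance 0
    moving_right = (a == 1)
    toward = moving_right if s < CENTER else not moving_right
    return 0 if toward else 1

def is_homomorphism(P, R, P_abs, R_abs, f, g, tol=1e-9):
    """Check reward invariance and transition equivariance for every (s, a)."""
    n_s, n_a, _ = P.shape
    n_abs = P_abs.shape[0]
    for s in range(n_s):
        for a in range(n_a):
            if abs(R[s, a] - R_abs[f(s), g(s, a)]) > tol:
                return False
            # Push the next-state distribution forward through f.
            pushed = np.zeros(n_abs)
            for s_next in range(n_s):
                pushed[f(s_next)] += P[s, a, s_next]
            if not np.allclose(pushed, P_abs[f(s), g(s, a)], atol=tol):
                return False
    return True

P, R = build_chain()
P_abs, R_abs = build_abstract()
print(is_homomorphism(P, R, P_abs, R_abs, f, g))
```

In this sketch the maps are hand-written and the check is exact. The abstract describes a different setting, where the map is learned alongside the policy using the lax bisimulation metric and the symmetries may hold only approximately, so an exact check like this one would be replaced by a learned, soft objective.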
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
