Policy Gradient Methods in the Presence of Symmetries and State Abstractions

Prakash Panangaden; Sahand Rezaei-Shoshtari; Rosie Zhao; David Meger; Doina Precup

arXiv:2305.05666 · cs.LG · March 8, 2024 · 1 citation

TL;DR

This paper extends the concept of MDP homomorphisms to continuous spaces, enabling reinforcement learning algorithms to leverage environment symmetries for improved policy optimization and abstraction in complex control tasks.

Contribution

It introduces a new theoretical framework and algorithms for policy gradient methods that incorporate continuous MDP homomorphisms and symmetries, enhancing learning efficiency.
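For reference, the conditions below are the standard ones for an MDP homomorphism, written measure-theoretically so that they make sense for continuous state and action spaces. This is a sketch of the idea rather than a quotation of the paper; its own definition may differ in measurability details. Here $h = (f, g_s)$ maps a concrete MDP $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$ onto an abstract MDP $(\bar{\mathcal{S}}, \bar{\mathcal{A}}, \bar P, \bar R, \gamma)$.

```latex
\begin{aligned}
&f : \mathcal{S} \to \bar{\mathcal{S}}, \qquad g_s : \mathcal{A} \to \bar{\mathcal{A}} \quad \text{surjective for each } s \in \mathcal{S},\\
&\bar R\big(f(s),\, g_s(a)\big) = R(s, a) && \forall\, s \in \mathcal{S},\ a \in \mathcal{A},\\
&\bar P\big(B \mid f(s),\, g_s(a)\big) = P\big(f^{-1}(B) \mid s, a\big) && \forall\, \text{measurable } B \subseteq \bar{\mathcal{S}}.
\end{aligned}
```

A deterministic abstract policy $\bar\mu$ is lifted by choosing any $\mu^{\uparrow}(s) \in g_s^{-1}\big(\bar\mu(f(s))\big)$. For an exact homomorphism the lifted policy attains the same value as the abstract one, $V^{\mu^{\uparrow}}(s) = \bar V^{\bar\mu}(f(s))$, which is what makes optimizing in the smaller abstract MDP worthwhile.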

Findings

1. Effective policy learning in environments with continuous symmetries.
2. Improved performance on visual control tasks from the DeepMind Control Suite.
3. Visualization of learned abstractions showing structured latent spaces.

Abstract

Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization. In this paper, we study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces. We derive a policy gradient theorem on the abstract MDP for both stochastic and deterministic policies. Our policy gradient results allow for leveraging approximate symmetries of the environment for policy optimization. Based on these theorems, we propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. Finally, we introduce a series of environments with continuous symmetries to further demonstrate the ability of our algorithm for action abstraction in the…
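The core idea in the abstract, solving a smaller abstract MDP and lifting the result back to the original one, can be seen on a toy problem where the symmetry is known in closed form. The sketch below is an illustration under stated assumptions, not the paper's algorithm. The dynamics, the reflection maps `f`, `g`, `g_inv`, and the policy `abstract_policy` are all hypothetical and hand-specified, whereas the paper learns the homomorphism map jointly with the policy using the lax bisimulation metric. The code checks the reward and (deterministic) transition conditions at every step and compares discounted returns.

```python
import math

# Hypothetical toy problem (not from the paper): a 1-D point on the real line
# with dynamics s' = s + a and reward -s^2, both symmetric under s -> -s.


def sign(s: float) -> float:
    # Treat s = 0 as the positive side so g_s is defined and invertible everywhere.
    return 1.0 if s >= 0 else -1.0


def f(s: float) -> float:
    """State map: fold the line onto [0, inf), identifying s with -s."""
    return abs(s)


def g(s: float, a: float) -> float:
    """State-dependent action map g_s(a) = sign(s) * a."""
    return sign(s) * a


def g_inv(s: float, a_bar: float) -> float:
    """Inverse of g_s: multiplying by sign(s) = +/-1 twice is the identity."""
    return sign(s) * a_bar


def step(s: float, a: float) -> tuple[float, float]:
    """Concrete MDP transition: returns (next state, reward)."""
    return s + a, -s * s


def abstract_step(z: float, a_bar: float) -> tuple[float, float]:
    """Abstract MDP induced by (f, g): z' = |z + a_bar|, reward -z^2."""
    return abs(z + a_bar), -z * z


def abstract_policy(z: float) -> float:
    """Hypothetical hand-written abstract policy: move halfway toward 0."""
    return -0.5 * z


def lifted_policy(s: float) -> float:
    """Lift the abstract policy to the concrete MDP via g_s^{-1}."""
    return g_inv(s, abstract_policy(f(s)))


def rollout(s0: float, horizon: int = 10, gamma: float = 0.9) -> tuple[float, float]:
    """Run the lifted and abstract policies side by side.

    Asserts the homomorphism conditions at every step and returns the
    discounted (concrete, abstract) returns.
    """
    s, z = s0, f(s0)
    ret, abs_ret = 0.0, 0.0
    for t in range(horizon):
        a = lifted_policy(s)
        a_bar = abstract_policy(z)
        assert math.isclose(g(s, a), a_bar)  # lifted action maps to the abstract action
        s, r = step(s, a)
        z, r_bar = abstract_step(z, a_bar)
        assert math.isclose(r, r_bar)  # reward condition
        assert math.isclose(f(s), z)  # transition condition (deterministic case)
        ret += gamma**t * r
        abs_ret += gamma**t * r_bar
    return ret, abs_ret
```

Starting from `rollout(2.0)` or `rollout(-2.0)` yields the same return, because both start states map to the same abstract state. In a learned setting, neural networks for `f` and `g` would replace the hand-written reflection, and the equality checks would only hold approximately.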

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning