Plan-Space State Embeddings for Improved Reinforcement Learning
Max Pflueger, Gaurav S. Sukhatme

TL;DR
This paper introduces a novel variational method for learning plan-space state embeddings that enhance reinforcement learning by capturing geometric relationships from demonstrations, leading to improved policy performance.
Contribution
It proposes a new variational framework for embedding states based on plans, optimizing trajectory linearity, and demonstrates improved RL performance using these embeddings.
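To make "trajectory linearity" concrete, one rough way to score how straight an embedded trajectory is would be its mean squared second difference. This is only an illustrative proxy and not the paper's variational objective; the function name `linearity_penalty` and the NumPy array layout (one embedded state per row) are assumptions made for this sketch.

```python
import numpy as np

def linearity_penalty(z):
    """Mean squared second difference of an embedded trajectory.

    z: array of shape (T, d), one embedded state per time step (assumed layout).
    The value is 0 when the points are evenly spaced along a straight line
    and grows as the trajectory bends or changes speed.
    """
    z = np.asarray(z, dtype=float)
    if z.shape[0] < 3:
        # Fewer than three points cannot exhibit curvature.
        return 0.0
    second_diff = z[2:] - 2.0 * z[1:-1] + z[:-2]
    return float(np.mean(np.sum(second_diff ** 2, axis=1)))

# Evenly spaced points on a line versus points on a parabola.
straight = np.outer(np.arange(5), [1.0, 2.0])
curved = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 4.0]])
straight_score = linearity_penalty(straight)
curved_score = linearity_penalty(curved)
```

Note that this proxy also penalizes uneven spacing along a straight line, so it is stricter than pure geometric straightness.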
Findings
Embedding spaces improve policy gradient RL performance.
Using learned embeddings reduces training variance.
The method leverages demonstration data without restrictions on data collection.
Abstract
Robot control problems are often structured with a policy function that maps state values into control values, but in many dynamic problems the observed state can have a difficult-to-characterize relationship with useful policy actions. In this paper we present a new method for learning state embeddings from plans or other forms of demonstrations such that the embedding space has a specified geometric relationship with the demonstrations. We present a novel variational framework for learning these embeddings that attempts to optimize trajectory linearity in the learned embedding space. We show how these embedding spaces can then be used as an augmentation to the robot state in reinforcement learning problems. We use kinodynamic planning to generate training trajectories for some example environments, and then train embedding spaces for these environments. We show empirically that…
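The state augmentation mentioned in the abstract can be pictured as concatenating the raw state with its embedding before passing it to the policy. The sketch below is a minimal illustration under stated assumptions: `make_linear_encoder` is a hypothetical stand-in (a fixed random linear map) for a trained embedding network, and `augment_state` is a hypothetical helper, neither taken from the paper.

```python
import numpy as np

def make_linear_encoder(state_dim, embed_dim, seed=0):
    """Hypothetical placeholder for a learned embedding: a fixed random linear map."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(embed_dim, state_dim))

    def encode(state):
        return weights @ np.asarray(state, dtype=float)

    return encode

def augment_state(state, encode):
    """Return the raw state followed by its embedding, as a single policy input."""
    state = np.asarray(state, dtype=float)
    return np.concatenate([state, encode(state)])

# A 4-dimensional state with a 2-dimensional embedding gives a 6-dimensional input.
encode = make_linear_encoder(state_dim=4, embed_dim=2)
augmented = augment_state(np.ones(4), encode)
```

Keeping the raw state in the input means the policy never loses information if the embedding turns out to be uninformative for a given task.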
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
