Learning Policy Representations for Steerable Behavior Synthesis
Beiming Li, Sergio Rozada, Alejandro Ribeiro

TL;DR
This paper introduces a method to learn policy representations as expectations over state-action features, enabling behavior steering and policy synthesis without retraining, by encoding policies into a smooth latent space.
Contribution
The authors propose a novel set-based architecture for policy representation that allows for direct gradient-based behavior steering in a learned latent space.
Findings
Effective policy encoding as set-based latent embeddings.
Smooth latent space enables gradient-based policy optimization.
Successful behavior synthesis under unseen value constraints.
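To make the idea of gradient-based steering toward a value constraint concrete, here is a minimal toy sketch. It is not the authors' model: the linear value decoder `value`, the `steer` routine, the squared-hinge penalty, and the step size are all assumptions chosen for illustration, standing in for the learned decoder described in the paper.

```python
import numpy as np

def value(z, w, c):
    """Hypothetical linear value decoder V(z) = w . z + c (an assumption)."""
    return float(w @ z + c)

def steer(z0, w, c, target, steps=100):
    """Move a latent z until V(z) reaches `target`, by gradient descent on
    the squared hinge penalty 0.5 * max(0, target - V(z))**2."""
    z = z0.copy()
    lr = 0.5 / float(w @ w)  # step size tuned to this toy linear decoder
    for _ in range(steps):
        gap = target - value(z, w, c)
        if gap <= 0:          # constraint V(z) >= target already holds
            break
        z = z + lr * gap * w  # the negative penalty gradient is gap * w
    return z

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # toy decoder weights
c = 0.0
z0 = np.zeros(8)         # starting latent, where V(z0) = 0
target = 1.0             # illustrative value constraint
z_star = steer(z0, w, c, target)
print(value(z0, w, c), value(z_star, w, c))
```

With a linear decoder each step halves the remaining gap to the target, so the loop converges quickly. A learned nonlinear decoder would instead need automatic differentiation and a tuned step size.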
Abstract
Given a Markov decision process (MDP), we seek to learn representations for a range of policies to facilitate behavior steering at test time. As policies of an MDP are uniquely determined by their occupancy measures, we propose modeling policy representations as expectations of state-action feature maps with respect to occupancy measures. We show that these representations can be approximated uniformly for a range of policies using a set-based architecture. Our model encodes a set of state-action samples into a latent embedding, from which we decode both the policy and its value functions corresponding to multiple rewards. We use a variational generative approach to induce a smooth latent space, and further shape it with contrastive learning so that latent distances align with differences in value functions. This geometry permits gradient-based optimization directly in the latent space.…
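As a rough illustration of the representation described in the abstract, the sketch below estimates the expectation of a state-action feature map by averaging over a set of samples, which makes the embedding invariant to the order of the samples. The feature map `feature_map` (a single tanh layer), the function `encode_policy`, the dimensions, and the random data are hypothetical placeholders, not the architecture used in the paper.

```python
import numpy as np

def feature_map(states, actions, W, b):
    """Hypothetical feature map phi(s, a): one tanh layer on the
    concatenated state-action vector (an assumption for illustration)."""
    x = np.concatenate([states, actions], axis=1)
    return np.tanh(x @ W + b)

def encode_policy(states, actions, W, b):
    """Mean-pool features over the sample set, a Monte Carlo estimate of
    the expectation of phi(s, a) under the policy's occupancy measure."""
    return feature_map(states, actions, W, b).mean(axis=0)

rng = np.random.default_rng(0)
state_dim, action_dim, latent_dim, n = 4, 2, 8, 256
W = rng.normal(size=(state_dim + action_dim, latent_dim))
b = rng.normal(size=latent_dim)

# Stand-in samples; in practice these would come from policy rollouts.
states = rng.normal(size=(n, state_dim))
actions = rng.normal(size=(n, action_dim))

z = encode_policy(states, actions, W, b)

# Shuffling the set leaves the embedding unchanged (permutation invariance).
perm = rng.permutation(n)
z_perm = encode_policy(states[perm], actions[perm], W, b)
print(z.shape, np.allclose(z, z_perm))
```

Mean pooling is one simple way to obtain a set-based encoder. The variational and contrastive components mentioned in the abstract are not modeled here.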
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning
