Learning Policy Representations for Steerable Behavior Synthesis

Beiming Li; Sergio Rozada; Alejandro Ribeiro

arXiv:2601.22350 · cs.LG · February 2, 2026

Open Access

TL;DR

This paper introduces a method to learn policy representations as expectations over state-action features, enabling behavior steering and policy synthesis without retraining, by encoding policies into a smooth latent space.

Contribution

The authors propose a novel set-based architecture for policy representation that allows for direct gradient-based behavior steering in a learned latent space.

Findings

01. Effective policy encoding as set-based latent embeddings.

02. Smooth latent space enables gradient-based policy optimization.

03. Successful behavior synthesis under unseen value constraints.
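
To make the idea of gradient-based steering under a value constraint concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the linear value heads stand in for the learned value decoders, the quadratic term stands in for a prior that keeps the latent near the origin, and the penalty method is a generic way to handle the constraint, not necessarily the one used in the paper.

```python
import numpy as np

def steer(z0, w_obj, w_con, b_con, threshold, lr=0.05, penalty=10.0, steps=500):
    """Gradient ascent in latent space (hypothetical helper).

    Maximizes  w_obj @ z - 0.5 * ||z||^2
               - 0.5 * penalty * max(0, threshold - (w_con @ z + b_con))^2,
    i.e. one linear "value" is pushed up while a second linear "value"
    is softly constrained to stay above `threshold`.
    """
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(steps):
        # Amount by which the constrained value falls short of the threshold.
        violation = max(0.0, threshold - (w_con @ z + b_con))
        # Gradient of the penalized objective with respect to z.
        grad = w_obj - z + penalty * violation * w_con
        z += lr * grad
    return z

# Toy setup: two latent dimensions, one per value head (assumed, not from the paper).
w_obj = np.array([1.0, 0.0])   # value to maximize
w_con = np.array([0.0, 1.0])   # value to keep above the threshold
z_start = np.zeros(2)
z_steered = steer(z_start, w_obj, w_con, b_con=0.0, threshold=1.0)

value_before = w_con @ z_start
value_after = w_con @ z_steered
```

A penalty formulation only satisfies the constraint approximately: the steered latent moves toward the threshold but can settle slightly below it, and a larger `penalty` narrows that gap. In the setting the abstract describes, the steered latent would then be passed to the policy decoder to obtain the corresponding behavior.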

Abstract

Given a Markov decision process (MDP), we seek to learn representations for a range of policies to facilitate behavior steering at test time. As policies of an MDP are uniquely determined by their occupancy measures, we propose modeling policy representations as expectations of state-action feature maps with respect to occupancy measures. We show that these representations can be approximated uniformly for a range of policies using a set-based architecture. Our model encodes a set of state-action samples into a latent embedding, from which we decode both the policy and its value functions corresponding to multiple rewards. We use a variational generative approach to induce a smooth latent space, and further shape it with contrastive learning so that latent distances align with differences in value functions. This geometry permits gradient-based optimization directly in the latent space.…
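
The representation described in the abstract, an expectation of a state-action feature map under the occupancy measure, can be estimated from samples by averaging features over a set. The sketch below illustrates only that pooling step. The feature map (a random projection followed by tanh), the dimensions, and the Gaussian stand-in samples are assumptions for illustration; in the paper the feature map is learned and the samples come from a policy's occupancy measure.

```python
import numpy as np

def feature_map(states, actions, W):
    """Hypothetical feature map phi(s, a): linear projection followed by tanh."""
    x = np.concatenate([states, actions], axis=1)
    return np.tanh(x @ W)

def embed_policy(states, actions, W):
    """Mean-pool features over the sample set.

    This is a Monte Carlo estimate of E[phi(s, a)] when the (s, a) pairs
    are drawn from the policy's occupancy measure.
    """
    return feature_map(states, actions, W).mean(axis=0)

rng = np.random.default_rng(0)
n, ds, da, dz = 512, 4, 2, 8            # sample count and dimensions (assumed)
W = rng.normal(size=(ds + da, dz))      # stand-in for learned parameters
states = rng.normal(size=(n, ds))       # stand-in for sampled states
actions = rng.normal(size=(n, da))      # stand-in for sampled actions

z = embed_policy(states, actions, W)

# Reordering the samples leaves the embedding unchanged (up to rounding),
# which is the property that makes the encoder set-based.
perm = rng.permutation(n)
z_perm = embed_policy(states[perm], actions[perm], W)
```

Mean pooling makes the embedding independent of sample order and lets the same encoder accept sets of different sizes, which fits an input that is an unordered collection of state-action samples.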

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning