Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching
Andrea Fraschini, Davide Tenedini, Riccardo Zamboni, Mirco Mutti, Marcello Restelli

TL;DR
This paper introduces Occupancy-based Policy Compression (OPC), a method that learns low-dimensional policy manifolds by matching state occupancy distributions, improving over action-based approaches in reinforcement learning.
Contribution
It proposes a novel occupancy-based compression framework with a differentiable objective and diverse policy dataset generation, enhancing behavioral generalization and policy representation.
Findings
OPC outperforms action-matching methods on continuous control benchmarks.
The approach achieves better behavioral diversity and generalization.
Occupancy matching reduces sequential decision errors.
Abstract
Deep Reinforcement Learning (DRL) is widely recognized as sample-inefficient, a limitation attributable in part to the high dimensionality and substantial functional redundancy inherent to the policy parameter space. A recent framework, which we refer to as Action-based Policy Compression (APC), mitigates this issue by compressing the parameter space into a low-dimensional latent manifold using a learned generative mapping. However, its performance is severely constrained by relying on immediate action-matching as a reconstruction loss, a myopic proxy for behavioral similarity that suffers from compounding errors across sequential decisions. To overcome this bottleneck, we introduce Occupancy-based Policy Compression (OPC), which enhances APC by shifting behavior representation from immediate action-matching to long-horizon state-space…
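To make the contrast between action-matching and occupancy-matching concrete, here is a minimal sketch on a toy tabular problem. It is not the paper's objective or implementation: the five-state deterministic MDP, the function names (`state_occupancy`, `action_mismatch`, `occupancy_tv`), the discount factor, and the choice of total variation as the distance are all assumptions made for illustration. The sketch compares a reference policy against one policy that differs in a single action at the start state and another that differs almost everywhere, but only in states where the action has no effect.

```python
import numpy as np

def state_occupancy(P, policy, start, gamma=0.9, horizon=200):
    """Normalized discounted state occupancy of a deterministic policy.

    P[s, a] is the next-state index (deterministic toy dynamics, an assumption).
    policy[s] is the action taken in state s.
    """
    occ = np.zeros(P.shape[0])
    s = start
    for t in range(horizon):
        occ[s] += gamma ** t          # discounted visit weight at step t
        s = P[s, policy[s]]           # follow the policy one step
    return occ / occ.sum()            # normalize to a distribution

def action_mismatch(pi_a, pi_b):
    """Fraction of states where the two policies choose different actions."""
    return float(np.mean(pi_a != pi_b))

def occupancy_tv(occ_a, occ_b):
    """Total variation distance between two state-occupancy distributions."""
    return 0.5 * float(np.abs(occ_a - occ_b).sum())

# Hypothetical MDP: from state 0, action 0 leads into branch {1, 2} and
# action 1 leads into branch {3, 4}. Inside a branch both actions agree.
P = np.array([
    [1, 3],
    [2, 2],
    [2, 2],
    [4, 4],
    [4, 4],
])

pi_ref = np.array([0, 0, 0, 0, 0])
pi_one_flip = np.array([1, 0, 0, 0, 0])   # differs only at the start state
pi_many_flips = np.array([0, 1, 1, 1, 1]) # differs where actions are irrelevant

occ_ref = state_occupancy(P, pi_ref, start=0)
for name, pi in [("one flip", pi_one_flip), ("many flips", pi_many_flips)]:
    occ = state_occupancy(P, pi, start=0)
    print(name,
          "| action mismatch:", action_mismatch(pi_ref, pi),
          "| occupancy TV:", round(occupancy_tv(occ_ref, occ), 3))
```

In this toy setting the single early flip yields a small action mismatch yet sends the agent into an entirely different branch, while the many-flip policy looks far away in action space but visits exactly the same states. This is the kind of gap between action-level and behavior-level similarity that the abstract describes as motivation for occupancy matching.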
