No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
Skander Moalla, Andrea Miele, Daniil Pyatko, Razvan Pascanu, Caglar Gulcehre

TL;DR
This paper investigates how representation collapse affects PPO in reinforcement learning, revealing that regularizing representation dynamics with a new auxiliary loss can prevent performance collapse.
Contribution
The paper introduces Proximal Feature Optimization (PFO), a novel auxiliary loss that mitigates representation collapse and improves PPO stability in RL environments.
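This summary does not state the PFO formula, so the following is only a minimal sketch of one plausible way to regularize representation dynamics: penalize how far the current network's features move from the features recorded when the batch was collected. The function name `feature_drift_penalty`, the coefficient `coef`, and the squared-distance form are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def feature_drift_penalty(current_features, old_features, coef=1.0):
    """Hypothetical auxiliary loss (an assumption, not the paper's stated formula).

    Both inputs have shape (batch, feature_dim). `old_features` are the
    features produced by the network that collected the data; `current_features`
    come from the network being updated. The penalty is the squared feature
    change per sample, averaged over the batch and scaled by `coef`.
    """
    current_features = np.asarray(current_features, dtype=float)
    old_features = np.asarray(old_features, dtype=float)
    if current_features.shape != old_features.shape:
        raise ValueError("feature arrays must have the same shape")
    # Squared L2 distance between new and old features for each sample.
    squared_change = np.sum((current_features - old_features) ** 2, axis=1)
    return coef * float(np.mean(squared_change))
```

In a training loop, a term like this would be added to the usual PPO objective, so that the policy update is discouraged from changing the representation too much within one batch of data.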
Findings
Representation rank deterioration correlates with performance collapse.
Stronger non-stationarity worsens feature collapse and agent performance.
PFO effectively mitigates representation collapse and enhances PPO stability.
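To make "representation rank" concrete, here is a minimal sketch of one common way to measure it: count how many singular values of a batch of features are needed to account for most of the total singular value mass. The function name `effective_rank` and the 0.99 threshold are assumptions for illustration; the summary does not say which rank measure the paper uses.

```python
import numpy as np

def effective_rank(features, threshold=0.99):
    """Smallest k such that the top-k singular values hold at least
    `threshold` of the total singular value mass.

    `features` has shape (batch, feature_dim): one row of penultimate-layer
    activations per state. This is an illustrative measure, not necessarily
    the one used in the paper.
    """
    singular_values = np.linalg.svd(np.asarray(features, dtype=float),
                                    compute_uv=False)
    total = singular_values.sum()
    if total == 0:
        return 0  # all-zero features carry no directions at all
    cumulative = np.cumsum(singular_values) / total
    # First index whose cumulative share reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(singular_values))
```

Tracking a number like this over training gives a simple signal: if it keeps falling while the policy keeps changing, the features are collapsing onto fewer directions.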
Abstract
Reinforcement learning (RL) is inherently rife with non-stationarity since the states and rewards the agent observes during training depend on its changing policy. Therefore, networks in deep RL must be capable of adapting to new observations and fitting new targets. However, previous works have observed that networks trained under non-stationarity exhibit an inability to continue learning, termed loss of plasticity, and eventually a collapse in performance. For off-policy deep value-based RL methods, this phenomenon has been correlated with a decrease in representation rank and the ability to fit random targets, termed capacity loss. Although this correlation has generally been attributed to neural network learning under non-stationarity, the connection to representation dynamics has not been carefully studied in on-policy policy optimization methods. In this work, we empirically study…
Taxonomy
Topics: Outsourcing and Supply Chain Management
Methods: Entropy Regularization · Proximal Policy Optimization
