Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning
Jingyi Liu, Jian Guo, Eberhard Gill

TL;DR
This paper adapts a critic match loss landscape visualization method from online to off-policy reinforcement learning, specifically for the SAC algorithm, to analyze critic optimization geometry in control tasks.
Contribution
It introduces an adaptation of the critic match loss landscape visualization for off-policy RL, enabling geometric analysis of critic learning in replay-based algorithms.
Findings
Different SAC variants exhibit distinct geometric patterns.
Quantitative metrics reveal differences in optimization behavior.
Landscape analysis provides insights into critic convergence and divergence.
Abstract
This work extends an established critic match loss landscape visualization method from online to off-policy reinforcement learning (RL), aiming to reveal the optimization geometry behind critic learning. Off-policy RL differs from stepwise online actor-critic learning in its replay-based data flow and target computation. Based on these two structural differences, the critic match loss landscape visualization method is adapted to the Soft Actor-Critic (SAC) algorithm by aligning the loss evaluation with its batch-based data flow and target computation, using a fixed replay batch and precomputed critic targets from the selected policy. Critic parameters recorded during training are projected onto a principal component plane, where the critic match loss is evaluated to form a 3-D landscape with an overlaid 2-D optimization path. Applied to a spacecraft attitude control problem, the…
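The projection-and-evaluation procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the critic is a hypothetical linear model standing in for SAC's neural critic, the match loss is assumed to be a mean squared error against precomputed targets, and the names `critic_match_loss`, `loss_landscape`, `features`, and `targets` are invented for this example. The fixed replay batch is represented by `features`, and the critic targets are taken as a given array rather than computed from a policy.

```python
import numpy as np


def critic_match_loss(theta, features, targets):
    """Assumed match loss: MSE between a linear critic's output and fixed targets."""
    return float(np.mean((features @ theta - targets) ** 2))


def loss_landscape(snapshots, features, targets, grid_size=25, margin=0.2):
    """Project recorded critic parameters onto their top-2 principal
    components and evaluate the loss on a grid spanning that plane."""
    snapshots = np.asarray(snapshots, dtype=float)
    center = snapshots.mean(axis=0)
    # PCA via SVD of the centered parameter snapshots.
    _, _, vt = np.linalg.svd(snapshots - center, full_matrices=False)
    basis = vt[:2]                          # shape (2, n_params)
    path = (snapshots - center) @ basis.T   # 2-D optimization path
    lo, hi = path.min(axis=0), path.max(axis=0)
    pad = margin * (hi - lo) + 1e-8
    a = np.linspace(lo[0] - pad[0], hi[0] + pad[0], grid_size)
    b = np.linspace(lo[1] - pad[1], hi[1] + pad[1], grid_size)
    surface = np.empty((grid_size, grid_size))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # Map the plane coordinate back to full parameter space.
            theta = center + ai * basis[0] + bj * basis[1]
            surface[i, j] = critic_match_loss(theta, features, targets)
    return a, b, surface, path


# Synthetic stand-in for a fixed replay batch and precomputed targets.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 4))
targets = features @ rng.normal(size=4)

# Record critic parameters along a plain gradient-descent run.
theta = np.zeros(4)
snapshots = [theta.copy()]
for _ in range(30):
    grad = 2.0 * features.T @ (features @ theta - targets) / len(targets)
    theta -= 0.05 * grad
    snapshots.append(theta.copy())

a, b, surface, path = loss_landscape(snapshots, features, targets)
```

The returned `surface` could be drawn as a 3-D plot over the `(a, b)` grid with `path` overlaid as the 2-D optimization trajectory. Holding `features` and `targets` fixed while varying only the critic parameters is what makes the loss values at different grid points comparable.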
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Spacecraft Dynamics and Control
