Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning

Jingyi Liu; Jian Guo; Eberhard Gill

arXiv:2603.14589 · cs.LG · March 17, 2026

TL;DR

This paper adapts a critic match loss landscape visualization method from online to off-policy reinforcement learning, specifically for the SAC algorithm, to analyze critic optimization geometry in control tasks.

Contribution

It introduces an adaptation of the critic match loss landscape visualization for off-policy RL, enabling geometric analysis of critic learning in replay-based algorithms.

Findings

01. Distinct geometric patterns observed in different SAC variants.

02. Quantitative metrics reveal differences in optimization behavior.

03. Landscape analysis provides insights into critic convergence and divergence.

Abstract

This work extends an established critic match loss landscape visualization method from online to off-policy reinforcement learning (RL), aiming to reveal the optimization geometry behind critic learning. Off-policy RL differs from stepwise online actor-critic learning in its replay-based data flow and target computation. Based on these two structural differences, the critic match loss landscape visualization method is adapted to the Soft Actor-Critic (SAC) algorithm by aligning the loss evaluation with its batch-based data flow and target computation, using a fixed replay batch and precomputed critic targets from the selected policy. Critic parameters recorded during training are projected onto a principal component plane, where the critic match loss is evaluated to form a 3-D landscape with an overlaid 2-D optimization path. Applied to a spacecraft attitude control problem, the…
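The pipeline described in the abstract (a fixed replay batch, precomputed critic targets, recorded critic parameters projected onto a principal component plane, and the loss evaluated over that plane) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. Everything named here is an assumption: the linear `critic_q`, the mean-squared-error form of `match_loss`, the choice to center the plane on the final snapshot, the grid size, and the synthetic data standing in for a SAC replay batch and its targets.

```python
import numpy as np

def critic_q(theta, sa):
    """Hypothetical linear critic: Q = sa @ w + b, with theta = [w..., b]."""
    return sa @ theta[:-1] + theta[-1]

def match_loss(theta, sa, targets):
    """Assumed loss form: MSE between critic output and fixed, precomputed targets."""
    return float(np.mean((critic_q(theta, sa) - targets) ** 2))

def pca_plane(snapshots):
    """Top-2 principal directions of the deviations from the final snapshot.

    Assumption: the plane is centered on the last recorded critic, not the mean.
    """
    center = snapshots[-1]
    _, _, vt = np.linalg.svd(snapshots - center, full_matrices=False)
    return center, vt[:2]

def landscape(snapshots, sa, targets, n=25, margin=0.2):
    """Evaluate the loss on an n-by-n grid in the plane; also return the 2-D path."""
    center, dirs = pca_plane(snapshots)
    path = (snapshots - center) @ dirs.T          # (T, 2) projected optimization path
    lo, hi = path.min(axis=0), path.max(axis=0)
    pad = margin * (hi - lo) + 1e-8               # pad the grid around the path
    xs = np.linspace(lo[0] - pad[0], hi[0] + pad[0], n)
    ys = np.linspace(lo[1] - pad[1], hi[1] + pad[1], n)
    Z = np.empty((n, n))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            theta = center + x * dirs[0] + y * dirs[1]
            Z[i, j] = match_loss(theta, sa, targets)
    return xs, ys, Z, path, dirs

# Synthetic stand-ins (assumptions): a fixed "replay batch" and precomputed targets.
rng = np.random.default_rng(0)
sa = rng.normal(size=(64, 4))                     # state-action features
targets = critic_q(rng.normal(size=5), sa)        # held fixed during evaluation

# Record critic parameters along a plain gradient-descent run.
theta = np.zeros(5)
snaps = [theta.copy()]
for _ in range(50):
    err = critic_q(theta, sa) - targets
    grad = 2.0 * np.append(sa.T @ err, err.sum()) / len(err)
    theta -= 0.05 * grad
    snaps.append(theta.copy())
snapshots = np.array(snaps)

xs, ys, Z, path, dirs = landscape(snapshots, sa, targets)
```

Here `Z` holds the surface heights over the grid `(xs, ys)` and `path` holds the projected training trajectory, so the two can be passed to any 3-D surface and 2-D line plotting routine. Holding `sa` and `targets` fixed is what makes every grid point comparable: the surface reflects changes in critic parameters only, not changes in the sampled batch or in the targets.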


Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Spacecraft Dynamics and Control