A Loss Landscape Visualization Framework for Interpreting Reinforcement Learning: An ADHDP Case Study

Jingyi Liu; Jian Guo; Eberhard Gill

arXiv:2603.14600 · cs.LG · March 17, 2026

Open Access

TL;DR

This paper introduces a multi-perspective visualization framework for interpreting reinforcement learning dynamics and demonstrates it on an ADHDP case study in spacecraft attitude control, showing how training stabilizers reshape the loss landscape and affect learning stability.

Contribution

The authors extend a critic match loss landscape visualization into a multi-perspective framework that clarifies the interactions of value estimation, policy optimization, and TD signals during training.

Findings

01

Visualization reveals how training stabilizers affect the loss landscape.

02

Comparison of ADHDP variants shows landscape changes impact learning stability.

03

Framework provides systematic insights into RL behavior across different algorithms.

Abstract

Reinforcement learning algorithms have been widely used in dynamic and control systems. However, interpreting their internal learning behavior remains a challenge. In the authors' previous work, a critic match loss landscape visualization method was proposed to study critic training. This study extends that method into a framework that provides a multi-perspective view of the learning dynamics, clarifying how value estimation, policy optimization, and temporal-difference (TD) signals interact during training. The proposed framework includes four complementary components: a three-dimensional reconstruction of the critic match loss surface that shows how TD targets shape the optimization geometry; an actor loss landscape under a frozen critic that reveals how the policy exploits that geometry; a trajectory combining time, Bellman error, and policy weights that indicates how updates move…
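The abstract does not say how the loss surfaces are constructed. As a generic illustration only, and not the authors' method, the sketch below assumes the common approach of evaluating a loss on a two-dimensional slice of parameter space spanned by two random directions. The linear critic, the fixed targets standing in for TD targets, and every name here (`landscape_slice`, `critic_loss`, `w_ref`) are hypothetical.

```python
import numpy as np

def landscape_slice(loss_fn, theta, d1, d2, span=1.0, steps=21):
    """Evaluate loss_fn on the plane theta + a*d1 + b*d2 over a square grid."""
    alphas = np.linspace(-span, span, steps)
    betas = np.linspace(-span, span, steps)
    surface = np.empty((steps, steps))
    for i, a in enumerate(alphas):
        for j, b in enumerate(betas):
            surface[i, j] = loss_fn(theta + a * d1 + b * d2)
    return alphas, betas, surface

# Hypothetical toy problem: a linear critic Q(x) = w . x fitted to fixed targets.
rng = np.random.default_rng(0)
features = rng.normal(size=(64, 4))
w_ref = np.array([1.0, -2.0, 0.5, 0.0])   # assumed reference weights
targets = features @ w_ref                # stand-in for TD targets, held fixed

def critic_loss(w):
    """Mean squared error between critic outputs and the fixed targets."""
    return float(np.mean((features @ w - targets) ** 2))

# Two random unit directions that span the slice.
d1 = rng.normal(size=4)
d1 /= np.linalg.norm(d1)
d2 = rng.normal(size=4)
d2 /= np.linalg.norm(d2)

alphas, betas, surface = landscape_slice(critic_loss, w_ref, d1, d2)
```

The resulting `surface` array can be passed to any 3-D plotting routine. Freezing the targets is what makes this a fixed surface; in actual TD learning the targets change as the critic is updated, which is presumably why the paper examines how TD targets shape the geometry.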

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research