Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker--Planck Equation
Shuyu Yin, Fei Wen, Peilin Liu, Tao Luo

TL;DR
This paper uses the Fokker--Planck equation to construct and visualize effective loss landscapes for semi-gradient Q-learning, showing how global minima of the loss landscape can become saddle points and exposing the implicit bias of the semi-gradient method.
Contribution
It develops a new approach to probe implicit bias in semi-gradient Q-learning by visualizing effective loss landscapes via the Fokker--Planck equation.
Findings
Global minima can become saddle points in the effective loss landscape.
Saddle points originating from global minima of the loss landscape persist in high-dimensional parameter spaces and neural network settings.
The visualization exposes the implicit bias of the semi-gradient method.
Abstract
Semi-gradient Q-learning is applied in many fields, but because it lacks an explicit loss function, studying its dynamics and implicit bias in the parameter space is challenging. This paper uses the Fokker--Planck equation, together with partial data obtained through sampling, to construct and visualize the effective loss landscape within a two-dimensional parameter space. This visualization reveals how the global minima in the loss landscape can transform into saddle points in the effective loss landscape, and it exposes the implicit bias of the semi-gradient method. Additionally, we demonstrate that saddle points originating from the global minima in the loss landscape still exist in the effective loss landscape under high-dimensional parameter spaces and neural network settings. This paper develops a novel approach for probing implicit bias in semi-gradient Q-learning.
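To make the phrase "absence of an explicit loss function" concrete, the following minimal sketch contrasts one semi-gradient Q-learning step with one full (residual) gradient step on the squared Bellman error. It is an illustration only, not the paper's code: the linear parameterization Q(s, a) = theta . phi(s, a), the function names, and the toy numbers are all assumptions made here. The semi-gradient step treats the bootstrapped target as a constant, so its update direction is in general not the gradient of any scalar function of theta.

```python
import numpy as np

def td_error(theta, phi_sa, reward, phi_next_all, gamma):
    """Bellman error for one transition, assuming linear Q(s, a) = theta . phi(s, a)."""
    target = reward + gamma * np.max(phi_next_all @ theta)
    return target - phi_sa @ theta

def semi_gradient_step(theta, phi_sa, reward, phi_next_all, gamma, lr):
    """Semi-gradient update: the bootstrapped target is held constant."""
    delta = td_error(theta, phi_sa, reward, phi_next_all, gamma)
    return theta + lr * delta * phi_sa

def residual_gradient_step(theta, phi_sa, reward, phi_next_all, gamma, lr):
    """Gradient descent on 0.5 * delta**2, differentiating through the target as well."""
    delta = td_error(theta, phi_sa, reward, phi_next_all, gamma)
    greedy = np.argmax(phi_next_all @ theta)          # greedy next action
    grad_delta = gamma * phi_next_all[greedy] - phi_sa
    return theta - lr * delta * grad_delta

# Hypothetical toy transition with two features and two next-state actions.
theta0 = np.array([0.0, 0.0])
phi_sa = np.array([1.0, 0.0])
phi_next_all = np.array([[0.0, 1.0], [1.0, 1.0]])

theta_semi = semi_gradient_step(theta0, phi_sa, 1.0, phi_next_all, gamma=0.9, lr=0.1)
theta_full = residual_gradient_step(theta0, phi_sa, 1.0, phi_next_all, gamma=0.9, lr=0.1)
print(theta_semi, theta_full)
```

The two updates agree on the first coordinate and differ on the second, because only the residual-gradient step propagates the error through the next-state features.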
Taxonomy
Topics: Neural Networks and Applications
Methods: Q-Learning
