Steady State Analysis of Episodic Reinforcement Learning
Huang Bojun

TL;DR
This paper proves that every finite-horizon episodic reinforcement learning environment has a unique steady state under any behavior policy. That steady state can be leveraged for data collection and policy optimization, and it unifies the episodic and continual RL frameworks.
Contribution
It proves the existence of unique steady states in episodic RL, unifies episodic and continual RL concepts, and introduces methods for rapid steady-state convergence.
Findings
A unique steady state exists in every finite-horizon episodic RL environment, under any behavior policy.
The marginal distribution of the agent's input converges to the steady-state distribution in essentially all episodic learning processes.
A perturbation method accelerates steady-state convergence in real-world RL tasks.
Abstract
This paper proves that the episodic learning environment of every finite-horizon decision task has a unique steady state under any behavior policy, and that the marginal distribution of the agent's input indeed converges to the steady-state distribution in essentially all episodic learning processes. This observation supports an interestingly reversed mindset against conventional wisdom: While the existence of unique steady states was often presumed in continual learning but considered less relevant in episodic learning, it turns out their existence is guaranteed for the latter. Based on this insight, the paper unifies episodic and continual RL around several important concepts that have been separately treated in these two RL formalisms. Practically, the existence of unique and approachable steady state enables a general way to collect data in episodic RL tasks, which the paper applies…
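The steady-state claim can be illustrated numerically on a toy example. The sketch below is not taken from the paper: it assumes a hypothetical three-state task (start state `s0`, intermediate state `a`, terminal state `T`) under a fixed policy, and it models the episodic learning process as a single Markov chain in which the terminal state resets to the start state. The transition matrix `P` and the helper names `stationary_distribution` and `marginal_after` are illustrative assumptions.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi = pi P together with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the normalization row.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def marginal_after(P, start, steps):
    """Marginal state distribution after `steps` transitions from `start`."""
    mu = np.zeros(P.shape[0])
    mu[start] = 1.0
    for _ in range(steps):
        mu = mu @ P
    return mu

# Hypothetical toy chain (an assumption, not from the paper).
# State order: s0 (start), a (intermediate), T (terminal).
# The last row is the episode reset: T always returns to s0.
P = np.array([
    [0.0, 0.5, 0.5],  # s0 -> a or T, each with probability 0.5
    [0.0, 0.0, 1.0],  # a  -> T
    [1.0, 0.0, 0.0],  # T  -> s0 (reset)
])

pi = stationary_distribution(P)
mu = marginal_after(P, start=0, steps=200)
print("steady state:", pi)
print("marginal after 200 steps:", mu)
```

Solving the balance equations by hand for this chain gives (0.4, 0.2, 0.4), and the 200-step marginal matches it. Episodes here last either one or two steps before the reset, so the chain is aperiodic; if every episode had the same length, the chain would be periodic and the marginal would cycle instead of settling.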
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management
