The Divergence of Reinforcement Learning Algorithms with Value-Iteration and Function Approximation

Michael Fairbank; Eduardo Alonso

arXiv:1107.4606·cs.LG·July 31, 2012·1 cites

Open Access

TL;DR

This paper shows that several major reinforcement learning and adaptive dynamic programming algorithms can diverge when a function approximator represents the value function and the policy is greedy, i.e. in a value-iteration setting. The affected algorithms include TD(1) and Sarsa(1), for which divergence is perhaps surprising.

Contribution

The paper provides new divergence examples for value-iteration-based RL algorithms with function approximation, including TD(1) and Sarsa(1), in a greedy policy scenario.

Findings

01

Divergence occurs for TD(1) and Sarsa(1) with greedy policies.

02

The Adaptive Dynamic Programming algorithms HDP, DHP, and GDHP can also diverge under value iteration.

03

Unlike previous divergence examples in the literature, these examples apply under a greedy policy, i.e. in a value-iteration scenario.

Abstract

This paper gives specific divergence examples of value-iteration for several major Reinforcement Learning and Adaptive Dynamic Programming algorithms, when using a function approximator for the value function. These divergence examples differ from previous divergence examples in the literature, in that they are applicable for a greedy policy, i.e. in a "value iteration" scenario. Perhaps surprisingly, with a greedy policy, it is also possible to get divergence for the algorithms TD(1) and Sarsa(1). In addition to these divergences, we also achieve divergence for the Adaptive Dynamic Programming algorithms HDP, DHP and GDHP.
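To make the "value iteration" setting concrete, here is a minimal Python sketch of a batch TD(λ) weight update in which the trajectory is generated by a policy that is greedy with respect to the current approximate value function. It is not the paper's divergence example and does not reproduce any divergence result. The toy model, the quadratic features, the action set, the horizon, and all function names are hypothetical choices made only for illustration.

```python
from typing import List, Sequence, Tuple

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical discrete action set
HORIZON = 3                 # hypothetical fixed episode length
GAMMA = 1.0                 # undiscounted, finite-horizon toy problem


def features(x: float) -> List[float]:
    """Hypothetical quadratic features of a scalar state."""
    return [1.0, x, x * x]


def value(x: float, w: Sequence[float]) -> float:
    """Linear function approximator V(x, w) = w . features(x)."""
    return sum(wi * fi for wi, fi in zip(w, features(x)))


def model(x: float, a: float) -> Tuple[float, float]:
    """Hypothetical deterministic model: returns (next state, reward)."""
    x_next = x + a
    return x_next, -(x_next ** 2)


def greedy_action(x: float, w: Sequence[float], last_step: bool) -> float:
    """Pick the action maximising r + gamma * V(next state, w)."""
    def score(a: float) -> float:
        x_next, r = model(x, a)
        # The state after the final step is terminal, so its value is 0.
        return r if last_step else r + GAMMA * value(x_next, w)
    return max(ACTIONS, key=score)


def rollout(x0: float, w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Follow the greedy policy; the trajectory itself depends on w."""
    states, rewards = [x0], []
    for t in range(HORIZON):
        a = greedy_action(states[-1], w, last_step=(t == HORIZON - 1))
        x_next, r = model(states[-1], a)
        states.append(x_next)
        rewards.append(r)
    return states, rewards


def lambda_returns(rewards: Sequence[float],
                   next_values: Sequence[float],
                   lam: float) -> List[float]:
    """Backward recursion G_t = r_t + gamma*((1-lam)*V(x_{t+1}) + lam*G_{t+1})."""
    g = 0.0
    targets = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + GAMMA * ((1.0 - lam) * next_values[t] + lam * g)
        targets[t] = g
    return targets


def td_lambda_update(w: Sequence[float], x0: float,
                     lam: float, alpha: float) -> List[float]:
    """One batch TD(lambda) update along a single greedy trajectory."""
    states, rewards = rollout(x0, w)
    # Values of successor states; the terminal state has value 0.
    next_values = [value(x, w) for x in states[1:-1]] + [0.0]
    targets = lambda_returns(rewards, next_values, lam)
    new_w = list(w)
    for x, g in zip(states[:-1], targets):
        err = g - value(x, w)
        for i, f in enumerate(features(x)):
            new_w[i] += alpha * err * f
    return new_w
```

With `lam=1.0` the targets are the full returns along the trajectory, which corresponds to TD(1); with `lam=0.0` they are one-step bootstrapped targets. Because `rollout` chooses actions greedily from the current weights, every weight update can also change the trajectory being evaluated, which is the coupling that distinguishes the greedy-policy setting from evaluating a fixed policy.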

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Smart Grid Energy Management