# Conditions on Features for Temporal Difference-Like Methods to Converge

**Authors:** Marcus Hutter, Samuel Yang-Zhao, Sultan J. Majeed

arXiv: 1905.11702 · 2019-05-29

## TL;DR

This paper characterizes when reinforcement learning algorithms with linear function approximation converge to the correct solution. It shows that the choice of features decides the outcome and gives conditions under which a feature choice is safe.

## Contribution

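It introduces a new condition on features that determines whether natural algorithms (a class that includes $TD(\lambda)$ and $GTD(\lambda)$) converge to the correct solution, unifies many known counter-examples to convergence in one theoretical framework, and guides feature design for reliable convergence.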

## Key findings

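- Natural algorithms converge to the correct solution if and only if all value functions in the approximation space satisfy a certain shape.
- For most feature choices, natural algorithms are inherently prone to converging to the wrong solution, even when the value function can be represented exactly.
- State aggregation features are shown to be a safe choice for natural algorithms.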
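
To make the last finding concrete, here is a minimal sketch of state aggregation features and the TD(0) fixed point they induce. It is an illustration under stated assumptions, not the paper's construction: the 4-state chain, the rewards, the partition, the discount factor, and the helper names `aggregation_features` and `td_fixed_point` are all hypothetical.

```python
import numpy as np


def aggregation_features(partition, n_states):
    """Build a one-hot feature matrix: row s has a 1 in the column of its block."""
    phi = np.zeros((n_states, len(partition)))
    for k, block in enumerate(partition):
        for s in block:
            phi[s, k] = 1.0
    return phi


def td_fixed_point(phi, P, r, d, gamma):
    """Solve A theta = b with A = Phi^T D (I - gamma P) Phi and b = Phi^T D r.

    This is the standard linear system for the TD(0) fixed point.
    np.linalg.solve raises LinAlgError if A is singular, i.e. if no
    unique fixed point exists.
    """
    D = np.diag(d)
    A = phi.T @ D @ (np.eye(len(r)) - gamma * P) @ phi
    b = phi.T @ D @ r
    return np.linalg.solve(A, b)


# Hypothetical 4-state chain (assumption, not taken from the paper).
P = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
])
r = np.array([0.0, 0.0, 1.0, 1.0])
d = np.full(4, 0.25)   # stationary distribution of P (P is doubly stochastic)
gamma = 0.9

# Aggregate states {0, 1} and {2, 3} into two blocks.
phi = aggregation_features([[0, 1], [2, 3]], n_states=4)
theta = td_fixed_point(phi, P, r, d, gamma)

v_true = np.linalg.solve(np.eye(4) - gamma * P, r)  # exact value function
v_approx = phi @ theta                              # piecewise-constant estimate
print("theta:", theta)
print("max abs error:", np.max(np.abs(v_true - v_approx)))
```

Each state activates exactly one feature, so the approximate value function is constant on each block. The sketch only shows that the fixed point is well defined for this example; it does not reproduce the paper's proof that aggregation is safe.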

## Abstract

The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution. In this paper, we provide a complete characterization of non-uniqueness issues for a large class of reinforcement learning algorithms, simultaneously unifying many counter-examples to convergence in a theoretical framework. We achieve this by proving a new condition on features that can determine whether the convergence assumptions are valid or non-uniqueness holds. We consider a general class of RL methods, which we call natural algorithms, whose solutions are characterized as the fixed point of a projected Bellman equation (when it exists); notably, bootstrapped temporal difference-based methods such as $TD(\lambda)$ and $GTD(\lambda)$ are natural algorithms. Our main result proves that natural algorithms converge to the correct solution if and only if all the value functions in the approximation space satisfy a certain shape. This implies that natural algorithms are, in general, inherently prone to converge to the wrong solution for most feature choices even if the value function can be represented exactly. Given our results, we show that state aggregation based features are a safe choice for natural algorithms and we also provide a condition for finding convergent algorithms under other feature constructions.
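
For orientation, the projected Bellman equation mentioned in the abstract can be written out in standard textbook notation. The symbols below are assumptions for illustration and may differ from the paper's: $\Phi$ is the feature matrix, $\theta$ the weight vector, $D$ a diagonal matrix of state weights, $P$ the transition matrix, $r$ the reward vector, and $\gamma$ the discount factor. The $\lambda = 0$ case is shown, assuming $\Phi^\top D \Phi$ is invertible.

$$
\begin{aligned}
\Phi\theta &= \Pi\, T(\Phi\theta), \qquad \Pi = \Phi\,(\Phi^\top D\, \Phi)^{-1} \Phi^\top D, \qquad T v = r + \gamma P v, \\
\Phi^\top D\, \Phi\theta &= \Phi^\top D\,(r + \gamma P \Phi\theta), \\
\Phi^\top D\,(I - \gamma P)\,\Phi\,\theta &= \Phi^\top D\, r .
\end{aligned}
$$

The second line follows by multiplying the first by $\Phi^\top D$ and using $\Phi^\top D\,\Pi = \Phi^\top D$. The last line is a linear system in $\theta$: it has a unique solution exactly when $\Phi^\top D\,(I - \gamma P)\,\Phi$ is nonsingular, which is the sense in which a fixed point may or may not exist uniquely.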

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.11702/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1905.11702/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/1905.11702/full.md

---
Source: https://tomesphere.com/paper/1905.11702