# O$^2$TD: (Near)-Optimal Off-Policy TD Learning

**Authors:** Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz

arXiv: 1704.05147 · 2017-04-21

## TL;DR

This paper introduces two off-policy temporal difference learning algorithms that target the true value function $V$ more directly than the objectives used by existing TD and Residual Gradient methods: a batch algorithm for approximately optimal off-policy prediction, and a near-optimal algorithm with linear per-step computational cost.

## Contribution

The paper proposes a batch algorithm for approximately optimal off-policy prediction, a near-optimal online algorithm with linear per-step computational cost, and a new perspective on emphatic TD learning.

## Key findings

- The batch algorithm helps find an approximately optimal off-policy prediction of the true value function $V$.
- The online algorithm is near-optimal and has linear computational cost per step.
- A new perspective on emphatic TD learning connects off-policy optimality with off-policy stability.

## Abstract

Temporal difference learning and Residual Gradient methods are the most widely used temporal difference based learning algorithms; however, it has been shown that none of their objective functions is optimal w.r.t. approximating the true value function $V$. Two novel algorithms are proposed to approximate the true value function $V$. This paper makes the following contributions: (1) A batch algorithm that can help find the approximate optimal off-policy prediction of the true value function $V$. (2) A linear computational cost (per step) near-optimal algorithm that can learn from a collection of off-policy samples. (3) A new perspective of the emphatic temporal difference learning which bridges the gap between off-policy optimality and off-policy stability.
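
The abstract's claim that neither the TD nor the Residual Gradient objective is optimal for approximating $V$ can be illustrated on a toy problem. The sketch below is not one of the paper's algorithms: it is a model-based comparison under assumed inputs, using the standard facts that the linear TD fixed point minimises the projected Bellman error and the Residual Gradient solution minimises the Bellman error, while the best linear fit to $V$ is a weighted least-squares projection. The function name `compare_objectives`, the three-state chain, the features `Phi`, and the state weighting `d` are all hypothetical choices made for illustration.

```python
import numpy as np


def compare_objectives(P, r, Phi, d, gamma):
    """Weighted squared error to the true V for three linear solutions.

    P: target-policy transition matrix, r: expected rewards,
    Phi: feature matrix (one row per state), d: state weighting
    (e.g. a behaviour-policy distribution), gamma: discount factor.
    """
    n = P.shape[0]
    D = np.diag(d)

    # True value function: V = (I - gamma * P)^{-1} r
    V = np.linalg.solve(np.eye(n) - gamma * P, r)

    # Best linear fit to V under the weighting d (weighted least squares).
    theta_opt = np.linalg.solve(Phi.T @ D @ Phi, Phi.T @ D @ V)

    # G maps weights to the negative expected TD error plus reward:
    # Phi @ theta - gamma * P @ Phi @ theta.
    G = Phi - gamma * (P @ Phi)

    # TD fixed point: Phi^T D (r - G theta) = 0.
    theta_td = np.linalg.solve(Phi.T @ D @ G, Phi.T @ D @ r)

    # Residual Gradient solution: minimise ||G theta - r||_D^2.
    theta_rg = np.linalg.solve(G.T @ D @ G, G.T @ D @ r)

    def mse(theta):
        err = Phi @ theta - V
        return float(err @ D @ err)

    return {"optimal": mse(theta_opt), "td": mse(theta_td), "rg": mse(theta_rg)}


# Hypothetical three-state chain with two features (illustrative values only).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 0.0, 1.0])
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])
d = np.array([0.5, 0.3, 0.2])  # assumed weighting, not the stationary distribution of P

result = compare_objectives(P, r, Phi, d, gamma=0.9)
print(result)
```

By construction, the `"optimal"` entry can never exceed the other two, since it minimises the same weighted error that all three are scored on. The gap between it and the `"td"` and `"rg"` entries is the kind of suboptimality the abstract refers to. The sketch uses the true $V$, which a learning algorithm does not have access to.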

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.05147/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1704.05147/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1704.05147/full.md

---
Source: https://tomesphere.com/paper/1704.05147