Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping

Richard S. Sutton; Csaba Szepesvari; Alborz Geramifard; Michael P. Bowling

arXiv:1206.3285 · cs.AI · June 18, 2012 · 107 cites

TL;DR

This paper extends the Dyna architecture to linear function approximation, providing convergence guarantees and demonstrating improved planning efficiency through prioritized sweeping in large state spaces.

Contribution

It introduces a linear Dyna-style planning method with proven convergence, extends prioritized sweeping to linear function approximation, and supports both with empirical results.
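As a rough illustration of the contribution, here is a minimal sketch of linear Dyna-style policy evaluation in plain Python. It assumes a linear model in which F φ approximates the expected next feature vector and b·φ the expected reward, learned by LMS (least-mean-squares) updates; planning then applies linear TD(0) to imagined transitions (φ, b·φ, F φ). The helper names (`model_update`, `planning_update`, `plan`, `sampler`), the step sizes, and the two-state chain are hypothetical choices for illustration, not the authors' code, and the direct TD update on real transitions that a full Dyna agent would also perform is omitted.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors stored as lists."""
    return sum(x * y for x, y in zip(u, v))

def mat_vec(F, phi):
    """Matrix-vector product F @ phi, with F stored as a list of rows."""
    return [dot(row, phi) for row in F]

def model_update(F, b, phi, r, phi_next, beta):
    """One LMS step moving F @ phi toward phi_next and b . phi toward r."""
    pred = mat_vec(F, phi)
    r_err = r - dot(b, phi)
    new_b = [b_j + beta * r_err * p_j for b_j, p_j in zip(b, phi)]
    new_F = [[F[i][j] + beta * (phi_next[i] - pred[i]) * phi[j]
              for j in range(len(phi))]
             for i in range(len(F))]
    return new_F, new_b

def planning_update(theta, F, b, phi, alpha, gamma):
    """Linear TD(0) applied to one imagined transition produced by the model."""
    phi_next = mat_vec(F, phi)  # imagined expected next feature vector
    r = dot(b, phi)             # imagined expected reward
    delta = r + gamma * dot(theta, phi_next) - dot(theta, phi)
    return [t + alpha * delta * p for t, p in zip(theta, phi)]

def plan(theta, F, b, sample_phi, n_steps, alpha, gamma):
    """Apply n_steps planning updates, drawing each phi from sample_phi()."""
    for _ in range(n_steps):
        theta = planning_update(theta, F, b, sample_phi(), alpha, gamma)
    return theta

# Hypothetical two-state chain with one-hot features: s1 -> s2 -> terminal,
# reward 1 on leaving s1 and reward 2 on leaving s2.
E = [[1.0, 0.0], [0.0, 1.0]]
transitions = [(E[0], 1.0, E[1]), (E[1], 2.0, [0.0, 0.0])]

F = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
for _ in range(100):  # learn the model from repeated real transitions
    for phi, r, phi_next in transitions:
        F, b = model_update(F, b, phi, r, phi_next, beta=0.5)

rng = random.Random(0)

def sampler(p_s1):
    """Return a function that imagines s1 with probability p_s1, else s2."""
    return lambda: E[0] if rng.random() < p_s1 else E[1]

# Two very different imagination distributions over the same learned model.
theta_a = plan([0.0, 0.0], F, b, sampler(0.9), 5000, alpha=0.1, gamma=0.5)
theta_b = plan([0.0, 0.0], F, b, sampler(0.1), 5000, alpha=0.1, gamma=0.5)
print(theta_a, theta_b)
```

Both runs should end near θ = [2, 2], the true values of this chain with γ = 0.5, even though one run imagines mostly from s1 and the other mostly from s2; this is the distribution-independence property in miniature.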

Findings

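1. Linear Dyna-style planning converges to a unique solution that does not depend on the distribution used to generate imagined experience (under natural conditions).
2. In the policy evaluation setting, the limit point is the least-squares temporal-difference (LSTD) solution.
3. Empirical tests show improved performance on the Mountain Car and Boyan Chain problems.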
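The first two findings can be traced through the expected planning update. The derivation below is a sketch, not the paper's proof. It assumes the model is held fixed during planning, that F φ predicts the expected next feature vector and bᵀφ the expected reward, and that imagined feature vectors φ are drawn from some distribution μ with positive definite second-moment matrix C; convergence of the stochastic iterates additionally needs conditions on F that are not reproduced here. The expected update vanishes at a point that does not involve μ. If, further, F and b are the least-squares fit to observed transitions (φ_t, r_t, φ'_t) with D = Σ_t φ_t φ_tᵀ invertible, that same point satisfies the LSTD(0) equations:

```latex
\begin{aligned}
\delta(\phi) &= b^\top \phi + \gamma\, \theta^\top F \phi - \theta^\top \phi,\\
\mathbb{E}_{\phi \sim \mu}\bigl[\delta(\phi)\, \phi\bigr]
  &= C \bigl(b + \gamma F^\top \theta - \theta\bigr),
  \qquad C = \mathbb{E}_{\phi \sim \mu}\bigl[\phi \phi^\top\bigr] \succ 0,\\
\mathbb{E}_{\phi \sim \mu}\bigl[\delta(\phi)\, \phi\bigr] = 0
  &\iff (I - \gamma F^\top)\, \theta = b
  \iff \theta^\ast = (I - \gamma F^\top)^{-1} b ,\\[6pt]
D &= \sum_t \phi_t \phi_t^\top, \qquad
F = \Bigl(\sum_t \phi'_t \phi_t^\top\Bigr) D^{-1}, \qquad
b = D^{-1} \sum_t \phi_t r_t,\\
\theta^\ast = b + \gamma F^\top \theta^\ast
  &\iff D \theta^\ast = \sum_t \phi_t r_t + \gamma \sum_t \phi_t (\phi'_t)^\top \theta^\ast\\
  &\iff \Bigl(\sum_t \phi_t (\phi_t - \gamma \phi'_t)^\top\Bigr) \theta^\ast = \sum_t \phi_t r_t .
\end{aligned}
```

The last line is the LSTD(0) system Aθ = c with A = Σ_t φ_t(φ_t − γφ'_t)ᵀ and c = Σ_t φ_t r_t, assuming I − γFᵀ is invertible.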

Abstract

We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach extending the Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. Our main results are to prove that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions. In the policy evaluation setting, we prove that the limit point is the least-squares (LSTD) solution. An implication of our results is that prioritized sweeping can be soundly extended to the linear approximation case,…
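The abstract is cut off before describing the prioritized variant, so the sketch below is a hypothetical illustration rather than the paper's algorithm. It assumes a fixed linear model in which F φ predicts the next feature vector and bᵀφ the reward, and it plans on unit basis vectors e_i so that each imagined update changes a single weight θ_i. Because θ_i enters the imagined TD error of feature j through the term γ F[i][j] θ_i, the features to requeue after changing θ_i are those j with F[i][j] ≠ 0, plus i itself. The function names, the step size α = 1, the tolerance, and the three-state chain are assumptions chosen for illustration.

```python
import heapq

def unit_td_error(theta, F, b, gamma, i):
    """Imagined TD error for phi = e_i: b_i + gamma * theta . (F e_i) - theta_i."""
    successor_value = sum(F[k][i] * theta[k] for k in range(len(theta)))
    return b[i] + gamma * successor_value - theta[i]

def prioritized_sweep(theta, F, b, gamma, alpha=1.0, max_pops=1000, tol=1e-12):
    """Prioritized planning over unit basis vectors e_i (hypothetical variant).

    Returns the updated weights and the number of weight updates performed.
    """
    theta = list(theta)
    n = len(theta)
    heap = []  # max-priority queue implemented with negated |TD error|
    for i in range(n):
        d = unit_td_error(theta, F, b, gamma, i)
        if abs(d) > tol:
            heapq.heappush(heap, (-abs(d), i))
    n_updates = 0
    for _ in range(max_pops):
        if not heap:
            break
        _, i = heapq.heappop(heap)
        d = unit_td_error(theta, F, b, gamma, i)  # queued priority may be stale
        if abs(d) <= tol:
            continue
        theta[i] += alpha * d
        n_updates += 1
        # Changing theta_i alters the TD error of every feature j whose
        # imagined successor F e_j has a nonzero i-th component, and of i itself.
        for j in range(n):
            if j == i or F[i][j] != 0:
                dj = unit_td_error(theta, F, b, gamma, j)
                if abs(dj) > tol:
                    heapq.heappush(heap, (-abs(dj), j))
    return theta, n_updates

# Hypothetical chain e1 -> e2 -> e3 -> terminal, reward 1 on leaving e3.
# Column j of F is the imagined successor of feature j.
F = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
b = [0.0, 0.0, 1.0]
theta, n_updates = prioritized_sweep([0.0, 0.0, 0.0], F, b, gamma=0.9)
print(theta, n_updates)
```

In this example the features are one-hot and α = 1, so each backup is an exact assignment and the queue orders the backups from the rewarding feature backward: the sketch reaches θ = [0.81, 0.9, 1.0] after three updates, where uniform sweeping could waste backups on features whose error is still zero.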

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Formal Methods in Verification