Dyna-T: Dyna-Q and Upper Confidence Bounds Applied to Trees

Tarek Faycal; Claudio Zito

arXiv:2201.04502·cs.LG·January 20, 2022

TL;DR

Dyna-T is a reinforcement learning algorithm that combines Dyna-Q with Upper Confidence Trees (UCT), aiming to improve decision-making efficiency and robustness in stochastic environments.

Contribution

The paper proposes Dyna-T, which integrates UCT with Dyna-Q to make planning in RL more efficient and robust. It targets the computational cost of Dyna-Q's planning step, which depends strongly on the dimensionality of the state-action space.
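As a rough illustration of the "upper confidence" part of UCT, the sketch below shows the standard UCB1 selection rule that UCT applies at each tree node. The function name `ucb1_select`, its arguments, and the default exploration constant are assumptions made for this example; the summary above does not say which UCT variant or constant Dyna-T uses.

```python
import math


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the action with the highest UCB1 score.

    counts[a] -- how many times action a has been tried so far
    values[a] -- mean return observed for action a
    c         -- exploration constant (assumed default, not from the paper)
    """
    # Try every action once before the formula is applied,
    # which also avoids dividing by zero.
    for a, n in enumerate(counts):
        if n == 0:
            return a

    total = sum(counts)
    scores = [
        values[a] + c * math.sqrt(math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    # Pick the action whose optimistic estimate is largest.
    return max(range(len(scores)), key=scores.__getitem__)
```

The bonus term shrinks as an action is tried more often, so rarely tried actions are revisited even when their current mean return is lower.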

Findings

01. Dyna-T outperforms Dyna-Q in stochastic environments.

02. Dyna-T demonstrates improved action selection robustness.

03. Preliminary tests show promising results on OpenAI environments.

Abstract

In this work we present a preliminary investigation of a novel algorithm called Dyna-T. In reinforcement learning (RL), a planning agent has its own representation of the environment as a model. To discover an optimal policy for interacting with the environment, the agent collects experience in a trial-and-error fashion. Experience can be used to learn a better model or to directly improve the value function and policy. These two uses are typically kept separate; Dyna-Q is a hybrid approach which, at each iteration, exploits the real experience to update the model as well as the value function, while planning its action using simulated data from its model. However, the planning process is computationally expensive and strongly depends on the dimensionality of the state-action space. We propose to build an Upper Confidence Tree (UCT) on the simulated experience and search for the best action to be selected during…
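The Dyna-Q loop described in the abstract (direct value update from real experience, model update, then planning on simulated experience) can be sketched in tabular form as follows. This is a minimal sketch of plain Dyna-Q, not of Dyna-T: the UCT search is not included, because the abstract is truncated before it is specified. The names `dyna_q` and `chain_step`, the toy chain environment, and all hyperparameter values are assumptions for illustration. The model stores only the last observed outcome per state-action pair, which presumes a deterministic environment.

```python
import random
from collections import defaultdict


def chain_step(state, action):
    """Hypothetical toy environment: a 5-state chain with the goal at state 4."""
    next_state = max(0, state + action)
    done = next_state == 4
    reward = 1.0 if done else 0.0
    return reward, next_state, done


def dyna_q(step, start_state, actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=1000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values keyed by (state, action)
    model = {}              # (state, action) -> (reward, next_state, done)

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)
            update(s, a, r, s2, done)      # learn from real experience
            model[(s, a)] = (r, s2, done)  # update the (deterministic) model
            for _ in range(planning_steps):
                # Planning: replay transitions simulated by the model.
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            if done:
                break
            s = s2
    return q


q_values = dyna_q(chain_step, start_state=0, actions=[-1, 1])
```

The inner planning loop is where the cost noted in the abstract arises: each real step triggers `planning_steps` additional backups over the stored state-action pairs.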


Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Auction Theory and Applications