Dyna-T: Dyna-Q and Upper Confidence Bounds Applied to Trees
Tarek Faycal, Claudio Zito

TL;DR
Dyna-T introduces a novel reinforcement learning algorithm that combines Dyna-Q with Upper Confidence Trees to improve decision-making efficiency and robustness in stochastic environments.
Contribution
The paper proposes Dyna-T, which integrates Upper Confidence Trees (UCT) with Dyna-Q to make planning in RL more efficient and robust. It targets the computational cost of planning, which grows with the dimensionality of the state-action space.
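UCT picks which action to expand at each tree node using the UCB1 rule, a* = argmax_a [ Q(a) + c·sqrt(ln N / n(a)) ]. Here Q(a) is the mean simulated return of action a, n(a) is its visit count, and N is the sum of the visit counts. The minimal sketch below shows that textbook rule only. It is not the authors' Dyna-T implementation. The function name `ucb1_select` and the default c = sqrt(2) are illustrative assumptions.

```python
import math
from typing import Sequence


def ucb1_select(visits: Sequence[int], mean_returns: Sequence[float],
                c: float = math.sqrt(2)) -> int:
    """Return the index of the action maximising the UCB1 score.

    Hypothetical helper for illustration (not from the paper):
      visits[a]       -- times action a was tried from this tree node
      mean_returns[a] -- average simulated return observed after taking a
      c               -- exploration constant (sqrt(2) is the textbook default)
    """
    # Standard UCT convention: try every untried action once before scoring.
    for a, n in enumerate(visits):
        if n == 0:
            return a
    total = sum(visits)
    # Exploitation term (mean return) plus an exploration bonus that shrinks
    # as an action is visited more often relative to its siblings.
    scores = [q + c * math.sqrt(math.log(total) / n)
              for q, n in zip(mean_returns, visits)]
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `ucb1_select([10, 2], [0.6, 0.5])` returns `1`. The rarely tried action receives a large exploration bonus. With `c=0.0` the rule becomes purely greedy and returns `0`.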
Findings
Dyna-T outperforms Dyna-Q in stochastic environments.
Dyna-T demonstrates improved action selection robustness.
Preliminary tests show promising results on OpenAI environments.
Abstract
In this work we present a preliminary investigation of a novel algorithm called Dyna-T. In reinforcement learning (RL), a planning agent maintains its own representation of the environment as a model. To discover an optimal policy for interacting with the environment, the agent collects experience by trial and error. This experience can be used either to learn a better model or to improve the value function and policy directly. Although these two uses are typically kept separate, Dyna-Q is a hybrid approach that, at each iteration, exploits real experience to update both the model and the value function, while planning its actions using simulated data from the model. However, the planning process is computationally expensive and depends strongly on the dimensionality of the state-action space. We propose to build an Upper Confidence Tree (UCT) on the simulated experience and search for the best action to be selected during…
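The Dyna-Q loop described above can be sketched in tabular form. Each real step drives three updates:

- a direct Q-learning update from real experience;
- a model update;
- several planning updates replayed from the model.

This is a minimal sketch of standard Dyna-Q, not of Dyna-T. It assumes a deterministic model and a hypothetical five-state chain environment (`chain_step`). The function names and hyperparameters are illustrative choices. The planning step here samples previously seen state-action pairs uniformly at random. The abstract identifies this planning loop as the expensive part that Dyna-T targets.

```python
import random
from collections import defaultdict


def chain_step(state, action, n_states=5):
    """Hypothetical toy environment: action 0 moves left, 1 moves right.
    Reaching the last state yields reward 1 and ends the episode."""
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    done = nxt == n_states - 1
    return (1.0 if done else 0.0), nxt, done


def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.95,
           epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = 2
    Q = defaultdict(float)   # Q[(s, a)] -> action-value estimate
    model = {}               # model[(s, a)] -> (r, s', done), assumed deterministic

    def greedy(s):
        # Break ties randomly so an all-zero table does not always pick action 0.
        best = max(Q[(s, a)] for a in range(n_actions))
        return rng.choice([a for a in range(n_actions) if Q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning update; terminal transitions have no bootstrap term.
        target = r if done else r + gamma * max(Q[(s2, b)] for b in range(n_actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(n_actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = chain_step(s, a)
            update(s, a, r, s2, done)          # direct RL from real experience
            model[(s, a)] = (r, s2, done)      # model learning
            for _ in range(planning_steps):    # planning with simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return Q
```

After training, the greedy policy should prefer moving right in every non-terminal state. Raising `planning_steps` propagates the goal reward back along the chain in fewer real episodes. This is the sample-efficiency benefit of Dyna-style planning, paid for in extra computation per step.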
