Recursive Backwards Q-Learning in Deterministic Environments
Jan Diekhoff, Jörn Fischer

TL;DR
This paper introduces Recursive Backwards Q-Learning (RBQL), a model-based method that efficiently solves deterministic problems by propagating values backwards from terminal states, outperforming standard Q-learning in maze navigation.
Contribution
The paper presents RBQL, a novel recursive, model-based Q-learning algorithm that improves learning speed and accuracy in deterministic environments.
Findings
RBQL outperforms standard Q-learning in maze shortest path tasks.
RBQL efficiently propagates value information backwards from terminal states.
The method significantly reduces learning time in deterministic problems.
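For reference, this is the standard tabular Q-learning update. In a deterministic environment the learning rate α can be set to 1, which turns the update into a direct assignment. Applying that assignment from the terminal state backwards is one way to read the value propagation described in the findings. Whether RBQL uses exactly this form is an assumption here, not something stated on this page.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha\Big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\Big]
\quad\xrightarrow{\;\alpha = 1\;}\quad
Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')
```

With a reward R for entering the terminal state and zero reward elsewhere, one backward sweep gives a state k steps from the goal the value γ^(k-1) R. Forward one-step Q-learning repeating the same trajectory typically moves that information back only one state per episode.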
Abstract
Reinforcement learning is a popular method of finding optimal solutions to complex problems. Algorithms like Q-learning excel at learning to solve stochastic problems without a model of their environment. However, they take longer than necessary to solve deterministic problems. Q-learning can be improved to better solve deterministic problems by introducing a model-based approach. This paper introduces the recursive backwards Q-learning (RBQL) agent, which explores and builds a model of the environment. After reaching a terminal state, it recursively propagates the terminal state's value backwards through this model. This lets each state be evaluated to its optimal value without a lengthy learning process. In the example of finding the shortest path through a maze, this agent greatly outperforms a regular Q-learning agent.
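The explore, model, and propagate-backwards loop from the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The maze, the reward of 1 for entering the goal, γ = 0.9, and all function names are hypothetical. Exhaustive exploration (trying every action once per reachable state, which suffices in a deterministic environment) stands in for the agent's actual exploration policy. An iterative worklist replaces literal recursion so that large mazes do not hit Python's recursion limit.

```python
from collections import defaultdict, deque

# Hypothetical maze: '#' = wall, 'S' = start, 'G' = terminal goal.
MAZE = ["S..#",
        ".#..",
        ".#.#",
        "...G"]
START, GOAL = (0, 0), (3, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9        # assumed discount factor
GOAL_REWARD = 1.0  # assumed reward for entering the terminal state


def step(state, action):
    """Deterministic environment: returns (next_state, reward, done)."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])) or MAZE[nr][nc] == "#":
        nr, nc = r, c  # bumping into a wall or the border leaves the agent in place
    done = MAZE[nr][nc] == "G"
    return (nr, nc), (GOAL_REWARD if done else 0.0), done


def explore(start):
    """Stand-in exploration: record every (state, action) transition once."""
    model, seen, frontier = {}, {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        for a in ACTIONS:
            s2, r, done = step(s, a)
            model[(s, a)] = (s2, r, done)
            if not done and s2 not in seen:  # terminal states are never expanded
                seen.add(s2)
                frontier.append(s2)
    return model


def propagate_backwards(model, gamma=GAMMA):
    """Push values backwards from terminal states through the learned model."""
    predecessors = defaultdict(list)  # next_state -> [(state, action), ...]
    for (s, a), (s2, _r, _done) in model.items():
        predecessors[s2].append((s, a))
    q, value = {}, {}  # Q(s, a) and V(s) = max_a Q(s, a)
    queue = deque({s2 for (s2, _r, done) in model.values() if done})
    while queue:
        x = queue.popleft()
        for s, a in predecessors[x]:
            _s2, r, done = model[(s, a)]
            # Deterministic Q-learning target (learning rate 1); terminals are worth 0.
            target = r if done else r + gamma * value[x]
            q[(s, a)] = max(q.get((s, a), float("-inf")), target)
            if q[(s, a)] > value.get(s, float("-inf")):
                value[s] = q[(s, a)]
                queue.append(s)  # s improved, so revisit its predecessors
    return q, value


def greedy_path(q, start, max_steps=100):
    """Follow argmax_a Q(s, a) from start until a terminal state is reached."""
    path, s = [start], start
    for _ in range(max_steps):
        a = max(ACTIONS, key=lambda act: q.get((s, act), float("-inf")))
        s, _r, done = step(s, a)
        path.append(s)
        if done:
            break
    return path


if __name__ == "__main__":
    q, value = propagate_backwards(explore(START))
    path = greedy_path(q, START)
    print(len(path) - 1, "steps:", path)
```

For this maze the greedy path is 6 steps long, which is the shortest route from `S` to `G`. The single backward pass already yields optimal values with respect to the explored model, with no repeated episodes needed.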
Taxonomy
Topics: Cognitive Science and Education Research · Experimental Learning in Engineering · Fault Detection and Control Systems
Methods: Q-Learning
