On the convergence of cycle detection for navigational reinforcement learning
Tom J. Ameloot, Jan Van den Bussche

TL;DR
This paper proves that a simple cycle-detection learning algorithm converges on a class of navigation tasks called reducible tasks, each of which has an acyclic solution. The result shows that even simple algorithms can learn a large class of nontrivial reinforcement learning tasks.
Contribution
The paper gives a formal proof of convergence for a cycle-detection learning algorithm on reducible tasks and syntactically characterizes the form of the final policy.
Findings
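Convergence is proved for reducible tasks, all of which have an acyclic solution.
The form of the final policy can be characterized syntactically, which makes it possible to detect the convergence point precisely in a simulation.
Even simple algorithms can learn a large class of nontrivial navigation tasks.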
Abstract
We consider a reinforcement learning framework where agents have to navigate from start states to goal states. We prove convergence of a cycle-detection learning algorithm on a class of tasks that we call reducible. Reducible tasks have an acyclic solution. We also syntactically characterize the form of the final policy. This characterization can be used to precisely detect the convergence point in a simulation. Our result demonstrates that even simple algorithms can be successful in learning a large class of nontrivial tasks. In addition, our framework is elementary in the sense that we only use basic concepts to formally prove convergence.
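Example (illustrative sketch)
The abstract does not spell out the update rule, so the following is only a minimal sketch of what a cycle-detection learner could look like, not the authors' exact algorithm. It assumes that a cycle is detected when the agent revisits a state within a single trial, and that the agent then re-picks that state's action uniformly at random. The toy task (`TRANSITIONS`, `START`, `GOAL`), the deterministic transitions, and the helper names are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy task: states 0..3, goal state 3, two actions per state.
# TRANSITIONS[state][action] is the successor state (deterministic for brevity).
TRANSITIONS = {
    0: {"a": 1, "b": 0},
    1: {"a": 0, "b": 2},
    2: {"a": 1, "b": 3},
}
START, GOAL = 0, 3
ACTIONS = ("a", "b")


def run_trial(policy, rng, max_steps=10_000):
    """Run one trial from START, updating policy in place on detected cycles.

    Returns True if GOAL is reached within max_steps.
    """
    state = START
    visited = set()
    for _ in range(max_steps):
        if state == GOAL:
            return True
        if state in visited:
            # Assumed update rule: a revisit within the trial signals a cycle,
            # so the action stored for this state is re-picked at random.
            policy[state] = rng.choice(ACTIONS)
        visited.add(state)
        state = TRANSITIONS[state][policy[state]]
    return state == GOAL


def follows_acyclic_path(policy):
    """Check that the policy leads from START to GOAL without repeating a state."""
    state, seen = START, set()
    while state != GOAL:
        if state in seen:
            return False
        seen.add(state)
        state = TRANSITIONS[state][policy[state]]
    return True


rng = random.Random(0)
policy = {s: "a" for s in TRANSITIONS}  # initial policy cycles between 0 and 1
reached = run_trial(policy, rng)
converged = follows_acyclic_path(policy)
```

In this toy task the only acyclic route is 0, 1, 2, 3, so once a trial reaches the goal the stored policy follows that route and later trials trigger no further updates. Checking that the policy is acyclic from the start state is a simplified stand-in for the syntactic convergence check mentioned in the abstract.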
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Game Theory and Applications
