Improving width-based planning with compact policies
Miquel Junyent, Anders Jonsson, Vicenç Gómez

TL;DR
This paper interleaves planning and learning: the Iterated-Width (IW) planner performs structured exploration, a compact policy is learned from the state-actions it visits, and that policy in turn guides the planner. On simple sparse-reward problems, the method outperforms A2C and AlphaZero.
Contribution
The work integrates IW planning with policy learning to improve exploration and data efficiency in sparse-reward problems, and reports better performance than A2C and AlphaZero on simple test problems.
Findings
Outperforms A2C and AlphaZero on simple test problems.
Shows promising preliminary results on Atari games.
Uses structured exploration instead of random exploration.
Abstract
Optimal action selection in decision problems characterized by sparse, delayed rewards is still an open challenge. For these problems, current deep reinforcement learning methods require enormous amounts of data to learn controllers that reach human-level performance. In this work, we propose a method that interleaves planning and learning to address this issue. The planning step hinges on the Iterated-Width (IW) planner, a state-of-the-art planner that makes explicit use of the state representation to perform structured exploration. IW is able to scale up to problems independently of the size of the state space. From the state-actions visited by IW, the learning step estimates a compact policy, which in turn is used to guide the planning step. The type of exploration used by our method is radically different from the standard random exploration used in RL. We evaluate our method in…
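To make the idea of structured exploration concrete, here is a minimal sketch of plain IW(1) as it is usually defined: a breadth-first search that keeps a newly generated state only if it makes some (feature index, value) pair true for the first time. This is not the paper's policy-guided method. The grid world, the `grid_successors` helper, and the goal test are hypothetical and serve only as an illustration.

```python
from collections import deque


def iw1(initial_state, successors, is_goal):
    """Breadth-first search that prunes states with no novel (index, value) atom.

    States are tuples of feature values. Returns (plan, number_of_expansions),
    with plan set to None if the goal is not reached.
    """
    seen_atoms = set()

    def is_novel(state):
        # A state is novel if at least one of its atoms has never been seen.
        new_atoms = set(enumerate(state)) - seen_atoms
        seen_atoms.update(new_atoms)
        return bool(new_atoms)

    is_novel(initial_state)  # register the atoms of the initial state
    queue = deque([(initial_state, [])])
    expanded = 0
    while queue:
        state, plan = queue.popleft()
        expanded += 1
        if is_goal(state):
            return plan, expanded
        for action, next_state in successors(state):
            if is_novel(next_state):
                queue.append((next_state, plan + [action]))
    return None, expanded


# Hypothetical toy domain: a 5x5 grid where the state is the pair (x, y).
SIZE = 5
MOVES = [("right", (1, 0)), ("left", (-1, 0)), ("up", (0, 1)), ("down", (0, -1))]


def grid_successors(state):
    x, y = state
    for action, (dx, dy) in MOVES:
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            yield action, (nx, ny)


# Goal: reach the right wall (x == 4), which involves a single atom.
plan, expanded = iw1((0, 0), grid_successors, lambda s: s[0] == 4)
print(plan, expanded)
```

In this sketch the number of states kept is bounded by the number of distinct atoms (here 10) rather than by the 25 grid cells, which illustrates why the search effort need not grow with the size of the state space. The pruning is aggressive, though: with the goal `(4, 4)` this search returns `None`, because every state that combines two already-seen values is discarded.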
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Robot Manipulation and Learning
Methods: A2C
