Value-of-Information based Arbitration between Model-based and Model-free Control
Krishn Bera, Yash Mandilwar, Bapi Raju

TL;DR
This paper introduces a novel arbitration framework based on the value of information to effectively combine model-based and model-free control in reinforcement learning, improving learning efficiency and performance.
Contribution
It proposes a quantitative, uncertainty-based arbitration method that integrates model-based and model-free RL, advancing the understanding of skill learning mechanisms.
Findings
Outperforms standard Q-learning in experiments
Better data and computational efficiency than existing methods
Effective integration of model-based and model-free control
Abstract
There have been numerous attempts to explain general learning behaviours using model-based and model-free methods. Model-based control is flexible but computationally expensive in planning, whereas model-free control is quick but inflexible. Model-based control is therefore immune to reward devaluation and contingency degradation. Multiple arbitration schemes have been suggested to achieve the data efficiency of model-based control and the computational efficiency of model-free control. In this context, we propose a quantitative 'value of information' based arbitration between the two controllers in order to establish a general computational framework for skill learning. The interacting model-based and model-free reinforcement learning processes are arbitrated using an uncertainty-based value of information. We further show that our algorithm performs better than…
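The abstract does not spell out the arbitration rule, so the following is only a minimal sketch of what an uncertainty-based value-of-information gate between two controllers could look like. The names `voi_proxy`, `arbitrate`, `planning_cost`, and the heuristic itself (summed uncertainty of the top two actions minus the gap between their values) are assumptions made for illustration, not the authors' formulation.

```python
from typing import Callable, Tuple

import numpy as np


def voi_proxy(q_mean: np.ndarray, q_std: np.ndarray) -> float:
    """Heuristic value of information for one state (illustrative assumption).

    It is large when the model-free estimates of the two best actions are
    uncertain relative to the gap between them, i.e. when extra information
    from planning could plausibly change which action is chosen.
    """
    order = np.argsort(q_mean)[::-1]      # action indices, best first
    best, second = order[0], order[1]
    gap = q_mean[best] - q_mean[second]   # how clearly the best action leads
    spread = q_std[best] + q_std[second]  # combined uncertainty of the top two
    return float(max(0.0, spread - gap))


def arbitrate(
    q_mf_mean: np.ndarray,
    q_mf_std: np.ndarray,
    plan: Callable[[], np.ndarray],
    planning_cost: float = 0.1,
) -> Tuple[int, str]:
    """Pick an action and report which controller produced it.

    `plan` is a hypothetical callable returning model-based action values;
    it is invoked only when the VOI proxy exceeds `planning_cost`, so the
    expensive planning step is skipped when the cached values are trusted.
    """
    if voi_proxy(q_mf_mean, q_mf_std) > planning_cost:
        return int(np.argmax(plan())), "model-based"
    return int(np.argmax(q_mf_mean)), "model-free"
```

Under these assumptions, a state with close and uncertain model-free values triggers planning, while a state with a clear, confident winner is handled by the cached model-free values alone.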
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Control Systems Optimization · Receptor Mechanisms and Signaling
Methods: Q-Learning
