Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings
Seyed Mahdi Basiri Azad, Joschka Boedecker

TL;DR
This paper introduces a method that leverages a few demonstrations to initialize value functions in sparse-reward RL, significantly improving sample efficiency and convergence speed by combining offline pretraining with online refinement.
Contribution
The paper presents a hybrid offline-online approach that uses demonstrations to bootstrap RL in sparse-reward environments, reducing the need for exploration and improving sample efficiency.
Findings
Accelerates convergence in sparse-reward tasks.
Outperforms standard RL baselines with minimal demonstrations.
Reduces the exploration burden in reinforcement learning.
Abstract
Reinforcement learning (RL) in sparse-reward environments remains a significant challenge due to the lack of informative feedback. We propose a simple yet effective method that uses a small number of successful demonstrations to initialize the value function of an RL agent. By precomputing value estimates from offline demonstrations and using them as targets for early learning, our approach provides the agent with a useful prior over promising actions. The agent then refines these estimates through standard online interaction. This hybrid offline-to-online paradigm significantly reduces the exploration burden and improves sample efficiency in sparse-reward settings. Experiments on benchmark tasks demonstrate that our method accelerates convergence and outperforms standard baselines, even with minimal or suboptimal demonstration data.
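The abstract describes a two-phase scheme: value estimates precomputed from demonstrations seed the agent, and standard online updates then refine them. The following is a minimal tabular sketch of that idea. It rests on assumptions that do not come from the paper: the precomputed estimates are taken to be averaged discounted Monte Carlo returns of the demonstrations, they are copied directly into a Q-table (in the tabular case, fitting the table to these targets amounts to copying them), and online refinement is plain epsilon-greedy Q-learning. The chain environment, `demo_value_targets`, and `q_learning` are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical toy task (not from the paper): a 1-D chain of N_STATES states.
# The agent starts at state 0; action 1 moves right, action 0 moves left.
# The only reward (+1) comes on reaching the last state, which ends the episode.
N_STATES, ACTIONS, GAMMA = 10, (0, 1), 0.9


def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def demo_value_targets(demos, gamma=GAMMA):
    """Hypothetical helper: average discounted return per (state, action) in demos."""
    sums, counts = defaultdict(float), defaultdict(int)
    for traj in demos:  # traj is a list of (state, action, reward) tuples
        g = 0.0
        for s, a, r in reversed(traj):
            g = r + gamma * g  # G_t = r_t + gamma * G_{t+1}
            sums[(s, a)] += g
            counts[(s, a)] += 1
    return {sa: sums[sa] / counts[sa] for sa in sums}


def q_learning(q, episodes, alpha=0.5, eps=0.1, max_steps=100, seed=0):
    """Plain epsilon-greedy tabular Q-learning; returns (q, total env steps)."""
    rng = random.Random(seed)
    steps_used = 0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                # Ties break toward action 0 (left), so an all-zero table
                # expresses no useful preference.
                a = max(ACTIONS, key=lambda b: q[(s, b)])
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            steps_used += 1
            s = s2
            if done:
                break
    return q, steps_used


# One successful demonstration: move right from state 0 to the goal.
demo = [(s, 1, 1.0 if s == N_STATES - 2 else 0.0) for s in range(N_STATES - 1)]
targets = demo_value_targets([demo])

# Demonstration-seeded start: unseen (state, action) pairs default to 0.
_, demo_steps = q_learning(defaultdict(float, targets), episodes=10)
# Baseline: zero-initialized Q-table.
_, zero_steps = q_learning(defaultdict(float), episodes=10)
print(f"env steps with demo init: {demo_steps}, without: {zero_steps}")
```

With an all-zero table, the sparse reward provides no learning signal until the agent happens to reach the goal, so in this toy setup the baseline usually exhausts its step budget in every episode. The demonstration-seeded table already prefers moving right in each demonstrated state, and the online updates leave that preference intact. With function approximation, the copy step would instead become a supervised regression of Q toward the demonstration targets before online training begins.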
