XQCfD: Accelerating Fast Actor-Critic Algorithms with Prior Data and Prior Policies
Daniel Palenicek, Florian Vogt, Joe Watson, Ingmar Posner, Danica Kragic, Jan Peters

TL;DR
XQCfD enhances sample efficiency in reinforcement learning by effectively utilizing prior data and policies, achieving state-of-the-art results in complex manipulation tasks with sparse rewards.
Contribution
The paper introduces XQCfD, a novel method that extends actor-critic algorithms to better leverage demonstrations and pretrained policies with stationary architectures.
Findings
XQCfD outperforms previous methods on manipulation benchmarks.
It achieves high performance with a low update-to-data ratio.
Stationary network architecture improves out-of-distribution policy improvement.
Abstract
For reinforcement learning in the real world, online exploration is expensive. A common practice in robotic reinforcement learning is to incorporate additional data to improve sample efficiency. Expert demonstration data is often crucial for solving hard exploration tasks with sparse rewards. While prior data is used to augment experience and pretrain models, we show that the design of existing algorithms fails to achieve the sample efficiency that is possible in this setting, due to a failure to use pretrained policies effectively. We propose XQCfD, which extends the sample-efficient XQC actor-critic to learn from demonstrations using augmented replay buffers, pretrained policies, and stationary policy architectures designed to avoid rapidly unlearning the strong initial policy, as happens in prior works. We show our stationary network architecture enables policy improvement out-of-distribution better than…
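The abstract mentions augmented replay buffers but does not say how demonstrations and online experience are mixed. As a minimal sketch, assuming a fixed mixing ratio per batch (one common choice in learning-from-demonstrations work, not necessarily what XQCfD does), such a buffer might look like the following. The class name `AugmentedReplayBuffer` and the `demo_fraction` parameter are hypothetical and chosen only for illustration.

```python
import random


class AugmentedReplayBuffer:
    """Illustrative buffer: fixed demonstrations plus growing online data.

    Assumption: each batch mixes the two sources at a fixed ratio.
    This is a sketch, not the paper's implementation.
    """

    def __init__(self, demos, capacity=100_000, demo_fraction=0.5, seed=0):
        self.demos = list(demos)            # prior data, never evicted
        self.online = []                    # online transitions
        self.capacity = capacity            # max number of online transitions
        self.demo_fraction = demo_fraction  # hypothetical mixing parameter
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store one online transition, evicting the oldest when full."""
        if len(self.online) >= self.capacity:
            self.online.pop(0)
        self.online.append(transition)

    def sample(self, batch_size):
        """Draw a batch (with replacement) from demos and online data."""
        if not self.demos and not self.online:
            raise ValueError("buffer is empty")
        if not self.online:
            n_demo = batch_size             # only prior data is available
        elif not self.demos:
            n_demo = 0                      # no prior data was provided
        else:
            n_demo = round(batch_size * self.demo_fraction)
        n_online = batch_size - n_demo
        batch = [self.rng.choice(self.demos) for _ in range(n_demo)]
        batch += [self.rng.choice(self.online) for _ in range(n_online)]
        return batch


# Usage: transitions are tagged tuples here purely for demonstration.
buf = AugmentedReplayBuffer(demos=[("demo", i) for i in range(5)])
for i in range(10):
    buf.add(("online", i))
batch = buf.sample(8)
```

With `demo_fraction=0.5`, half of each batch comes from the demonstrations once online data exists; before that, the whole batch is drawn from the prior data so that updates can start immediately.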
