Residual Policy Learning
Tom Silver, Kelsey Allen, Josh Tenenbaum, Leslie Kaelbling

TL;DR
Residual Policy Learning (RPL) improves existing controllers for robotic manipulation by learning a residual on top of them with model-free deep reinforcement learning. It yields substantial gains in simulated tasks where reinforcement learning from scratch is data-inefficient or intractable.
Contribution
The paper introduces RPL, a simple method to improve nondifferentiable policies using model-free deep RL, effective in complex robotic manipulation tasks with imperfect initial controllers.
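The idea of learning a residual on top of an initial controller can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the proportional `initial_controller`, the linear `ResidualPolicy`, the action bounds of [-1, 1], and the hand-set bias standing in for an RL update. The sketch shows that the executed action is the controller's action plus a learned correction, and that the controller is only ever called, never differentiated, which is why it may be nondifferentiable.

```python
import numpy as np

def initial_controller(state, goal, gain=0.5):
    """Hypothetical hand-designed controller: a clipped proportional step
    toward the goal. It is treated as a black box by the learner."""
    return np.clip(gain * (goal - state), -1.0, 1.0)

class ResidualPolicy:
    """Illustrative residual policy: action = controller(state) + residual(state).
    Only the residual's parameters (W, b) would be trained by an RL algorithm."""

    def __init__(self, state_dim, action_dim):
        # Zero initialization makes the combined policy start out
        # identical to the initial controller.
        self.W = np.zeros((action_dim, state_dim))
        self.b = np.zeros(action_dim)

    def residual(self, state):
        # A linear map stands in for the neural network used in deep RL.
        return self.W @ state + self.b

    def act(self, state, goal):
        # Sum the controller's action and the learned correction,
        # then clip to the assumed action bounds.
        combined = initial_controller(state, goal) + self.residual(state)
        return np.clip(combined, -1.0, 1.0)

state = np.array([0.0, 0.0])
goal = np.array([1.0, -0.4])
policy = ResidualPolicy(state_dim=2, action_dim=2)

# Before any learning, the policy reproduces the initial controller.
action_before = policy.act(state, goal)

# Stand-in for a parameter update produced by RL training.
policy.b = np.array([0.1, 0.0])
action_after = policy.act(state, goal)
```

In this sketch, any off-the-shelf model-free RL algorithm could in principle update `W` and `b` from environment reward, since the gradient of the summed action with respect to the residual's parameters does not involve the controller.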
Findings
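RPL is studied in six challenging MuJoCo tasks involving partial observability, sensor noise, model misspecification, and controller miscalibration.
RPL can perform long-horizon, sparse-reward tasks on which reinforcement learning alone fails.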
RPL consistently enhances initial controllers, combining RL and control strengths.
Abstract
We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvements. We study RPL in six challenging MuJoCo tasks involving partial observability, sensor noise, model misspecification, and controller miscalibration. For initial controllers, we consider both hand-designed policies and model-predictive controllers with known or learned transition models. By combining learning with control algorithms, RPL can perform long-horizon, sparse-reward tasks for which reinforcement learning alone fails. Moreover, we find that RPL…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
