Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings
Sreehari Rammohan, Shangqun Yu, Bowen He, Eric Hsiung, Eric Rosen, Stefanie Tellex, George Konidaris

TL;DR
This paper demonstrates that value-based reinforcement learning, specifically RBF-DQN, can effectively and efficiently learn continuous robotic manipulation tasks in multi-task sparse reward environments, outperforming some policy-gradient methods.
Contribution
The paper applies RBF-DQN, an existing value-based method, to continuous control in multi-task sparse reward settings and shows that it converges faster than TD3, SAC, and PPO on robotic manipulation tasks.
Findings
RBF-DQN converges faster than TD3, SAC, and PPO on robotic tasks.
Hindsight Experience Replay and Prioritized Experience Replay enhance RBF-DQN performance.
Value-based methods may be more sensitive to data augmentation and replay techniques than policy-gradient methods.
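To make the Hindsight Experience Replay finding concrete, the following is a minimal sketch of goal relabeling under a sparse reward. It is not the paper's implementation: the `Transition` layout, the `sparse_reward` convention (0 on success, -1 otherwise), the tolerance, and the "final state as goal" strategy are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Tuple

Vector = Tuple[float, ...]


class Transition(NamedTuple):
    """Hypothetical goal-conditioned transition; field names are illustrative."""
    state: Vector
    action: Vector
    next_state: Vector  # in this toy setup the next state is also the achieved goal
    goal: Vector
    reward: float


def sparse_reward(achieved: Vector, goal: Vector, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0.0 within `tol` of the goal, -1.0 otherwise."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel_final(episode: List[Transition]) -> List[Transition]:
    """Copy an episode, replacing the goal with the final achieved state.

    Rewards are recomputed against the substituted goal, so an episode that
    failed on the original goal still yields at least one successful step.
    """
    new_goal = episode[-1].next_state
    return [
        t._replace(goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# A failed 1-D reaching episode: the goal is 1.0 but the arm only gets to 0.3.
episode = [
    Transition((0.0,), (0.1,), (0.1,), (1.0,), -1.0),
    Transition((0.1,), (0.1,), (0.2,), (1.0,), -1.0),
    Transition((0.2,), (0.1,), (0.3,), (1.0,), -1.0),
]
relabeled = her_relabel_final(episode)
```

Both the original and the relabeled transitions would then go into the replay buffer, which is where a prioritization scheme such as Prioritized Experience Replay could also be applied.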
Abstract
Learning continuous control in high-dimensional sparse reward settings, such as robotic manipulation, is a challenging problem due to the number of samples often required to obtain accurate optimal value and policy estimates. While many deep reinforcement learning methods have aimed at improving sample efficiency through replay or improved exploration techniques, state of the art actor-critic and policy gradient methods still suffer from the hard exploration problem in sparse reward settings. Motivated by recent successes of value-based methods for approximating state-action values, like RBF-DQN, we explore the potential of value-based reinforcement learning for learning continuous robotic manipulation tasks in multi-task sparse reward settings. On robotic manipulation tasks, we empirically show RBF-DQN converges faster than current state of the art algorithms such as TD3, SAC, and PPO.…
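The abstract does not spell out how RBF-DQN handles continuous actions, so the sketch below follows the normalized radial-basis formulation from the earlier RBF-DQN work as commonly described: Q(s, a) is a softmax-weighted average of per-centroid values, with weights that decay with the distance between the action and each centroid. In the full method the centroids and values come from state-conditioned networks; here they are passed in as plain lists, and the function names, the example numbers, and the `beta` setting are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def rbf_q_value(
    action: Sequence[float],
    centroids: List[Sequence[float]],
    values: List[float],
    beta: float = 1.0,
) -> float:
    """Normalized RBF value: sum_i softmax_i(-beta * ||a - a_i||) * v_i."""
    logits = [-beta * math.dist(action, c) for c in centroids]
    shift = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def approx_greedy_action(
    centroids: List[Sequence[float]],
    values: List[float],
    beta: float = 1.0,
) -> Tuple[Sequence[float], float]:
    """Approximate argmax over actions by evaluating Q only at the centroids."""
    qs = [rbf_q_value(c, centroids, values, beta) for c in centroids]
    best = max(range(len(centroids)), key=qs.__getitem__)
    return centroids[best], qs[best]


# Toy example: two 1-D centroids with values 1.0 and 3.0 for some fixed state.
example_centroids = [(0.0,), (1.0,)]
example_values = [1.0, 3.0]
q_mid = rbf_q_value((0.5,), example_centroids, example_values, beta=50.0)
best_action, best_q = approx_greedy_action(example_centroids, example_values, beta=50.0)
```

Restricting the maximization to the centroids is what lets a value-based method pick a greedy continuous action without a separate actor network; with a larger `beta` the value at each centroid gets closer to that centroid's own value.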
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural and Behavioral Psychology Studies · Neural dynamics and brain function
Methods: Clipped Double Q-learning · Target Policy Smoothing · Adam · Dense Connections · Entropy Regularization · 1x1 Convolution · Twin Delayed Deep Deterministic · Convolution · Average Pooling
