Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings

Sreehari Rammohan; Shangqun Yu; Bowen He; Eric Hsiung; Eric Rosen; Stefanie Tellex; George Konidaris

arXiv:2107.13356·cs.RO·July 29, 2021·1 cites

Open Access

TL;DR

This paper demonstrates that value-based reinforcement learning, specifically RBF-DQN, can effectively and efficiently learn continuous robotic manipulation tasks in multi-task sparse reward environments, outperforming some policy-gradient methods.

Contribution

The paper applies RBF-DQN, an existing value-based method, to continuous control robotic manipulation in multi-task sparse reward settings and reports that it converges faster than TD3, SAC, and PPO.

Findings

01 RBF-DQN converges faster than TD3, SAC, and PPO on robotic manipulation tasks.

02 Hindsight Experience Replay and Prioritized Experience Replay enhance RBF-DQN performance.

03 The experiments suggest that value-based methods may be more sensitive to data augmentation and replay techniques than policy-gradient methods.
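To make the second finding more concrete, here is a minimal sketch of how Hindsight Experience Replay can relabel a failed episode so that a sparse reward becomes informative. It is not the paper's code. The transition layout (`obs`, `action`, `next_obs`, `achieved_goal`, `goal`), the "final" relabeling strategy, the tolerance `tol`, and the 0 / -1 reward convention are all assumptions chosen for illustration.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, tol=0.05):
    """Assumed sparse reward: 0.0 within `tol` of the goal, otherwise -1.0."""
    return 0.0 if np.linalg.norm(achieved_goal - goal) <= tol else -1.0

def her_relabel_final(episode, tol=0.05):
    """Relabel each transition with the goal actually achieved at episode end.

    This is the "final" HER strategy; the hypothetical dict keys are
    illustrative, not taken from the paper.
    """
    new_goal = episode[-1]["achieved_goal"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "obs": t["obs"],
            "action": t["action"],
            "next_obs": t["next_obs"],
            "achieved_goal": t["achieved_goal"],
            "goal": new_goal,
            # Recompute the reward against the substituted goal.
            "reward": sparse_reward(t["achieved_goal"], new_goal, tol),
        })
    return relabeled

# Toy episode: the agent moves along x but never reaches the goal at x = 5.
original_goal = np.array([5.0, 0.0])
episode = []
for step in range(3):
    pos = np.array([float(step), 0.0])
    nxt = np.array([float(step + 1), 0.0])
    episode.append({
        "obs": pos,
        "action": np.array([1.0, 0.0]),
        "next_obs": nxt,
        "achieved_goal": nxt,
        "goal": original_goal,
        "reward": sparse_reward(nxt, original_goal),
    })

relabeled = her_relabel_final(episode)
original_rewards = [t["reward"] for t in episode]
relabeled_rewards = [t["reward"] for t in relabeled]
```

In this toy case every original transition carries the failure reward, while the relabeled copy ends with a success signal. Both the original and relabeled transitions would typically be stored in the replay buffer.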

Abstract

Learning continuous control in high-dimensional sparse reward settings, such as robotic manipulation, is a challenging problem due to the number of samples often required to obtain accurate optimal value and policy estimates. While many deep reinforcement learning methods have aimed at improving sample efficiency through replay or improved exploration techniques, state of the art actor-critic and policy gradient methods still suffer from the hard exploration problem in sparse reward settings. Motivated by recent successes of value-based methods for approximating state-action values, like RBF-DQN, we explore the potential of value-based reinforcement learning for learning continuous robotic manipulation tasks in multi-task sparse reward settings. On robotic manipulation tasks, we empirically show RBF-DQN converges faster than current state of the art algorithms such as TD3, SAC, and PPO.…
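The abstract names RBF-DQN without describing it. As a rough sketch, assuming the usual deep radial-basis formulation (a state-dependent set of action centroids, each with a value, combined through a softmax over negative scaled distances), the Q-value and the approximate greedy action can be computed as below. In a real agent the centroids and values would be network outputs for a given state; here they are fixed arrays, and `beta`, `rbf_q_value`, and `greedy_action` are illustrative names rather than the paper's API.

```python
import numpy as np

def rbf_q_value(action, centroids, values, beta=1.0):
    """Q(s, a) as a normalized-RBF (softmax) weighted average of centroid values.

    centroids: array of shape (N, action_dim), assumed to come from a network.
    values:    array of shape (N,), one value per centroid.
    """
    dists = np.linalg.norm(centroids - action, axis=1)
    logits = -beta * dists
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return float(weights @ values)

def greedy_action(centroids, values, beta=1.0):
    """Approximate argmax over continuous actions by checking only the centroids."""
    q_at_centroids = [rbf_q_value(c, centroids, values, beta) for c in centroids]
    best = int(np.argmax(q_at_centroids))
    return centroids[best], q_at_centroids[best]

# Toy example with three centroids in a 2-D action space.
centroids = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
values = np.array([0.2, 0.5, 1.0])
best_action, best_q = greedy_action(centroids, values, beta=5.0)
```

Restricting the maximization to the centroids is what lets a Q-learning style update be used with continuous actions, since no separate actor network is needed to propose the greedy action.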

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural and Behavioral Psychology Studies · Neural dynamics and brain function

Methods: Clipped Double Q-learning · Target Policy Smoothing · Adam · Dense Connections · Entropy Regularization · 1x1 Convolution · Twin Delayed Deep Deterministic · Convolution · Average Pooling