Off-Policy Deep Reinforcement Learning Algorithms for Handling Various Robotic Manipulator Tasks
Altun Rzayev, Vahid Tavakol Aghaei

TL;DR
This paper compares the efficiency and speed of three off-policy deep reinforcement learning algorithms—DDPG, TD3, and SAC—in training a robotic manipulator across four tasks in a simulated environment, highlighting their advantages over traditional control methods.
Contribution
It provides a comparative analysis of DDPG, TD3, and SAC algorithms for robotic manipulation tasks, demonstrating their effectiveness and efficiency in simulation.
Findings
All three algorithms successfully trained the manipulator.
SAC showed the fastest convergence among the three.
Off-policy algorithms outperform traditional control methods in speed and data efficiency.
Abstract
Conventional control methods create obstacles because of system complexity and their heavy demand for data, so modern, more efficient control methods are required. Off-policy, model-free reinforcement learning algorithms avoid the need to work with complex models. They stand out in speed and accuracy because they reuse past experience to learn optimal policies. In this study, three reinforcement learning algorithms, DDPG, TD3, and SAC, are used to train the Fetch robotic manipulator on four different tasks in the MuJoCo simulation environment. All three algorithms are off-policy and reach the desired target by optimizing both policy and value functions. The efficiency and speed of the three algorithms are analyzed in a controlled environment.
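The abstract's point that off-policy algorithms learn from their past experience is usually realized with a replay buffer, which DDPG, TD3, and SAC all rely on. The following is a minimal sketch of that idea, not the authors' implementation: the names `Transition` and `ReplayBuffer`, the capacity, and the seed are hypothetical and chosen only for illustration.

```python
import random
from collections import deque
from typing import List, NamedTuple, Optional, Sequence


class Transition(NamedTuple):
    """One step of interaction with the environment (hypothetical container)."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]
    done: bool


class ReplayBuffer:
    """Fixed-capacity store of past transitions (illustrative, assumed design)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # A deque with maxlen silently discards the oldest item when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement; the sampled transitions may
        # have been collected by an older version of the policy, which is
        # what makes the learning off-policy.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=3, seed=0)
    for i in range(5):
        buffer.add(Transition([float(i)], [0.0], 1.0, [float(i + 1)], False))
    print(len(buffer))            # only the 3 most recent transitions remain
    print(len(buffer.sample(2)))  # a mini-batch of 2 stored transitions
```

In a training loop, each environment step would be added to the buffer and mini-batches drawn from it would drive the policy and value updates. How the three algorithms use those batches differs and is not shown here.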
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Software Engineering Methodologies
Methods: Batch Normalization · Dilated Convolution · 1x1 Convolution · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Average Pooling · Dense Connections · Weight Decay · Convolution · Deep Deterministic Policy Gradient
