Optimizing Deep Reinforcement Learning for Adaptive Robotic Arm Control
Jonaid Shianifar, Michael Schukat, Karl Mason

TL;DR
This paper demonstrates that using Tree-structured Parzen Estimator (TPE) hyperparameter optimization significantly improves the performance and training efficiency of SAC and PPO algorithms in controlling a complex robotic arm with seven degrees of freedom.
Contribution
It introduces the application of TPE for hyperparameter tuning in deep reinforcement learning for robotic arm control, achieving substantial performance gains and faster convergence.
Findings
TPE improves SAC success rate by 10.48 percentage points.
TPE accelerates PPO convergence to near-optimal reward by 76%.
TPE reduces PPO training by approximately 40K episodes, and SAC converges 80% faster than without TPE.
Abstract
In this paper, we explore the optimization of hyperparameters for the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms using the Tree-structured Parzen Estimator (TPE) in the context of robotic arm control with seven Degrees of Freedom (DOF). Our results demonstrate a significant enhancement in algorithm performance: for models trained for 50K episodes, TPE improves the success rate of SAC by 10.48 percentage points and that of PPO by 34.28 percentage points. Furthermore, TPE enables PPO to converge to a reward within 95% of the maximum reward 76% faster than without TPE, which translates to about 40K fewer training episodes to reach that level. For SAC, the corresponding convergence is 80% faster than without TPE. This study underscores the impact of advanced hyperparameter optimization on the efficiency and success of deep reinforcement learning algorithms…
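To make the idea behind TPE concrete, here is a minimal, simplified sketch of the technique on a one-dimensional toy problem. It is not the authors' setup: the search variable (a log10 learning rate), its bounds, the `toy_score` objective standing in for "success rate after training", and all helper names and default values (`gamma`, `bandwidth`, `n_candidates`) are assumptions made for illustration. The sketch splits past trials into a "good" and a "bad" group, fits a Parzen (kernel density) estimate to each, and proposes the candidate with the highest ratio of good density to bad density.

```python
import math
import random


def kde_pdf(x, centers, bandwidth):
    """Parzen estimate: average of Gaussian kernels centred on past observations."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return total / (len(centers) * norm)


def tpe_suggest(history, low, high, rng, gamma=0.25, n_candidates=24, bandwidth=0.5):
    """Propose the next value from (x, score) history; higher score is better."""
    if len(history) < 2:
        # Not enough data to form both groups: fall back to random search.
        return rng.uniform(low, high)
    ranked = sorted(history, key=lambda pair: pair[1], reverse=True)
    n_good = max(1, math.ceil(gamma * len(ranked)))
    n_good = min(n_good, len(ranked) - 1)  # keep at least one "bad" point
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    # Draw candidates from the "good" density l(x), clipped to the search range.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    # Keep the candidate that maximises l(x) / g(x).
    return max(
        candidates,
        key=lambda x: kde_pdf(x, good, bandwidth) / (kde_pdf(x, bad, bandwidth) + 1e-12),
    )


def toy_score(log_lr):
    """Hypothetical stand-in for a training run; peaks at log10(lr) = -3."""
    return -(log_lr + 3.0) ** 2


def run_tpe(n_trials=40, n_startup=5, seed=0):
    """Run random startup trials, then TPE-guided trials; return the best (x, score)."""
    rng = random.Random(seed)
    low, high = -5.0, -1.0  # assumed search range for log10(learning rate)
    history = []
    for trial in range(n_trials):
        if trial < n_startup:
            x = rng.uniform(low, high)
        else:
            x = tpe_suggest(history, low, high, rng)
        history.append((x, toy_score(x)))
    return max(history, key=lambda pair: pair[1])


best_log_lr, best_score = run_tpe()
print(f"best log10(lr) = {best_log_lr:.3f}, score = {best_score:.4f}")
```

In a real SAC or PPO study, each call to the objective would be a full training run and the search space would cover several hyperparameters at once, which is why sample-efficient search matters; an established library implementation would normally be preferred over a hand-rolled version like this one.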
Taxonomy
Topics: Digital Transformation in Industry · Iterative Learning Control Systems · Elevator Systems and Control
Methods: Convolution · 1x1 Convolution · Dilated Convolution · Global Average Pooling · Average Pooling · Switchable Atrous Convolution · Entropy Regularization · Proximal Policy Optimization
