Progressive extension of reinforcement learning action dimension for asymmetric assembly tasks
Yuhang Gai, Jiuming Guo, Dan Wu, Ken Chen

TL;DR
This paper first combines RL with compliance control, then introduces a progressive extension of action dimension (PEAD) mechanism. Together these accelerate reinforcement learning convergence in asymmetric assembly tasks, improving data efficiency, time efficiency, and the stable reward.
Contribution
The paper proposes the novel PEAD mechanism to enhance RL convergence speed and efficiency specifically for complex asymmetric assembly tasks.
Findings
PEAD improves data-efficiency of RL algorithms.
PEAD accelerates convergence in RL.
PEAD increases stable rewards in RL applications.
Abstract
Reinforcement learning (RL) is a preferred approach for constructing control strategies for complex tasks such as asymmetric assembly. However, its slow convergence severely restricts its practical application. In this paper, convergence is first accelerated by combining RL with compliance control. A novel progressive extension of action dimension (PEAD) mechanism is then proposed to further improve the convergence of RL algorithms. The PEAD method is verified with DDPG and PPO. The results demonstrate that PEAD enhances the data efficiency and time efficiency of RL algorithms and increases the stable reward, which broadens the potential for applying RL.
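The abstract does not spell out how the action dimension is extended. As a rough illustration only, one possible reading of the name is that the policy starts by controlling a few action components and is given more as training proceeds, while the remaining components are held at a default value (for example, left to the compliance controller). Everything in the sketch below is an assumption made for illustration and is not taken from the paper: the function names `active_dims` and `extend_action`, the linear stage schedule, the `episodes_per_stage` parameter, and the zero default.

```python
from typing import List, Sequence


def active_dims(episode: int, total_dims: int, episodes_per_stage: int) -> int:
    """Hypothetical schedule: start with 1 active action dimension and
    unlock one more every `episodes_per_stage` episodes, capped at total_dims."""
    return min(total_dims, 1 + episode // episodes_per_stage)


def extend_action(policy_action: Sequence[float], n_active: int,
                  default: float = 0.0) -> List[float]:
    """Keep the first `n_active` components chosen by the policy and replace
    the rest with `default` (assumed here to mean 'not yet controlled by RL')."""
    return [a if i < n_active else default for i, a in enumerate(policy_action)]


if __name__ == "__main__":
    # Assumed 6-dimensional action, one new dimension every 100 episodes.
    for episode in (0, 250, 10_000):
        n = active_dims(episode, total_dims=6, episodes_per_stage=100)
        raw = [0.5, -0.2, 0.1, 0.3, -0.4, 0.05]
        print(episode, n, extend_action(raw, n))
```

Under this assumed schedule the policy initially explores a one-dimensional action space and reaches the full action space after 500 episodes. Whether the paper uses a fixed schedule, a performance-based trigger, or another rule is not stated in the abstract.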
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Manufacturing Process and Optimization
Methods: Adam · Convolution · Experience Replay · Dense Connections · Entropy Regularization · Weight Decay · Batch Normalization · Proximal Policy Optimization · Deep Deterministic Policy Gradient
