Progressive extension of reinforcement learning action dimension for asymmetric assembly tasks

Yuhang Gai; Jiuming Guo; Dan Wu; Ken Chen

arXiv:2104.04078·cs.LG·April 12, 2021

Open Access

TL;DR

This paper first speeds up reinforcement learning (RL) convergence on asymmetric assembly tasks by combining RL with compliance control, then introduces a progressive extension of action dimension (PEAD) mechanism that further improves data efficiency, time efficiency, and the stable reward.

Contribution

The paper proposes the novel PEAD mechanism to enhance RL convergence speed and efficiency specifically for complex asymmetric assembly tasks.

Findings

01

PEAD improves data-efficiency of RL algorithms.

02

PEAD accelerates convergence in RL.

03

PEAD increases the stable reward reached by the tested RL algorithms (DDPG and PPO).

Abstract

Reinforcement learning (RL) is often the preferred approach for constructing control strategies for complex tasks such as asymmetric assembly. However, its slow convergence severely restricts practical application. In this paper, convergence is first accelerated by combining RL with compliance control. A novel progressive extension of action dimension (PEAD) mechanism is then proposed to further improve the convergence of RL algorithms. The PEAD method is verified with DDPG and PPO. The results demonstrate that PEAD enhances the data efficiency and time efficiency of RL algorithms and increases the stable reward, which broadens the potential for applying RL in practice.
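The abstract names the PEAD mechanism without giving its details, so the following is only a minimal sketch of one way a "progressive extension of action dimension" could be scheduled. Everything here is an assumption made for illustration and is not taken from the paper: the function names `active_dims` and `extend_action`, the step-based milestones, and the idea of filling not-yet-learned dimensions from a default action (for example, one supplied by a compliance controller).

```python
from typing import List, Sequence


def active_dims(step: int, milestones: Sequence[int],
                start_dims: int, total_dims: int) -> int:
    """Return how many action dimensions the policy controls at `step`.

    Hypothetical schedule: one extra dimension is unlocked each time the
    training step count passes a milestone, capped at `total_dims`.
    """
    unlocked = sum(1 for m in milestones if step >= m)
    return min(start_dims + unlocked, total_dims)


def extend_action(policy_action: Sequence[float], n_active: int,
                  default_action: Sequence[float]) -> List[float]:
    """Build the full action vector sent to the robot.

    The first `n_active` entries come from the RL policy; the remaining
    entries fall back to `default_action` (an assumed placeholder for
    whatever handles the dimensions the policy does not control yet).
    """
    return [
        a if i < n_active else d
        for i, (a, d) in enumerate(zip(policy_action, default_action))
    ]


if __name__ == "__main__":
    milestones = [1000, 2000]           # assumed unlock points, in steps
    default = [0.0, 0.0, 0.0, 0.0]      # assumed fallback action
    raw = [0.5, -0.2, 0.9, 0.1]         # example policy output
    for step in (0, 1500, 5000):
        n = active_dims(step, milestones, start_dims=2, total_dims=4)
        print(step, n, extend_action(raw, n, default))
```

In a sketch like this, the policy network can keep a fixed output size throughout training while only the unlocked dimensions influence the environment, which is one simple way to grow the effective action space without rebuilding the network. Whether the paper uses masking, network growth, or another scheme is not stated in the abstract.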

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Manufacturing Process and Optimization

Methods: Adam · Convolution · Experience Replay · Dense Connections · Entropy Regularization · Weight Decay · Batch Normalization · Proximal Policy Optimization · Deep Deterministic Policy Gradient