Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning

Wen-Han Hsieh; Jen-Yuan Chang

arXiv:2408.14009 · cs.RO · August 27, 2024

TL;DR

This paper introduces Exploration-Enhanced Contrastive Learning (EECL), a module integrated into TD3 that rewards the agent for reaching novel states. The added exploration improves control of 7-DOF robotic arms, yielding better policies and faster convergence.

Contribution

The paper proposes EECL, a new exploration-enhanced contrastive learning module that improves exploration in TD3 for robotic manipulation tasks.

Findings

01 EECL significantly improves exploration efficiency.

02 EECL accelerates the convergence of TD3.

03 EECL outperforms baseline TD3 in robotic grasping tasks.

Abstract

In actor-critic-based reinforcement learning algorithms such as Twin Delayed Deep Deterministic policy gradient (TD3), insufficient exploration of the spatial space can result in suboptimal policies when controlling 7-DOF robotic arms. To address this issue, we propose a novel Exploration-Enhanced Contrastive Learning (EECL) module that improves exploration by providing additional rewards for encountering novel states. Our module stores previously explored states in a buffer and identifies new states by comparing them with historical data using Euclidean distance within a K-dimensional tree (KDTree) framework. When the agent explores new states, exploration rewards are assigned. These rewards are then integrated into the TD3 algorithm, ensuring that the Q-learning process incorporates these signals, promoting more effective strategy optimization. We evaluate our method on the robosuite…
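The novelty check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `NoveltyBuffer`, the fixed distance `threshold`, the constant `bonus`, the choice to store only novel states, and the additive reward shaping are all assumptions made for the example. The k-d tree here is a small unbalanced pure-Python version written to keep the example self-contained.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class _Node:
    point: Tuple[float, ...]
    axis: int
    left: Optional["_Node"] = None
    right: Optional["_Node"] = None


class NoveltyBuffer:
    """Hypothetical buffer of visited states, indexed by a simple k-d tree."""

    def __init__(self, threshold: float, bonus: float):
        self.threshold = threshold  # assumed novelty radius (Euclidean)
        self.bonus = bonus          # assumed constant exploration reward
        self.root: Optional[_Node] = None
        self.size = 0

    def _insert(self, point: Tuple[float, ...]) -> None:
        self.size += 1
        if self.root is None:
            self.root = _Node(point, 0)
            return
        node = self.root
        while True:
            # Smaller coordinates go left, equal or larger go right.
            side = "left" if point[node.axis] < node.point[node.axis] else "right"
            child = getattr(node, side)
            if child is None:
                next_axis = (node.axis + 1) % len(point)
                setattr(node, side, _Node(point, next_axis))
                return
            node = child

    def nearest_distance(self, point: Sequence[float]) -> float:
        """Euclidean distance to the closest stored state (inf if empty)."""
        best = math.inf
        stack = [(self.root, 0.0)] if self.root is not None else []
        while stack:
            node, bound = stack.pop()
            if bound >= best:
                continue  # this subtree cannot contain a closer point
            best = min(best, math.dist(point, node.point))
            diff = point[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            if far is not None:
                stack.append((far, abs(diff)))  # lower bound across the split
            if near is not None:
                stack.append((near, 0.0))       # popped first (LIFO)
        return best

    def exploration_reward(self, state: Sequence[float]) -> float:
        """Return the bonus for a novel state and store it; otherwise 0."""
        point = tuple(float(x) for x in state)
        if self.nearest_distance(point) > self.threshold:
            self._insert(point)
            return self.bonus
        return 0.0


# Usage with made-up 2-D states; a real arm state would have more dimensions.
buffer = NoveltyBuffer(threshold=0.5, bonus=1.0)
states = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (2.0, 2.1)]
rewards = [buffer.exploration_reward(s) for s in states]
print(rewards)  # [1.0, 0.0, 1.0, 0.0]

# Assumed additive shaping before the transition enters the replay buffer:
env_reward = 0.25
shaped_reward = env_reward + buffer.exploration_reward((5.0, 5.0))
```

Under these assumptions, the shaped reward would simply replace the environment reward in each stored transition, so the TD3 critic update itself stays unchanged. A production version would more likely use an existing k-d tree library and rebuild or rebalance the index periodically.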

Taxonomy

Topics: Robot Manipulation and Learning

Methods: Dense Connections · Target Policy Smoothing · Clipped Double Q-learning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Q-Learning · Experience Replay · Contrastive Learning · Adam · Twin Delayed Deep Deterministic