Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning
Wen-Han Hsieh, Jen-Yuan Chang

TL;DR
This paper introduces EECL, a novel exploration method integrated into TD3 that enhances exploration for 7-DOF robotic arm control, leading to more optimal policies and faster convergence.
Contribution
The paper proposes EECL, a new exploration-enhanced contrastive learning module that improves exploration in TD3 for robotic manipulation tasks.
Findings
EECL significantly improves exploration efficiency.
EECL accelerates the convergence of TD3.
EECL outperforms baseline TD3 in robotic grasping tasks.
Abstract
In actor-critic-based reinforcement learning algorithms such as Twin Delayed Deep Deterministic policy gradient (TD3), insufficient exploration of the spatial space can result in suboptimal policies when controlling 7-DOF robotic arms. To address this issue, we propose a novel Exploration-Enhanced Contrastive Learning (EECL) module that improves exploration by providing additional rewards for encountering novel states. Our module stores previously explored states in a buffer and identifies new states by comparing them with historical data using Euclidean distance within a K-dimensional tree (KDTree) framework. When the agent explores new states, exploration rewards are assigned. These rewards are then integrated into the TD3 algorithm, ensuring that the Q-learning process incorporates these signals, promoting more effective strategy optimization. We evaluate our method on the robosuite…
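The novelty check described in the abstract (store visited states, find the nearest stored state by Euclidean distance in a k-d tree, and grant a reward when the state is new) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the class name `NoveltyBuffer`, the `threshold` and `bonus` parameters, and the simple unbalanced pure-Python k-d tree are all assumptions made for the example, and the abstract does not state how novelty is thresholded or how large the reward is.

```python
import math


class _Node:
    """One stored state plus the axis used to split its subtree."""

    def __init__(self, point, axis):
        self.point = point
        self.axis = axis
        self.left = None
        self.right = None


class NoveltyBuffer:
    """Hypothetical buffer of visited states backed by an unbalanced k-d tree."""

    def __init__(self, threshold, bonus):
        self.threshold = threshold  # assumed novelty radius (Euclidean distance)
        self.bonus = bonus          # assumed size of the exploration reward
        self.root = None
        self.size = 0

    def _insert(self, point):
        if self.root is None:
            self.root = _Node(point, 0)
            self.size = 1
            return
        node = self.root
        while True:
            # Points smaller than the node on its split axis go left, others right.
            branch = "left" if point[node.axis] < node.point[node.axis] else "right"
            child = getattr(node, branch)
            if child is None:
                next_axis = (node.axis + 1) % len(point)
                setattr(node, branch, _Node(point, next_axis))
                self.size += 1
                return
            node = child

    def nearest_distance(self, query):
        """Euclidean distance to the closest stored state (inf if empty)."""
        best = math.inf
        # Each stack entry is (node, lower bound on distance to that subtree).
        stack = [(self.root, 0.0)] if self.root is not None else []
        while stack:
            node, bound = stack.pop()
            if bound >= best:
                continue  # this subtree cannot contain a closer point
            best = min(best, math.dist(query, node.point))
            diff = query[node.axis] - node.point[node.axis]
            near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
            if far is not None:
                stack.append((far, abs(diff)))
            if near is not None:
                stack.append((near, 0.0))  # popped first, so it is searched first
        return best

    def intrinsic_reward(self, state):
        """Return the bonus and store the state if it is novel, else 0.0."""
        point = tuple(float(x) for x in state)
        if self.nearest_distance(point) > self.threshold:
            self._insert(point)
            return self.bonus
        return 0.0


def shaped_reward(env_reward, state, buffer):
    """Assumed integration: add the exploration reward to the task reward."""
    return env_reward + buffer.intrinsic_reward(state)
```

Under this sketch, the value returned by `shaped_reward` is what would be written into the TD3 replay buffer in place of the raw environment reward, so the critics see the exploration signal during Q-learning. A production version would more likely use a library k-d tree and rebuild it periodically to keep it balanced.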
Taxonomy
Topics: Robot Manipulation and Learning
Methods: Dense Connections · Target Policy Smoothing · Clipped Double Q-learning · Q-Learning · Experience Replay · Contrastive Learning · Adam · Twin Delayed Deep Deterministic
