# Towards Multi-Objective Object Push-Grasp Policy Based on Maximum Entropy Deep Reinforcement Learning under Sparse Rewards

**Authors:** Tengteng Zhang, Hongwei Mo

PMC · DOI: 10.3390/e26050416 · Entropy · 2024-05-12

## TL;DR

This paper proposes ME-DQN, a maximum entropy deep reinforcement learning method that learns push-grasp policies for robotic grasping in unstructured environments with sparse rewards.

## Contribution

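ME-DQN, a maximum entropy Deep Q-Network, combines the feature extraction of Fully Convolutional Networks (FCNs), an attention mechanism for feature selection, and an advantage function. The goal is to improve both grasping performance and generalization.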
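The maximum entropy idea behind ME-DQN can be illustrated with standard soft Q-learning. The sketch below is a minimal illustration built on assumptions. It uses the textbook soft value V(s) = α log Σ_a exp(Q(s,a)/α), a Boltzmann policy, and a soft Bellman target. The temperature `alpha`, the discount `gamma`, and all function names are hypothetical and not taken from the paper, because this summary does not give the paper's exact formulation.

```python
import math
import random


def soft_value(q_values, alpha):
    """Soft state value V(s) = alpha * log sum_a exp(Q(s,a) / alpha).

    Uses the log-sum-exp trick (subtract the max) for numerical stability.
    """
    scaled = [q / alpha for q in q_values]
    m = max(scaled)
    return alpha * (m + math.log(sum(math.exp(x - m) for x in scaled)))


def boltzmann_policy(q_values, alpha):
    """Maximum entropy policy pi(a|s) = exp((Q(s,a) - V(s)) / alpha); sums to 1."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


def soft_td_target(reward, next_q_values, alpha, gamma=0.99, done=False):
    """Soft Bellman target y = r + gamma * V(s'), with no bootstrap at terminal states."""
    if done:
        return reward
    return reward + gamma * soft_value(next_q_values, alpha)


def sample_action(q_values, alpha, rng=random):
    """Sample an action index from the Boltzmann policy instead of taking the argmax."""
    probs = boltzmann_policy(q_values, alpha)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Suppose the reward is sparse, for example r = 1 only on a successful grasp and 0 otherwise. The soft target still bootstraps from the log-sum-exp of the next Q-values. Sampling from the Boltzmann policy also keeps the agent exploring rather than committing greedily to one action. As `alpha` shrinks toward 0, the policy approaches ordinary greedy DQN action selection.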

## Key findings

- The proposed ME-DQN achieves a 91.6% grasping success rate in simulation.
- The method generalizes well when transferred to real-world settings.
- According to the authors, the approach removes the need for manual hyper-parameter tuning in sparse reward tasks.
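The abstract does not say how ME-DQN avoids hyper-parameter tuning. One standard technique in maximum entropy RL learns the temperature α by gradient descent so that the policy entropy tracks a target value. This technique comes from Soft Actor-Critic's automatic temperature adjustment and is not confirmed as the paper's method. In this sketch, the target entropy (half of log |A|), the learning rate, the step count, and all function names are hypothetical.

```python
import math


def softmax_policy(q_values, alpha):
    """Boltzmann policy over discrete actions at temperature alpha."""
    m = max(q / alpha for q in q_values)
    exps = [math.exp(q / alpha - m) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def update_log_alpha(log_alpha, probs, target_entropy, lr=0.05):
    """One gradient step on J(alpha) = alpha * (H(pi) - H_target) w.r.t. log alpha.

    pi is treated as a constant here, as in SAC's temperature loss. If entropy is
    above the target, alpha shrinks; if it is below, alpha grows.
    """
    alpha = math.exp(log_alpha)
    grad = alpha * (entropy(probs) - target_entropy)
    return log_alpha - lr * grad


def tune_alpha(q_values, target_entropy, steps=2000, lr=0.05, log_alpha=0.0):
    """Repeatedly adjust alpha for fixed Q-values until entropy matches the target."""
    for _ in range(steps):
        probs = softmax_policy(q_values, math.exp(log_alpha))
        log_alpha = update_log_alpha(log_alpha, probs, target_entropy, lr)
    return math.exp(log_alpha)
```

In a full agent the Q-values change every step and the α update is interleaved with Q-network updates. The fixed-Q loop above only shows that the temperature settles where the entropy constraint is met, so nobody has to pick α by hand.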

## Abstract

In unstructured environments, robots need to deal with a wide variety of objects with diverse shapes, and often, the instances of these objects are unknown. Traditional methods rely on training with large-scale labeled data, but in environments with continuous and high-dimensional state spaces, the data become sparse, leading to weak generalization ability of the trained models when transferred to real-world applications. To address this challenge, we present an innovative maximum entropy Deep Q-Network (ME-DQN), which leverages an attention mechanism. The framework solves complex and sparse reward tasks through probabilistic reasoning while eliminating the trouble of adjusting hyper-parameters. This approach aims to merge the robust feature extraction capabilities of Fully Convolutional Networks (FCNs) with the efficient feature selection of the attention mechanism across diverse task scenarios. By integrating an advantage function with the reasoning and decision-making of deep reinforcement learning, ME-DQN propels the frontier of robotic grasping and expands the boundaries of intelligent perception and grasping decision-making in unstructured environments. Our simulations demonstrate a remarkable grasping success rate of 91.6%, while maintaining excellent generalization performance in the real world.
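The abstract mentions an advantage function but does not define how it is used. A common reading in value-based RL is the dueling decomposition Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a'). The sketch below makes a hypothetical assumption, as in pixel-wise push-grasp pipelines, that an FCN emits one state value plus per-pixel advantage maps for a push primitive and a grasp primitive. The map shapes, primitive names, and functions are illustrative only and do not reproduce the paper's architecture.

```python
def dueling_q(value, advantages):
    """Dueling aggregation over a flat list of actions:
    Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def best_push_or_grasp(value, adv_maps):
    """Pick the greedy (primitive, row, col) action.

    adv_maps maps a primitive name (e.g. "push", "grasp") to a 2D list (H x W)
    of per-pixel advantages. All primitives and pixels form one action set, so
    the mean advantage is taken over every candidate action.
    """
    actions = []
    advantages = []
    for primitive, grid in adv_maps.items():
        for r, row in enumerate(grid):
            for c, a in enumerate(row):
                actions.append((primitive, r, c))
                advantages.append(a)
    q = dueling_q(value, advantages)
    i = max(range(len(q)), key=q.__getitem__)
    return (*actions[i], q[i])
```

Subtracting the mean advantage makes V and A identifiable, because the average of Q over all actions then equals V(s). A single pass through the action set therefore compares pushing and grasping at every pixel on one common scale.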


## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11120306/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11120306/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC11120306/full.md

---
Source: https://tomesphere.com/paper/PMC11120306