Image-Based Deep Reinforcement Learning with Intrinsically Motivated Stimuli: On the Execution of Complex Robotic Tasks

David Valencia; Henry Williams; Yuning Xing; Trevor Gee; Minas Liarokapis; Bruce A. MacDonald

arXiv:2407.21338 · cs.AI · August 1, 2024

Open Access

TL;DR

This paper introduces NaSA-TD3, an image-based reinforcement learning method that leverages intrinsic motivation signals like novelty and surprise to improve exploration and learning efficiency in complex robotic tasks with sparse rewards.

Contribution

It presents a novel, sample-efficient RL algorithm that learns directly from pixel data, incorporating intrinsic motivation to enhance exploration in complex environments.

Findings

1. NaSA-TD3 outperforms state-of-the-art RL methods in robotic tasks.
2. The method is effective in both simulated and real-world environments.
3. NaSA-TD3 does not require pre-training or human demonstrations.

Abstract

Reinforcement Learning (RL) has been widely used to solve tasks where the environment consistently provides a dense reward value. However, in real-world scenarios, rewards can often be poorly defined or sparse. Auxiliary signals are indispensable for discovering efficient exploration strategies and aiding the learning process. In this work, inspired by intrinsic motivation theory, we postulate that the intrinsic stimuli of novelty and surprise can assist in improving exploration in complex, sparsely rewarded environments. We introduce a novel sample-efficient method able to learn directly from pixels, an image-based extension of TD3 with an autoencoder called NaSA-TD3. The experiments demonstrate that NaSA-TD3 is easy to train and an efficient method for tackling complex continuous-control robotic tasks, both in simulated environments and real-world settings. NaSA-TD3…
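
The abstract names novelty and surprise as intrinsic stimuli but does not give their formulas here. As a rough illustration only, the sketch below assumes novelty is the autoencoder's reconstruction error on the current image and surprise is the error of a forward model predicting the next latent state; both definitions, the function names, and the weights `w_novelty` and `w_surprise` are hypothetical and may differ from what NaSA-TD3 actually uses.

```python
import numpy as np


def novelty_bonus(image, reconstruction):
    """Assumed novelty signal: mean squared autoencoder reconstruction error."""
    return float(np.mean((image - reconstruction) ** 2))


def surprise_bonus(predicted_next_latent, actual_next_latent):
    """Assumed surprise signal: mean squared error of a latent forward model."""
    return float(np.mean((predicted_next_latent - actual_next_latent) ** 2))


def total_reward(extrinsic, novelty, surprise, w_novelty=1.0, w_surprise=1.0):
    """Add weighted intrinsic bonuses to the (possibly sparse) task reward."""
    return extrinsic + w_novelty * novelty + w_surprise * surprise


# Example with toy data: a sparse task reward of 0 still yields a learning signal.
image = np.ones((4, 4))
reconstruction = np.zeros((4, 4))
bonus = novelty_bonus(image, reconstruction)
print(total_reward(0.0, bonus, 0.0))
```

With this kind of shaping, a poorly reconstructed or poorly predicted observation produces a larger bonus, which is one common way to push an agent toward unfamiliar states when the environment reward is sparse.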

Taxonomy

Topics: Innovation Diffusion and Forecasting

Methods: Adam · Dense Connections · Target Policy Smoothing · Experience Replay · Clipped Double Q-learning · Twin Delayed Deep Deterministic
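
Two of the methods tagged above, target policy smoothing and clipped double Q-learning, combine in the standard TD3 critic target. The sketch below shows that generic computation with NumPy. It is not taken from the NaSA-TD3 code: the function name `td3_target`, the callable arguments, and the default hyperparameters (commonly used TD3 values) are assumptions for illustration.

```python
import numpy as np


def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, noise_std=0.2,
               noise_clip=0.5, max_action=1.0):
    """Generic TD3 critic target for a batch of transitions."""
    # Target policy smoothing: perturb the target action with clipped noise.
    mu = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=mu.shape),
                    -noise_clip, noise_clip)
    next_action = np.clip(mu + noise, -max_action, max_action)

    # Clipped double Q-learning: take the smaller of the two target critics.
    q_min = np.minimum(critic1_target(next_state, next_action),
                       critic2_target(next_state, next_action))

    # Bootstrapped target; terminal transitions (done == 1) drop the bootstrap.
    return reward + gamma * (1.0 - done) * q_min
```

Here `actor_target`, `critic1_target`, and `critic2_target` stand in for any callables that map batched arrays to actions and Q-values; in an image-based setting, `next_state` would typically be the encoder's latent vector rather than raw pixels.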