Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning
Muhammad Burhan Hafez, Cornelius Weber, Matthias Kerzel, Stefan Wermter

TL;DR
This paper introduces a novel intrinsically motivated actor-critic algorithm that learns robotic visuomotor skills efficiently from raw visual data, combining deep autoencoders and ensemble world models for improved exploration and stability.
Contribution
The paper presents a new intrinsically motivated continuous actor-critic method that integrates deep autoencoders and ensemble predictive models for stable, data-efficient learning from pixel inputs in robotic control.
Findings
Outperforms state-of-the-art methods in robotic reaching and grasping tasks.
Achieves better performance in both dense and sparse reward scenarios.
Demonstrates improved data efficiency and stability over existing algorithms.
Abstract
In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than the existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm for the task of learning robotic reaching and grasping skills on a…
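The intrinsic-reward mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not say how learning progress is measured, so this example assumes it is the drop in prediction error produced by one gradient step on each ensemble member. The linear `LinearWorldModel`, the weighting factor `beta`, and all function names are hypothetical stand-ins, not the authors' implementation, and plain vectors take the place of the autoencoder's hidden representation.

```python
import numpy as np


class LinearWorldModel:
    """Hypothetical linear forward model: predicts the next state from (state, action)."""

    def __init__(self, state_dim, action_dim, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
        self.lr = lr

    def error(self, s, a, s_next):
        # Mean squared prediction error on a single transition.
        x = np.concatenate([s, a])
        return float(np.mean((self.W @ x - s_next) ** 2))

    def update(self, s, a, s_next):
        # One gradient step on 0.5 * squared error.
        x = np.concatenate([s, a])
        self.W -= self.lr * np.outer(self.W @ x - s_next, x)


def intrinsic_reward(models, s, a, s_next):
    """Assumed learning-progress signal: mean error reduction across the ensemble."""
    progress = []
    for m in models:
        before = m.error(s, a, s_next)
        m.update(s, a, s_next)
        progress.append(before - m.error(s, a, s_next))
    return max(0.0, float(np.mean(progress)))


def combined_reward(r_ext, r_int, beta=0.5):
    # beta is an assumed weighting; the abstract only says the two rewards are combined.
    return r_ext + beta * r_int


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ensemble = [LinearWorldModel(4, 2, seed=i) for i in range(3)]
    s, a, s_next = rng.uniform(-1, 1, 4), rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 4)
    for step in range(3):
        r_int = intrinsic_reward(ensemble, s, a, s_next)
        print(step, combined_reward(r_ext=0.0, r_int=r_int))
```

Under these assumptions, revisiting the same transition yields a shrinking intrinsic reward, because the models have less left to learn from it. This pushes the learner toward transitions where its predictions are still improving.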
