Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning
Elias Malomgré, Pieter Simoens

TL;DR
This paper introduces a novel reinforcement learning framework that leverages a mixture of autoencoder experts to utilize incomplete and unlabeled demonstrations, guiding exploration effectively without relying solely on explicit rewards.
Contribution
The paper presents a method that combines a mixture of autoencoder experts with intrinsic reward shaping, so that imperfect (unlabeled or incomplete) demonstrations can guide exploration in RL.
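To make the idea of pairing autoencoder experts with a shaped intrinsic reward more concrete, here is a minimal illustrative sketch. It is not the authors' implementation: this summary does not give the paper's architecture or equations, so everything below is an assumption made for illustration. In particular, the experts are closed-form linear autoencoders (PCA) rather than trained networks, the class `LinearAutoencoderExpert`, the function `intrinsic_reward`, the `scale` parameter, the exponential mapping, and the toy demonstration data are all hypothetical.

```python
import numpy as np


class LinearAutoencoderExpert:
    """Hypothetical expert: a linear autoencoder fit in closed form via PCA."""

    def __init__(self, states, latent_dim):
        self.mean = states.mean(axis=0)
        # Right singular vectors of the centered data give the principal directions.
        _, _, vt = np.linalg.svd(states - self.mean, full_matrices=False)
        self.components = vt[:latent_dim]

    def reconstruction_error(self, state):
        centered = state - self.mean
        code = centered @ self.components.T   # encode
        recon = code @ self.components        # decode
        return float(np.sum((centered - recon) ** 2))


def intrinsic_reward(state, experts, scale=1.0):
    """Map the lowest expert reconstruction error to a bonus in (0, 1].

    Assumption: a state counts as demonstration-like if at least one expert
    reconstructs it well, and exp(-error / scale) is one possible shaping map.
    """
    best_error = min(e.reconstruction_error(state) for e in experts)
    return float(np.exp(-best_error / scale))


rng = np.random.default_rng(0)
# Two toy groups of unlabeled demonstration states (states only, no actions or rewards).
demo_a = np.outer(rng.normal(size=50), [1.0, 0.0, 0.0])
demo_b = np.outer(rng.normal(size=50), [0.0, 1.0, 0.0]) + np.array([0.0, 0.0, 5.0])
experts = [LinearAutoencoderExpert(demo_a, 1), LinearAutoencoderExpert(demo_b, 1)]

near = intrinsic_reward(np.array([2.0, 0.0, 0.0]), experts)  # lies on demo_a's line
far = intrinsic_reward(np.array([0.0, 0.0, 2.5]), experts)   # between the two groups
print(near > far)
```

In this sketch a state that some expert reconstructs well earns a bonus close to 1, while a state far from every group of demonstrations earns a bonus close to 0. Changing `scale`, or swapping the exponential for another mapping, is one way such a signal could be shaped rather than used raw.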
Findings
Enables robust exploration in sparse and dense reward environments.
Performs well with incomplete and sparse demonstration data.
Outperforms baseline methods in experimental evaluations.
Abstract
Recent trends in Reinforcement Learning (RL) highlight the need for agents to learn from reward-free interactions and alternative supervision signals, such as unlabeled or incomplete demonstrations, rather than relying solely on explicit reward maximization. Additionally, developing generalist agents that can adapt efficiently in real-world environments often requires leveraging these reward-free signals to guide learning and behavior. However, while intrinsic motivation techniques provide a means for agents to seek out novel or uncertain states in the absence of explicit rewards, they are often challenged by dense reward environments or the complexity of high-dimensional state and action spaces. Furthermore, most existing approaches rely directly on the unprocessed intrinsic reward signals, which can make it difficult to shape or control the agent's exploration effectively. We propose…
