Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning

Elias Malomgré; Pieter Simoens

arXiv:2507.15287·cs.LG·July 22, 2025

TL;DR

This paper introduces a reinforcement learning framework that uses a mixture of autoencoder experts to learn from incomplete, unlabeled demonstrations and turn them into guidance for exploration, so that the agent does not depend solely on explicit rewards.

Contribution

The paper presents a method that combines a mixture of autoencoder experts with intrinsic reward shaping, using imperfect demonstrations to improve exploration in RL.
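
The abstract below is truncated before the method is described, so the following is only a hypothetical illustration of how demonstrations could be turned into an intrinsic reward through a mixture of autoencoders; it is not the authors' implementation. It assumes that each expert is a linear autoencoder fitted in closed form by PCA (a stand-in for a trained neural network), that each expert is fitted to one partial demonstration, and that a state's reward is exp(-e / temperature), where e is the reconstruction error of the best-matching expert. The names `LinearAutoencoderExpert`, `MixtureOfAEExperts`, `latent_dim` and `temperature` are all hypothetical.

```python
import numpy as np


class LinearAutoencoderExpert:
    """Hypothetical expert: a linear autoencoder fitted in closed form via PCA.

    Encoding projects a centred state onto the top-k principal directions of the
    expert's demonstration states; decoding maps the code back to state space.
    This stands in for a trained neural autoencoder (an assumption).
    """

    def __init__(self, states: np.ndarray, latent_dim: int):
        self.mean = states.mean(axis=0)
        centred = states - self.mean
        # Rows of vt are principal directions, sorted by decreasing singular value.
        _, _, vt = np.linalg.svd(centred, full_matrices=False)
        self.components = vt[:latent_dim]  # shape (latent_dim, state_dim)

    def reconstruction_error(self, states: np.ndarray) -> np.ndarray:
        centred = np.atleast_2d(states) - self.mean
        latent = centred @ self.components.T  # encode
        recon = latent @ self.components      # decode
        # Mean squared reconstruction error per state.
        return np.mean((centred - recon) ** 2, axis=1)


class MixtureOfAEExperts:
    """Hypothetical mixture: one expert per (possibly partial) demonstration."""

    def __init__(self, demo_fragments, latent_dim: int = 1, temperature: float = 0.1):
        self.experts = [LinearAutoencoderExpert(f, latent_dim) for f in demo_fragments]
        self.temperature = temperature

    def intrinsic_reward(self, states: np.ndarray) -> np.ndarray:
        # Reconstruction error of every expert for every state: (n_experts, n_states).
        errors = np.stack([e.reconstruction_error(states) for e in self.experts])
        # Keep the error of the best-matching expert for each state.
        best = errors.min(axis=0)
        # Map error to a bounded reward in (0, 1]: low error gives a reward near 1.
        return np.exp(-best / self.temperature)


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 50)[:, None]
    # Two partial demonstrations on different lines in a 2-D state space.
    demo_a = np.hstack([t, t])         # points on y = x
    demo_b = np.hstack([t, -t + 2.0])  # points on y = -x + 2
    moe = MixtureOfAEExperts([demo_a, demo_b], latent_dim=1)
    print(moe.intrinsic_reward(np.array([[0.5, 0.5]])))  # state on demo_a
    print(moe.intrinsic_reward(np.array([[0.0, 1.5]])))  # state off both demos
```

In this toy run, the state lying on the first demonstration receives a reward of about 1.0, while the off-demonstration state receives roughly 0.54, because even its closest expert reconstructs it imperfectly. Taking the minimum error over experts means a state only needs to resemble one demonstration fragment to be rewarded, which is one plausible way to make incomplete demonstrations useful.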

Findings

1. Enables robust exploration in both sparse-reward and dense-reward environments.
2. Performs well with incomplete and sparse demonstration data.
3. Outperforms baseline methods in the reported experimental evaluations.

Abstract

Recent trends in Reinforcement Learning (RL) highlight the need for agents to learn from reward-free interactions and alternative supervision signals, such as unlabeled or incomplete demonstrations, rather than relying solely on explicit reward maximization. Additionally, developing generalist agents that can adapt efficiently in real-world environments often requires leveraging these reward-free signals to guide learning and behavior. However, while intrinsic motivation techniques provide a means for agents to seek out novel or uncertain states in the absence of explicit rewards, they are often challenged by dense reward environments or the complexity of high-dimensional state and action spaces. Furthermore, most existing approaches rely directly on the unprocessed intrinsic reward signals, which can make it difficult to shape or control the agent's exploration effectively. We propose…