Off-policy Reinforcement Learning with Model-based Exploration Augmentation

Likun Wang; Xiangteng Zhang; Yinuo Wang; Guojian Zhan; Wenxuan Wang; Haoyu Gao; Jingliang Duan; Shengbo Eben Li

arXiv:2510.25529 · cs.AI · October 30, 2025

TL;DR

This paper introduces MoGE, a model-based exploration augmentation method for off-policy reinforcement learning. It synthesizes under-explored critical states and dynamics-consistent transitions to improve exploration, and reports better sample efficiency and higher performance than baseline methods on complex control tasks.

Contribution

MoGE is a novel, modular approach combining diffusion-based state generation and transition modeling to enhance exploration in off-policy RL algorithms.

Findings

1. Improves sample efficiency in complex control tasks.
2. Achieves higher performance compared to baseline methods.
3. Effectively generates under-explored states and transitions.

Abstract

Exploration is fundamental to reinforcement learning (RL), as it determines how effectively an agent discovers and exploits the underlying structure of its environment to achieve optimal performance. Existing exploration methods generally fall into two categories: active exploration and passive exploration. The former introduces stochasticity into the policy but struggles in high-dimensional environments, while the latter adaptively prioritizes transitions in the replay buffer to enhance exploration, yet remains constrained by limited sample diversity. To address the limitation in passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration through the generation of under-explored critical states and synthesis of dynamics-consistent experiences through transition models. MoGE is composed of two components: (1) a diffusion-based generator that…
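The augmentation idea described in the abstract (generate states, roll them through a transition model, and feed the synthesized experiences to an off-policy learner) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the linear dynamics, the least-squares transition model, the known reward function, the random policy, and every function name are assumptions made for this sketch, and `generate_states` is a simple perturbation stand-in rather than the diffusion-based generator the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy replay buffer from hypothetical linear dynamics (assumption for this sketch).
STATE_DIM, ACTION_DIM, N = 3, 2, 500
A_true = 0.9 * np.eye(STATE_DIM)
B_true = 0.1 * rng.normal(size=(STATE_DIM, ACTION_DIM))
states = rng.normal(size=(N, STATE_DIM))
actions = rng.uniform(-1.0, 1.0, size=(N, ACTION_DIM))
next_states = (states @ A_true.T + actions @ B_true.T
               + 0.01 * rng.normal(size=(N, STATE_DIM)))


def reward_fn(s_next):
    """Hypothetical known reward: penalize distance from the origin."""
    return -np.sum(s_next ** 2, axis=1)


rewards = reward_fn(next_states)


def fit_transition_model(s, a, s_next):
    """Least-squares one-step model: s_next ~= [s, a, 1] @ w."""
    x = np.hstack([s, a, np.ones((len(s), 1))])
    w, *_ = np.linalg.lstsq(x, s_next, rcond=None)
    return w


def predict_next_state(w, s, a):
    x = np.hstack([s, a, np.ones((len(s), 1))])
    return x @ w


def generate_states(buffer_states, n, scale=0.5):
    """Stand-in generator (NOT a diffusion model): perturb the n buffer
    states farthest from the buffer mean, as a crude proxy for
    'under-explored' regions."""
    dist = np.linalg.norm(buffer_states - buffer_states.mean(axis=0), axis=1)
    idx = np.argsort(dist)[-n:]
    noise = scale * rng.normal(size=(n, buffer_states.shape[1]))
    return buffer_states[idx] + noise


def random_policy(s):
    """Placeholder for the current off-policy actor."""
    return rng.uniform(-1.0, 1.0, size=(len(s), ACTION_DIM))


# 1) Fit the transition model on real data.
w = fit_transition_model(states, actions, next_states)

# 2) Generate candidate states and synthesize transitions through the model.
gen_s = generate_states(states, n=16)
gen_a = random_policy(gen_s)
gen_s_next = predict_next_state(w, gen_s, gen_a)
gen_r = reward_fn(gen_s_next)

# 3) Mix real and synthesized transitions into one training batch.
real_idx = rng.choice(N, size=48, replace=False)
batch_s = np.vstack([states[real_idx], gen_s])
batch_a = np.vstack([actions[real_idx], gen_a])
batch_r = np.concatenate([rewards[real_idx], gen_r])
batch_s_next = np.vstack([next_states[real_idx], gen_s_next])
```

The mixed batch could then be passed to any off-policy update (for example, a Q-learning or actor-critic step) in place of a purely buffer-sampled batch. The 48/16 split between real and synthesized transitions is an arbitrary choice for illustration.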
