Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions

Pedram Agand; Mo Chen

arXiv:2601.18107 · cs.LG · January 27, 2026

TL;DR

This paper introduces MoReBRAC, a model-based offline RL framework that synthesizes high-confidence transitions using uncertainty estimation to improve policy learning in safety-critical domains.

Contribution

MoReBRAC employs a dual-recurrent world model with a hierarchical uncertainty pipeline to generate reliable synthetic data, enhancing offline RL performance.

Findings

01 Significant performance improvements on D4RL Gym-MuJoCo benchmarks.

02 Effective filtering of synthetic transitions using uncertainty measures.

03 Insights into the VAE's role as a geometric anchor in transition synthesis.

Abstract

Offline Reinforcement Learning (ORL) holds immense promise for safety-critical domains like industrial robotics, where real-time environmental interaction is often prohibitive. A primary obstacle in ORL remains the distributional shift between the static dataset and the learned policy, which typically mandates a high degree of conservatism that can restrain potential policy improvements. We present MoReBRAC, a model-based framework that addresses this limitation through uncertainty-aware latent synthesis. Instead of relying solely on the fixed data, MoReBRAC uses a dual-recurrent world model to synthesize high-fidelity transitions that augment the training manifold. To ensure the reliability of this synthetic data, we implement a hierarchical uncertainty pipeline integrating Variational Autoencoder (VAE) manifold detection, model sensitivity analysis, and Monte Carlo (MC) dropout.…
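
As a rough illustration of the MC dropout stage named in the abstract, the sketch below scores synthetic transitions by the variance of repeated stochastic forward passes and keeps only the least uncertain ones. It is not the authors' implementation: the toy two-layer network with random weights stands in for the dual-recurrent world model, the VAE and sensitivity stages are omitted, and every name and parameter (mc_dropout_predict, filter_transitions, p_drop, keep_fraction) is a hypothetical choice made for this example.

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, rng, n_passes=20, p_drop=0.1):
    """Run several forward passes with dropout left active (MC dropout).

    Returns the mean predicted next state and a per-sample uncertainty
    score (variance across passes, averaged over state dimensions).
    """
    preds = []
    for _ in range(n_passes):
        h = np.maximum(x @ w1, 0.0)              # hidden ReLU layer
        mask = rng.random(h.shape) >= p_drop     # Bernoulli keep mask
        h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
        preds.append(h @ w2)
    preds = np.stack(preds)                      # (n_passes, batch, state_dim)
    return preds.mean(axis=0), preds.var(axis=0).mean(axis=-1)

def filter_transitions(mean_next, uncertainty, keep_fraction=0.5):
    """Keep transitions whose uncertainty is at or below the given quantile."""
    threshold = np.quantile(uncertainty, keep_fraction)
    keep = uncertainty <= threshold
    return mean_next[keep], keep

# Toy setup (assumed sizes; a real world model would be trained on the dataset).
rng = np.random.default_rng(0)
state_dim, action_dim, hidden = 4, 2, 32
w1 = rng.normal(scale=0.3, size=(state_dim + action_dim, hidden))
w2 = rng.normal(scale=0.3, size=(hidden, state_dim))
x = rng.normal(size=(100, state_dim + action_dim))   # candidate (state, action) pairs

mean_next, unc = mc_dropout_predict(x, w1, w2, rng)
kept_next, keep = filter_transitions(mean_next, unc)
print(f"kept {keep.sum()} of {len(keep)} synthetic transitions")
```

Thresholding at a quantile rather than a fixed value is one possible design choice; it keeps the accepted fraction stable when the scale of the variance changes between models.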

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis