Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions
Pedram Agand, Mo Chen

TL;DR
This paper introduces MoReBRAC, a model-based offline RL framework that synthesizes high-confidence transitions using uncertainty estimation to improve policy learning in safety-critical domains.
Contribution
MoReBRAC employs a dual-recurrent world model with a hierarchical uncertainty pipeline to generate reliable synthetic data, enhancing offline RL performance.
Findings
Significant performance improvements on D4RL Gym-MuJoCo benchmarks.
Effective filtering of synthetic transitions using uncertainty measures.
Insights into the VAE's role as a geometric anchor in transition synthesis.
Abstract
Offline Reinforcement Learning (ORL) holds immense promise for safety-critical domains like industrial robotics, where real-time environmental interaction is often prohibitive. A primary obstacle in ORL remains the distributional shift between the static dataset and the learned policy, which typically mandates high degrees of conservatism that can restrain potential policy improvements. We present MoReBRAC, a model-based framework that addresses this limitation through uncertainty-aware latent synthesis. Instead of relying solely on the fixed data, MoReBRAC utilizes a dual-recurrent world model to synthesize high-fidelity transitions that augment the training manifold. To ensure the reliability of this synthetic data, we implement a hierarchical uncertainty pipeline integrating Variational Autoencoder (VAE) manifold detection, model sensitivity analysis, and Monte Carlo (MC) dropout.…
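To make the MC dropout stage of the uncertainty pipeline concrete, here is a minimal sketch of how synthetic transitions might be vetted by the spread of repeated stochastic forward passes. It is not the authors' implementation: the toy linear dynamics model, the input-level dropout, the function names `mc_dropout_predictions` and `filter_by_uncertainty`, and the threshold `max_std` are all assumptions made for illustration, and the VAE and sensitivity stages are omitted.

```python
import numpy as np

def mc_dropout_predictions(weights, inputs, n_samples=32, drop_prob=0.1, seed=0):
    """Run n_samples stochastic forward passes of a toy linear dynamics model.

    weights: (in_dim, out_dim); inputs: (batch, in_dim).
    Returns an array of shape (n_samples, batch, out_dim).
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_prob
    preds = []
    for _ in range(n_samples):
        # Fresh Bernoulli mask per pass; inverted dropout keeps the expected scale.
        mask = rng.random(inputs.shape) < keep
        preds.append((inputs * mask / keep) @ weights)
    return np.stack(preds)

def filter_by_uncertainty(preds, max_std):
    """Return the mean prediction, a per-sample uncertainty score, and a keep mask."""
    mean = preds.mean(axis=0)
    # Score: standard deviation across passes, averaged over output dimensions.
    score = preds.std(axis=0).mean(axis=-1)
    return mean, score, score <= max_std

# Hypothetical candidates: one small-magnitude input and one large-magnitude input.
weights = np.ones((4, 2))
inputs = np.array([[0.01] * 4, [10.0] * 4])
preds = mc_dropout_predictions(weights, inputs)
mean, score, keep_mask = filter_by_uncertainty(preds, max_std=0.5)
```

In this toy setup the spread across passes grows with input magnitude, so the large-magnitude candidate is rejected while the small one is kept. A real pipeline would presumably apply dropout inside a learned world model and combine this score with the other uncertainty signals named in the abstract.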
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
