Inference Time Policy Optimization for Offline RL with Differentiable World Models
Rohan Deb, Stephen J. Wright, Arindam Banerjee

TL;DR
This paper introduces an inference-time policy optimization method for offline reinforcement learning using a differentiable world model, leading to improved performance on continuous control benchmarks.
Contribution
It proposes a novel Differentiable World Model pipeline enabling end-to-end gradient-based policy adaptation during inference in offline RL.
Findings
Inference-time optimization improves policy performance on MuJoCo and AntMaze tasks.
Exploiting inference-time information yields consistent gains over strong offline RL baselines.
A cost-effective variant recovers much of the performance gains with reduced computational expense.
Abstract
Offline Reinforcement Learning (RL) learns optimal policies from fixed datasets, training a policy once and deploying it at inference time without further refinement. Inspired by model predictive control (MPC), we introduce an inference-time adaptation framework that utilizes a pretrained policy along with a learned world model. While existing world model and diffusion-planning methods use learned dynamics to generate imagined trajectories during training, or to sample candidate plans at inference time, they do not use inference-time information to *optimize* the policy parameters on the fly. In contrast, our design is a Differentiable World Model (DWM) pipeline that enables end-to-end gradient computation through imagined rollouts for inference-time policy optimization (ITPO). We evaluate our algorithm on D4RL continuous-control benchmarks (MuJoCo locomotion tasks and AntMaze), and…
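The core idea in the abstract, taking gradients of an imagined return through a world model and using them to adjust policy parameters at inference time, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear world model `s' = A s + B a`, a linear policy `a = K s`, and a quadratic cost, all chosen only so that the gradient can be written out by hand with NumPy instead of an autodiff library. Every name here (`rollout_return`, `rollout_grad`, `adapt_policy`, the matrices, the horizon, and the step size) is hypothetical, and whether the paper resets or carries over the adapted parameters between environment steps is not stated in the abstract.

```python
import numpy as np

def rollout_return(K, A, B, s0, horizon):
    """Imagined return of the linear policy a = K s under the model s' = A s + B a."""
    s, total = s0, 0.0
    for _ in range(horizon):
        a = K @ s
        total -= s @ s + 0.1 * (a @ a)  # assumed quadratic cost, used as negative reward
        s = A @ s + B @ a
    return total

def rollout_grad(K, A, B, s0, horizon):
    """Gradient of rollout_return w.r.t. K, by manual backprop through the rollout."""
    # Forward pass: store the imagined states s_0 .. s_horizon.
    states = [s0]
    for _ in range(horizon):
        s = states[-1]
        states.append(A @ s + B @ (K @ s))
    grad_K = np.zeros_like(K)
    grad_s = np.zeros_like(s0)  # d(return)/d(s_{t+1}); zero beyond the last step
    # Backward pass: propagate sensitivities from the last step to the first.
    for t in reversed(range(horizon)):
        s = states[t]
        a = K @ s
        grad_a = -0.2 * a + B.T @ grad_s                # reward term + effect on next state
        grad_K += np.outer(grad_a, s)                   # a_t = K s_t
        grad_s = -2.0 * s + K.T @ grad_a + A.T @ grad_s
    return grad_K

def adapt_policy(K, A, B, s, horizon=5, steps=20, lr=0.1):
    """A few gradient-ascent steps on the imagined return, starting from state s."""
    K = K.copy()  # leave the pretrained parameters untouched
    for _ in range(steps):
        K += lr * rollout_grad(K, A, B, s, horizon)
    return K

# Toy setup (all values are illustrative assumptions).
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # stand-in for a learned world model
B = np.array([[0.0], [0.1]])
K_pre = np.zeros((1, 2))                 # stand-in for a pretrained policy
s_now = np.array([1.0, 0.0])             # state observed at inference time

K_adapted = adapt_policy(K_pre, A, B, s_now)
print(rollout_return(K_pre, A, B, s_now, 5), rollout_return(K_adapted, A, B, s_now, 5))
```

In an MPC-style loop, `adapt_policy` would be called at each observed state before acting. With neural-network policies and world models, the hand-written backward pass would typically be replaced by automatic differentiation.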
Taxonomy
Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Zebrafish Biomedical Research Applications
