When Policies Cannot Be Retrained: A Unified Closed-Form View of Post-Training Steering in Offline Reinforcement Learning
Elias Hossain, Mohammad Jahid Ibna Basher, Ivan Garibay, Ozlem Garibay, Niloofar Yousefi

TL;DR
This paper investigates how to adapt offline RL policies at deployment without retraining, using a unified closed-form approach based on Product-of-Experts composition and goal-conditioned priors.
Contribution
It makes explicit a closed-form identity linking PoE composition to KL-regularized adaptation in the frozen-actor setting, and empirically evaluates the effectiveness and limitations of both across several environments.
Findings
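Under degraded or random priors, precision-weighted PoE composition remains anchored to the frozen actor, while additive and prior-only adaptation collapse.
For diagonal-Gaussian actors and priors, PoE with coefficient alpha yields the same deterministic policy as KL-regularized adaptation with a specific matching coefficient.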
In some environments, medium-expert policies cannot improve beyond a certain performance level.
Abstract
Offline reinforcement learning (RL) can learn effective policies from fixed datasets, but deployment objectives may change after training, and in many applications the trained actor cannot be retrained because of data, cost, or governance constraints. We study deployment-time adaptation for frozen offline actors using Product-of-Experts (PoE) composition with a goal-conditioned prior. Our main practical finding is graceful degradation rather than universal performance gain: under degraded or random priors, precision-weighted composition remains anchored to the frozen actor, while additive and prior-only adaptation collapse, and a KL-budget selector often recovers a near-oracle operating point. We also make explicit a closed-form identity in the frozen-actor setting: for diagonal-Gaussian actors and priors, PoE with coefficient alpha yields the same deterministic policy as KL-regularized…
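The anchoring behavior described in the abstract can be illustrated with a minimal sketch of precision-weighted composition for diagonal Gaussians. The abstract does not give the paper's exact parametrization, so this sketch assumes that the prior density is raised to the power alpha and multiplied with the actor density, and that the deterministic policy is the mean of the resulting Gaussian. The function name `poe_diag_gaussian` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def poe_diag_gaussian(mu_actor, sigma_actor, mu_prior, sigma_prior, alpha):
    """Product of a diagonal-Gaussian actor and a prior raised to the power alpha.

    Assumed form (not taken from the paper): actor(a) * prior(a) ** alpha,
    which is again a diagonal Gaussian with precision-weighted mean.
    """
    mu_actor = np.asarray(mu_actor, dtype=float)
    sigma_actor = np.asarray(sigma_actor, dtype=float)
    mu_prior = np.asarray(mu_prior, dtype=float)
    sigma_prior = np.asarray(sigma_prior, dtype=float)

    # Precision is the inverse variance; alpha scales the prior's precision.
    prec_actor = 1.0 / sigma_actor**2
    prec_prior = alpha / sigma_prior**2
    prec = prec_actor + prec_prior

    # Each expert's mean is weighted by its precision.
    mu = (prec_actor * mu_actor + prec_prior * mu_prior) / prec
    sigma = np.sqrt(1.0 / prec)
    return mu, sigma


# Hypothetical two-dimensional action example.
mu_actor = np.array([0.5, -0.2])
sigma_actor = np.array([0.1, 0.1])
mu_prior = np.array([-1.0, 1.0])

# A confident prior pulls the composed mean halfway toward the prior.
mu_confident, _ = poe_diag_gaussian(mu_actor, sigma_actor, mu_prior, [0.1, 0.1], alpha=1.0)

# A vague (high-variance) prior barely moves the composed mean away from the actor.
mu_vague, _ = poe_diag_gaussian(mu_actor, sigma_actor, mu_prior, [10.0, 10.0], alpha=1.0)

# With alpha = 0 the prior is ignored and the frozen actor is recovered.
mu_zero, sigma_zero = poe_diag_gaussian(mu_actor, sigma_actor, mu_prior, [0.1, 0.1], alpha=0.0)
```

In this sketch a prior that reports high variance contributes little precision, so the composed mean stays close to the frozen actor's mean. This is one way to read the "anchored to the frozen actor" finding, under the assumptions stated above.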
