When Policies Cannot Be Retrained: A Unified Closed-Form View of Post-Training Steering in Offline Reinforcement Learning

Elias Hossain; Mohammad Jahid Ibna Basher; Ivan Garibay; Ozlem Garibay; Niloofar Yousefi

arXiv:2604.22873·cs.LG·April 28, 2026

TL;DR

This paper studies how to adapt frozen offline RL policies at deployment time without retraining, using Product-of-Experts (PoE) composition with a goal-conditioned prior and a closed-form link to KL-regularized adaptation.

Contribution

The paper makes explicit a closed-form link between PoE composition and KL-regularized adaptation for frozen actors, and empirically evaluates where deployment-time steering helps and where it fails across several environments.

Findings

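01. Under degraded or random priors, precision-weighted PoE composition remains anchored to the frozen actor, while additive and prior-only adaptation collapse.

02. For diagonal-Gaussian actors and priors, KL-regularized adaptation yields the same deterministic policy as PoE under a matching coefficient.

03. In some environments, medium-expert policies cannot improve beyond a certain performance level.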

Abstract

Offline reinforcement learning (RL) can learn effective policies from fixed datasets, but deployment objectives may change after training, and in many applications the trained actor cannot be retrained because of data, cost, or governance constraints. We study deployment-time adaptation for frozen offline actors using Product-of-Experts (PoE) composition with a goal-conditioned prior. Our main practical finding is graceful degradation rather than universal performance gain: under degraded or random priors, precision-weighted composition remains anchored to the frozen actor, while additive and prior-only adaptation collapse, and a KL-budget selector often recovers a near-oracle operating point. We also make explicit a closed-form identity in the frozen-actor setting: for diagonal-Gaussian actors and priors, PoE with coefficient alpha yields the same deterministic policy as KL-regularized…
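The precision-weighted composition described in the abstract can be illustrated with a minimal sketch. It assumes that the coefficient alpha scales the prior's precision, so that the composed density is proportional to actor × prior^alpha for diagonal Gaussians; the abstract is truncated before the exact parameterization is stated, so this is an assumption rather than the authors' definition. The function name `poe_diag_gaussian` and all argument names are hypothetical.

```python
from typing import List, Sequence, Tuple


def poe_diag_gaussian(
    mu_actor: Sequence[float],
    std_actor: Sequence[float],
    mu_prior: Sequence[float],
    std_prior: Sequence[float],
    alpha: float,
) -> Tuple[List[float], List[float]]:
    """Per-dimension product of N(actor) and N(prior)^alpha.

    Assumption (not confirmed by the source): alpha multiplies the
    prior's precision. alpha = 0 recovers the frozen actor unchanged.
    """
    means: List[float] = []
    stds: List[float] = []
    for ma, sa, mp, sp in zip(mu_actor, std_actor, mu_prior, std_prior):
        prec_actor = 1.0 / (sa * sa)    # precision of the frozen actor
        prec_prior = alpha / (sp * sp)  # alpha-scaled precision of the prior
        prec_total = prec_actor + prec_prior
        # The composed mean is the precision-weighted average of both means.
        means.append((prec_actor * ma + prec_prior * mp) / prec_total)
        stds.append(prec_total ** -0.5)
    return means, stds


if __name__ == "__main__":
    # A confident actor combined with a very uncertain prior:
    # the composed mean stays close to the actor's mean.
    mean, std = poe_diag_gaussian([0.0], [0.1], [1.0], [10.0], alpha=1.0)
    print(mean, std)
```

Under this assumed form, a prior that reports high variance contributes little precision, so the composed mean stays near the frozen actor's mean. That is one plausible reading of the "anchored to the frozen actor" behavior; the paper's experiments may rely on a different mechanism.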
