Implicit Constraint-Aware Off-Policy Correction for Offline Reinforcement Learning

Ali Baheri

arXiv:2506.14058 · eess.SY · June 18, 2025

TL;DR

This paper presents an off-policy correction method for offline reinforcement learning that builds structural priors, such as monotonicity or smoothness, directly into each Bellman update, so the learned value function respects domain constraints.

Contribution

It introduces a framework that embeds structural priors into Bellman updates via a proximal projection, ensuring contraction, uniqueness, and exact enforcement of constraints.
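
The composition described above can be written out as follows. This is a sketch in notation assumed here rather than taken from the paper: $\mathcal{T}$ is the optimal Bellman operator, $\mathcal{C}$ the convex constraint set, and $\Pi_{\mathcal{C}}$ the projection onto it. The contraction step relies on the added assumption that $\Pi_{\mathcal{C}}$ is nonexpansive in the same norm $\lVert\cdot\rVert$ in which $\mathcal{T}$ is a $\gamma$-contraction; the paper's own choice of norm and proximal term may differ.

```latex
\begin{align*}
(\mathcal{T} Q)(s,a) &= r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Big[\max_{a'} Q(s',a')\Big], \\
\mathcal{T}_{\mathcal{C}} &= \Pi_{\mathcal{C}} \circ \mathcal{T}, \qquad
\Pi_{\mathcal{C}}(Q) = \arg\min_{Q' \in \mathcal{C}} \tfrac{1}{2}\,\lVert Q' - Q \rVert_2^2, \\
\lVert \mathcal{T}_{\mathcal{C}} Q_1 - \mathcal{T}_{\mathcal{C}} Q_2 \rVert
&\le \lVert \mathcal{T} Q_1 - \mathcal{T} Q_2 \rVert
\le \gamma\, \lVert Q_1 - Q_2 \rVert .
\end{align*}
```

Under that assumption, the Banach fixed-point theorem gives a unique fixed point, and because every output of $\mathcal{T}_{\mathcal{C}}$ lies in $\mathcal{C}$, the fixed point satisfies the constraints exactly.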

Findings

01  Eliminates monotonicity violations on synthetic auction data.

02  Outperforms conservative Q-learning and implicit Q-learning in return and sample efficiency.

03  The composed operator remains a $\gamma$-contraction and has a unique fixed point.

Abstract

Offline reinforcement learning promises policy improvement from logged interaction data alone, yet state-of-the-art algorithms remain vulnerable to value over-estimation and to violations of domain knowledge such as monotonicity or smoothness. We introduce implicit constraint-aware off-policy correction, a framework that embeds structural priors directly inside every Bellman update. The key idea is to compose the optimal Bellman operator with a proximal projection on a convex constraint set, which produces a new operator that (i) remains a $\gamma$-contraction, (ii) possesses a unique fixed point, and (iii) enforces the prescribed structure exactly. A differentiable optimization layer solves the projection; implicit differentiation supplies gradients for deep function approximators at a cost comparable to implicit Q-learning. On a synthetic Bid-Click auction -- where the true value is…
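
To make the operator composition concrete, here is a minimal tabular sketch in Python. It is not the paper's implementation. It assumes a small random MDP invented for illustration, takes the constraint set to be Q-tables that are non-decreasing in the state index, and swaps the differentiable optimization layer for a plain pool-adjacent-violators projection. All function and variable names (`project_monotone`, `bellman_optimal`, `projected_backup`, `P`, `R`) are hypothetical.

```python
import numpy as np

def project_monotone(v):
    """Euclidean projection of v onto {x : x[0] <= x[1] <= ...} (pool-adjacent-violators)."""
    blocks = []  # each block stores [sum of pooled values, number of pooled values]
    for x in v:
        blocks.append([float(x), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def bellman_optimal(q, P, R, gamma):
    """Optimal Bellman backup for a tabular MDP; P is (S, A, S), R and q are (S, A)."""
    return R + gamma * (P @ q.max(axis=1))

def projected_backup(q, P, R, gamma):
    """Bellman backup followed by a per-action projection onto state-monotone Q-tables."""
    tq = bellman_optimal(q, P, R, gamma)
    return np.stack([project_monotone(tq[:, a]) for a in range(tq.shape[1])], axis=1)

# Hypothetical toy MDP (assumption: random dynamics and rewards, purely illustrative).
rng = np.random.default_rng(0)
S, A, gamma = 6, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize each (s, a) row into a distribution
R = rng.random((S, A))

q = np.zeros((S, A))
for _ in range(500):
    q = projected_backup(q, P, R, gamma)

residual = np.max(np.abs(projected_backup(q, P, R, gamma) - q))
print("monotone in state:", bool(np.all(np.diff(q, axis=0) >= 0)))
print("fixed-point residual:", residual)
```

Every iterate is monotone by construction, because the projection is applied after each backup rather than as a penalty. With deep function approximators the projection would sit inside the training graph, which is where the abstract's implicit differentiation comes in; this sketch does not attempt that part.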

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Q-Learning