Semi-gradient DICE for Offline Constrained Reinforcement Learning

Woosung Kim; JunHo Seo; Jongmin Lee; Byung-Jun Lee

arXiv:2506.08644 · cs.LG · June 11, 2025

TL;DR

This paper introduces a semi-gradient DICE method for offline constrained reinforcement learning that keeps off-policy cost estimation accurate while optimizing the policy, and it reports state-of-the-art results on the DSRL benchmark.

Contribution

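The paper traces the cost-estimation failures of existing semi-gradient DICE approaches to the semi-gradient optimization itself, which solves a fundamentally different optimization problem, and proposes a novel semi-gradient DICE method that enables reliable off-policy evaluation in offline constrained RL.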

Findings

01. Achieves state-of-the-art performance on the DSRL benchmark.

02. Ensures accurate cost estimation in offline constrained RL.

03. Addresses limitations of previous semi-gradient methods.

Abstract

Stationary Distribution Correction Estimation (DICE) addresses the mismatch between the stationary distribution induced by a policy and the target distribution required for reliable off-policy evaluation (OPE) and policy optimization. DICE-based offline constrained RL particularly benefits from the flexibility of DICE, as it simultaneously maximizes return while estimating costs in offline settings. However, we have observed that recent approaches designed to enhance the offline RL performance of the DICE framework inadvertently undermine its ability to perform OPE, making them unsuitable for constrained RL scenarios. In this paper, we identify the root cause of this limitation: their reliance on a semi-gradient optimization, which solves a fundamentally different optimization problem and results in failures in cost estimation. Building on these insights, we propose a novel method to…
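To make the off-policy evaluation idea in the abstract concrete, here is a minimal sketch of how a stationary distribution correction ratio turns dataset samples into a cost estimate for a different policy. It is not the paper's method: the ratio w(s, a) = d_pi(s, a) / d_D(s, a) is computed exactly from a known toy model rather than learned, and the MDP, the policies, and the names `occupancy` and `demo` are all hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; every number below is made up.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])       # P[s, a, s']
cost = np.array([[0.0, 1.0], [0.5, 2.0]])      # c(s, a)
mu0 = np.array([1.0, 0.0])                     # initial state distribution
gamma = 0.9
pi_behavior = np.full((2, 2), 0.5)             # policy assumed to generate the data
pi_target = np.array([[0.9, 0.1], [0.2, 0.8]]) # policy whose cost we want


def occupancy(P, pi, mu0, gamma):
    """Normalized discounted state-action occupancy d(s, a) of policy pi."""
    n_states = pi.shape[0]
    # State-to-state transitions under pi: P_pi[s, s'] = sum_a pi[s, a] P[s, a, s']
    P_pi = np.einsum("sa,sat->st", pi, P)
    # Solve d_s = (1 - gamma) * mu0 + gamma * P_pi^T d_s for the state occupancy.
    d_s = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi


def demo(n_samples=200_000, seed=0):
    """Compare a ratio-weighted estimate from behavior data with the exact value."""
    d_pi = occupancy(P, pi_target, mu0, gamma)
    d_D = occupancy(P, pi_behavior, mu0, gamma)
    w = d_pi / d_D  # exact correction ratio; a DICE method would have to learn this

    # Draw (s, a) pairs from the dataset distribution d_D.
    rng = np.random.default_rng(seed)
    idx = rng.choice(d_D.size, size=n_samples, p=d_D.ravel())
    s, a = np.unravel_index(idx, d_D.shape)

    estimate = float(np.mean(w[s, a] * cost[s, a]))  # E_D[w * c]
    truth = float(np.sum(d_pi * cost))               # E_{d_pi}[c]
    return estimate, truth


if __name__ == "__main__":
    est, truth = demo()
    print(f"ratio-weighted estimate: {est:.4f}, exact value: {truth:.4f}")
```

Both numbers are normalized discounted costs, that is, (1 - gamma) times the expected discounted cost sum. The sketch assumes the dataset is distributed exactly as the behavior policy's occupancy. In an offline setting the ratio is not available in closed form, so the quality of the cost estimate depends on how well the learned ratio matches the true one, which is the property the abstract says semi-gradient optimization can break.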

Taxonomy

Topics: Reinforcement Learning in Robotics · Energy Load and Power Forecasting · Smart Grid Energy Management