DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects

Shu Tamano

arXiv:2505.00961 · stat.ML · February 3, 2026

TL;DR

DOLCE is an off-policy evaluation and learning method for contextual bandits that handles support violations by decomposing the objective into a lagged correction term and a current, model-based term, using only lagged contexts already stored in bandit logs.

Contribution

DOLCE introduces a support-robust decomposition approach using lagged data, with a moment-based training procedure for unbiased and consistent off-policy evaluation and learning.

Findings

01. DOLCE achieves substantial improvements in evaluation accuracy.
02. It remains unbiased under idealized conditions.
03. It performs well even with support violations.

Abstract

Off-policy evaluation and learning in contextual bandits use logged interaction data to estimate and optimize the value of a target policy. Most existing methods require sufficient action overlap between the logging and target policies, and violations can bias value and policy gradient estimates. To address this issue, we propose DOLCE (Decomposing Off-policy evaluation/learning into Lagged and Current Effects), which uses only lagged contexts already stored in bandit logs to construct lag-marginalized importance weights and to decompose the objective into a support-robust lagged correction term and a current, model-based term, yielding bias cancellation when the reward-model residual is conditionally mean-zero given the lagged context and action. With multiple candidate lags, DOLCE softly aggregates lag-specific estimates, and we introduce a moment-based training procedure that…
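
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's estimator or the code in the shutech2001/DOLCE repository. It assumes discrete lagged contexts, a doubly-robust-style form (a weighted residual term plus a model-based term), and lag-marginalized policies obtained by averaging per-sample action probabilities within each lagged-context group. The function names `lag_marginalized_policy` and `decomposed_value_estimate` and all argument names are hypothetical.

```python
import numpy as np


def lag_marginalized_policy(policy_probs, lagged_ctx, n_lagged):
    """Average per-sample action probabilities within each lagged-context group.

    ASSUMPTION: lagged contexts are discrete integers in [0, n_lagged), so the
    marginalization over the current context reduces to a group mean.
    policy_probs has shape (n_samples, n_actions).
    """
    n_actions = policy_probs.shape[1]
    out = np.zeros((n_lagged, n_actions))
    for z in range(n_lagged):
        mask = lagged_ctx == z
        if mask.any():
            out[z] = policy_probs[mask].mean(axis=0)
    return out


def decomposed_value_estimate(rewards, actions, lagged_ctx,
                              pi_target, pi_logging, q_hat, n_lagged):
    """Illustrative two-term value estimate (ASSUMED doubly-robust-style form).

    lagged_term : reward-model residuals reweighted by importance weights that
                  depend only on the lagged context and the logged action.
    current_term: model-based value of the target policy at the current context.
    """
    idx = np.arange(len(rewards))
    pi_lag = lag_marginalized_policy(pi_target, lagged_ctx, n_lagged)
    pi0_lag = lag_marginalized_policy(pi_logging, lagged_ctx, n_lagged)

    # Lag-marginalized importance weight for each logged (lagged context, action).
    weights = pi_lag[lagged_ctx, actions] / pi0_lag[lagged_ctx, actions]

    residual = rewards - q_hat[idx, actions]
    lagged_term = np.mean(weights * residual)
    current_term = np.mean(np.sum(pi_target * q_hat, axis=1))
    return lagged_term + current_term, lagged_term, current_term
```

In this sketch, the logging policy may assign zero probability to an action at some current contexts, yet the lag-marginalized weight stays finite as long as that action is logged somewhere within the same lagged-context group. When the reward model is exact, the residuals vanish and the estimate reduces to the model-based term.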

Code & Models

Repositories

shutech2001/DOLCE (PyTorch, official)

Taxonomy

Topics: Evaluation and Performance Assessment