Offline Policy Learning with Weight Clipping and Heaviside Composite Optimization

Jingren Liu; Hanzhang Qin; Junyi Liu; Mabel C. Chou; Jong-Shi Pang

arXiv:2601.12117·math.OC·January 21, 2026

TL;DR

This paper introduces an offline policy learning method that uses weight clipping to reduce the variance of policy value estimates, reformulates the resulting optimization as a Heaviside composite problem, and solves it with integer programming.

Contribution

The paper develops a weight-clipping estimator for offline policy learning, reformulates the policy optimization as a Heaviside composite problem, and provides theoretical bounds on the suboptimality of the learned policy.

Findings

01 Weight clipping reduces variance in policy value estimation.

02 Reformulation as Heaviside composite optimization enables efficient solving.

03 Theoretical bounds on suboptimality are provided for the learned policy.

Abstract

Offline policy learning aims to use historical data to learn an optimal personalized decision rule. In the standard estimate-then-optimize framework, reweighting-based methods (e.g., inverse propensity weighting or doubly robust estimators) are widely used to produce unbiased estimates of policy values. However, when the propensity scores of some treatments are small, these reweighting-based methods suffer from high variance in policy value estimation, which may mislead the downstream policy optimization and yield a learned policy with inferior value. In this paper, we systematically develop an offline policy learning algorithm based on a weight-clipping estimator that truncates small propensity scores via a clipping threshold chosen to minimize the mean squared error (MSE) in policy value estimation. Focusing on linear policies, we address the bilevel and discontinuous objective…
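
To make the weight-clipping idea concrete, here is a minimal Python sketch of a clipped inverse-propensity estimate for a binary linear policy. It is an illustration under stated assumptions, not the paper's estimator: the function names, the form of the clipping (raising propensities below a threshold tau up to tau), and the binary treat/control setup are all hypothetical, and the sketch takes tau as given rather than choosing it to minimize MSE.

```python
import numpy as np


def linear_policy(features, theta):
    """Hypothetical binary linear policy: treat (1) when theta . x >= 0, else 0."""
    scores = np.asarray(features, dtype=float) @ np.asarray(theta, dtype=float)
    return (scores >= 0).astype(int)


def clipped_ipw_value(rewards, actions, propensities, policy_actions, tau):
    """Clipped inverse-propensity estimate of a policy's value.

    propensities[i] is the logging probability of the action actually taken
    for sample i. tau is an assumed clipping threshold: any propensity below
    tau is replaced by tau, which caps each weight at 1 / tau.
    """
    rewards = np.asarray(rewards, dtype=float)
    propensities = np.asarray(propensities, dtype=float)
    # Only samples whose logged action agrees with the policy contribute.
    match = np.asarray(actions) == np.asarray(policy_actions)
    weights = match / np.maximum(propensities, tau)
    return float(np.mean(weights * rewards))


# Toy logged data (made up for illustration only).
features = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -3.0]]
actions = [1, 0, 0, 0]
propensities = [0.5, 0.05, 0.5, 0.25]
rewards = [1.0, 2.0, 3.0, 4.0]

chosen = linear_policy(features, theta=[0.0, 1.0])
unclipped = clipped_ipw_value(rewards, actions, propensities, chosen, tau=0.0)
clipped = clipped_ipw_value(rewards, actions, propensities, chosen, tau=0.2)
print(unclipped, clipped)
```

In this toy data the second sample has propensity 0.05, so its unclipped weight is 20; with tau = 0.2 that weight drops to 5. Clipping trades some bias for lower variance, which is why the abstract ties the threshold choice to the MSE of the value estimate.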

Taxonomy

Topics: Advanced Causal Inference Techniques · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics