A Simpler Alternative to Variational Regularized Counterfactual Risk Minimization

Hua Chang Bakker; Shashank Gupta; Harrie Oosterhuis

arXiv:2409.09819 · cs.LG · October 15, 2024

TL;DR

This paper proposes a simpler and more effective method for off-policy learning by directly minimizing f-divergence, improving upon the previous variational regularized approach.

Contribution

It introduces a novel direct approximation method for f-divergence minimization, replacing the more complex f-GAN based lower bound approach.

Findings

01. Direct divergence minimization outperforms f-GAN based methods

02. Reproducing original VRCRM results was unsuccessful

03. Proposed method shows empirical improvements in experiments

Abstract

Variational regularized counterfactual risk minimization (VRCRM) has been proposed as an alternative off-policy learning (OPL) method. The VRCRM method uses a lower bound on the $f$-divergence between the logging policy and the target policy as regularization during learning and was shown to improve performance over existing OPL alternatives on multi-label classification tasks. In this work, we revisit the original experimental setting of VRCRM and propose to minimize the $f$-divergence directly, instead of optimizing for the lower bound using an $f$-GAN approach. Surprisingly, we were unable to reproduce the results reported in the original setting. In response, we propose a novel, simpler alternative to $f$-divergence optimization by minimizing a direct approximation of the $f$-divergence, instead of an $f$-GAN-based lower bound. Experiments showed that minimizing the divergence using $f$-GANs…
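
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the KL divergence as the $f$-divergence, a small discrete action space, and hypothetical probability values for `target_policy` and `logging_policy`. It compares a direct computation of the divergence with the variational lower bound $\mathbb{E}_P[T] - \mathbb{E}_Q[f^*(T)]$ that $f$-GAN-style methods optimize, where $f^*(t) = \exp(t - 1)$ for KL.

```python
import math

def kl_direct(p, q):
    """KL(P || Q) computed directly from the two probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_lower_bound(p, q, critic):
    """Variational bound E_P[T] - E_Q[f*(T)], with f*(t) = exp(t - 1) for KL."""
    expect_p = sum(pi * ti for pi, ti in zip(p, critic))
    expect_q = sum(qi * math.exp(ti - 1.0) for qi, ti in zip(q, critic))
    return expect_p - expect_q

# Hypothetical action probabilities for one context (illustrative values only).
target_policy = [0.7, 0.2, 0.1]
logging_policy = [0.4, 0.4, 0.2]

direct = kl_direct(target_policy, logging_policy)

# The bound is tight only for the optimal critic T*(a) = 1 + log(p(a) / q(a)).
optimal_critic = [1.0 + math.log(p / q) for p, q in zip(target_policy, logging_policy)]
tight = kl_lower_bound(target_policy, logging_policy, optimal_critic)

# Any other critic, such as an untrained one that outputs zeros, gives a looser value.
loose = kl_lower_bound(target_policy, logging_policy, [0.0, 0.0, 0.0])

print(direct, tight, loose)
```

In this toy setting the bound matches the direct value only when the critic is optimal. An $f$-GAN-style method has to learn that critic, so the quality of the regularizer depends on how well the critic is trained, whereas a direct approximation avoids the inner optimization.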

Taxonomy

Topics: Risk and Portfolio Optimization