Off-Policy Evaluation Using Information Borrowing and Context-Based Switching

Sutanoy Dasgupta; Yabo Niu; Kishan Panaganti; Dileep Kalathil; Debdeep Pati; and Bani Mallick

arXiv:2112.09865·stat.ML·August 20, 2024

TL;DR

This paper introduces the DR-IC estimator for off-policy evaluation in contextual bandits, which reduces bias and variance by using information borrowing and context-based switching, outperforming existing methods.

Contribution

The paper proposes a novel DR-IC estimator that combines a parametric reward model with adaptive context-based switching to improve off-policy evaluation accuracy.

Findings

01. DR-IC reduces bias and variance in OPE.

02. DR-IC outperforms state-of-the-art algorithms on benchmarks.

03. Theoretical guarantees support DR-IC's effectiveness.

Abstract

We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using the data collected by a logging policy. Most popular approaches to OPE are variants of the doubly robust (DR) estimator obtained by combining a direct method (DM) estimator and a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS. We propose a new approach called the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from the 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also…
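As a rough illustration of the DM, IPS, and DR estimators the abstract refers to, the sketch below computes their textbook forms with NumPy. It is not the DR-IC estimator: the parametric reward model and the context-based switching rule are not specified in this excerpt, so none of that is reproduced here. The function name `dm_ips_dr`, the array layout, and the toy data are assumptions made purely for illustration.

```python
import numpy as np


def dm_ips_dr(actions, rewards, logging_probs, target_probs, reward_model):
    """Return (DM, IPS, DR) value estimates for a target policy.

    actions:       (n,) int array, logged actions a_i
    rewards:       (n,) float array, observed rewards r_i
    logging_probs: (n,) float array, mu(a_i | x_i) under the logging policy
    target_probs:  (n, K) float array, pi(a | x_i) for every action a
    reward_model:  (n, K) float array, r_hat(x_i, a) for every action a
    """
    idx = np.arange(len(actions))

    # Importance weights (inverse propensity scores) for the logged actions.
    weights = target_probs[idx, actions] / logging_probs

    # Direct method: average the reward model under the target policy.
    dm_terms = (target_probs * reward_model).sum(axis=1)

    # IPS: reweight the observed rewards.
    ips_terms = weights * rewards

    # Doubly robust: DM plus an IPS-weighted correction of the model residual.
    dr_terms = dm_terms + weights * (rewards - reward_model[idx, actions])

    return dm_terms.mean(), ips_terms.mean(), dr_terms.mean()


# Toy data (hypothetical): two logged rounds, two actions.
actions = np.array([0, 1])
rewards = np.array([1.0, 0.0])
logging_probs = np.array([0.5, 0.5])             # uniform logging policy
target_probs = np.array([[1.0, 0.0], [1.0, 0.0]])  # target always plays action 0
reward_model = np.array([[0.5, 0.0], [0.5, 0.0]])  # deliberately imperfect model

dm, ips, dr = dm_ips_dr(actions, rewards, logging_probs, target_probs, reward_model)
print(dm, ips, dr)
```

In this toy setup the importance weight is 2 on the first round and 0 on the second, which hints at how large IPS values can dominate the correction term and inflate the variance of the DR estimate.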

Code & Models

Repositories

kishanpb/offpolicyevaluation_informationborrowing_contextswitching (Official)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Advanced Causal Inference Techniques