Off-Policy Evaluation Using Information Borrowing and Context-Based Switching
Sutanoy Dasgupta, Yabo Niu, Kishan Panaganti, Dileep Kalathil, Debdeep Pati, and Bani Mallick

TL;DR
This paper introduces the DR-IC estimator for off-policy evaluation in contextual bandits. DR-IC reduces both bias and variance by borrowing information across contexts and by context-based switching, and it outperforms existing methods on benchmarks.
Contribution
The paper proposes a novel DR-IC estimator that combines a parametric reward model with adaptive context-based switching to improve off-policy evaluation accuracy.
Findings
DR-IC reduces bias and variance in OPE.
DR-IC outperforms state-of-the-art algorithms on benchmarks.
Theoretical guarantees support DR-IC's effectiveness.
Abstract
We consider the off-policy evaluation (OPE) problem in contextual bandits, where the goal is to estimate the value of a target policy using data collected by a logging policy. Most popular approaches to OPE are variants of the doubly robust (DR) estimator, obtained by combining a direct method (DM) estimator with a correction term involving the inverse propensity score (IPS). Existing algorithms primarily focus on strategies to reduce the variance of the DR estimator arising from large IPS values. We propose a new approach, the Doubly Robust with Information borrowing and Context-based switching (DR-IC) estimator, that focuses on reducing both bias and variance. The DR-IC estimator replaces the standard DM estimator with a parametric reward model that borrows information from 'closer' contexts through a correlation structure that depends on the IPS. The DR-IC estimator also…
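The abstract builds on the standard doubly robust estimator, which adds an IPS-weighted correction to a direct-method estimate. A minimal sketch of that baseline follows, assuming discrete actions and known logging propensities. It is the generic DR estimator, not DR-IC: the paper's information-borrowing reward model and its context-based switching rule are not reproduced here, and the function names (`direct_method`, `ips`, `doubly_robust`) are hypothetical.

```python
import numpy as np

# Generic DM / IPS / DR off-policy estimators (not the paper's DR-IC).
# Shapes: n logged rounds, K discrete actions.
#   pi_target:  (n, K) target-policy probabilities pi(a | x_i)
#   mu_logging: (n, K) logging-policy propensities mu(a | x_i)
#   actions:    (n,)   logged action indices a_i
#   rewards:    (n,)   observed rewards r_i
#   q_hat:      (n, K) fitted reward model q_hat(x_i, a)

def direct_method(pi_target, q_hat):
    # Average the model's predicted reward under the target policy.
    return float(np.mean(np.sum(pi_target * q_hat, axis=1)))

def importance_weights(pi_target, mu_logging, actions):
    # IPS weight pi(a_i | x_i) / mu(a_i | x_i) for each logged action.
    idx = np.arange(len(actions))
    return pi_target[idx, actions] / mu_logging[idx, actions]

def ips(pi_target, mu_logging, actions, rewards):
    # Reweight observed rewards; unbiased but high-variance when weights are large.
    w = importance_weights(pi_target, mu_logging, actions)
    return float(np.mean(w * rewards))

def doubly_robust(pi_target, mu_logging, actions, rewards, q_hat):
    # DM estimate plus an IPS-weighted correction of the model's residual.
    idx = np.arange(len(actions))
    w = importance_weights(pi_target, mu_logging, actions)
    dm_term = np.sum(pi_target * q_hat, axis=1)
    correction = w * (rewards - q_hat[idx, actions])
    return float(np.mean(dm_term + correction))

# Toy usage with 3 rounds and 2 actions under a uniform logging policy.
pi = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
mu = np.full((3, 2), 0.5)
acts = np.array([0, 0, 1])
rews = np.array([1.0, 0.0, 2.0])
q = np.array([[1.0, 5.0], [0.0, 3.0], [7.0, 2.0]])
print(ips(pi, mu, acts, rews), doubly_robust(pi, mu, acts, rews, q))
```

Two limiting cases show why DR is "doubly robust": with `q_hat` identically zero it reduces to IPS, and when `q_hat` fits the logged rewards exactly the correction vanishes and it reduces to DM. The variance problem the abstract mentions comes from large entries of the weight vector multiplying the residual.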
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Advanced Causal Inference Techniques
