Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions

Haanvid Lee; Jongmin Lee; Yunseon Choi; Wonseok Jeon; Byung-Jun Lee; Yung-Kyun Noh; Kee-Eung Kim

arXiv:2210.13373·cs.LG·December 29, 2022·1 cites

TL;DR

This paper introduces a kernel-based local metric learning approach for off-policy evaluation in contextual bandits with continuous actions. It handles deterministic target policies and learns the kernel metric so as to minimize mean squared error.

Contribution

It develops a kernel metric learning method for off-policy evaluation with continuous actions. Prior work was limited to scalar action spaces or kernel bandwidth selection; this method covers vector-valued actions and learns the kernel metric itself.

Findings

01. The proposed estimator is consistent.

02. It significantly reduces mean squared error compared to baselines.

03. It is effective for deterministic policies in continuous action spaces.

Abstract

We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic policies in contextual bandits with continuous action spaces. Our work is motivated by practical scenarios where the target policy needs to be deterministic due to domain requirements, such as prescription of treatment dosage and duration in medicine. Although importance sampling (IS) provides a basic principle for OPE, it is ill-posed for the deterministic target policy with continuous actions. Our main idea is to relax the target policy and pose the problem as kernel-based estimation, where we learn the kernel metric in order to minimize the overall mean squared error (MSE). We present an analytic solution for the optimal metric, based on the analysis of bias and variance. Whereas prior work has been limited to scalar action spaces or kernel bandwidth selection, our work takes a step further…
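The relaxation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kernel_is_estimate`, the choice of a Gaussian kernel, the hand-picked bandwidth, and the synthetic data are all assumptions made for illustration. The metric here is a fixed matrix supplied by the caller, whereas the paper learns it to minimize MSE. The sketch replaces the ill-posed importance weight of a deterministic target policy with a kernel density centered on the target action, divided by the behavior policy density.

```python
import numpy as np


def kernel_is_estimate(actions, target_actions, rewards, behavior_density,
                       bandwidth, metric=None):
    """Kernel-relaxed importance sampling estimate (illustrative sketch).

    actions:          (n, d) logged actions drawn from the behavior policy
    target_actions:   (n, d) actions the deterministic target policy would take
    rewards:          (n,)   observed rewards
    behavior_density: (n,)   behavior policy density at each logged action
    bandwidth:        scalar kernel bandwidth h (hand-picked assumption)
    metric:           (d, d) symmetric positive definite matrix; identity if None
    """
    actions = np.asarray(actions, dtype=float)
    target_actions = np.asarray(target_actions, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    behavior_density = np.asarray(behavior_density, dtype=float)
    n, d = actions.shape
    if metric is None:
        metric = np.eye(d)

    # Squared Mahalanobis distance between logged and target actions.
    diff = actions - target_actions
    sq_dist = np.einsum("ni,ij,nj->n", diff, metric, diff)

    # Gaussian kernel with covariance h^2 * inv(metric); integrates to one.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (-d / 2.0) * np.sqrt(np.linalg.det(metric))
    kernel = norm * np.exp(-0.5 * sq_dist / bandwidth ** 2)

    # The kernel density stands in for the target policy in the importance weight.
    return float(np.mean(kernel * rewards / behavior_density))


# Synthetic, context-free example (assumed data, not from the paper):
# uniform behavior policy on [-1, 1]^2, target action at the origin,
# reward 1 - ||a||^2, so the true target value is 1.
rng = np.random.default_rng(0)
n = 200_000
logged_actions = rng.uniform(-1.0, 1.0, size=(n, 2))
target = np.zeros((n, 2))
logged_rewards = 1.0 - np.sum(logged_actions ** 2, axis=1)
density = np.full(n, 0.25)  # uniform density on a square of area 4

estimate = kernel_is_estimate(logged_actions, target, logged_rewards,
                              density, bandwidth=0.2)
print(estimate)
```

In this toy setup the estimate falls somewhat below the true value of 1 because kernel smoothing introduces bias, while a smaller bandwidth would raise the variance. This is the bias-variance trade-off that the abstract says the learned metric is meant to balance.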

Code & Models

Repositories

haanvid/kmis · tf · Official

Videos

Local Metric Learning for Off-Policy Evaluation in Contextual Bandits with Continuous Actions · slideslive

Taxonomy

Topics: Gastroesophageal reflux and treatments · Advanced Causal Inference Techniques · Machine Learning in Healthcare