# Context-Action Embedding Learning for Off-Policy Evaluation in Contextual Bandits

**Authors:** Kushagra Chandak, Vincent Liu, Haanvid Lee

arXiv: 2509.00648 · 2025-10-15

## TL;DR

This paper introduces CAEL-MIPS, a method that learns context-action embeddings from offline data to minimize the mean squared error (MSE) of the MIPS estimator in off-policy evaluation for contextual bandits. It targets settings where the action space is large or parts of the context-action space are underexplored.

## Contribution

The paper proposes an embedding learning objective that directly minimizes the MSE of the MIPS estimator. This addresses two limitations of earlier MIPS embeddings: they are not chosen to minimize MSE, and they ignore context information.

## Key findings

- CAEL-MIPS outperforms baseline methods in MSE on synthetic data.
- The learned embeddings depend on both context and action, rather than on the action alone as in earlier MIPS embeddings.
- CAEL-MIPS also outperforms baselines in MSE on a real-world dataset.

## Abstract

We consider off-policy evaluation (OPE) in contextual bandits with a finite action space. Inverse Propensity Score (IPS) weighting is a widely used method for OPE due to its unbiasedness, but it suffers from significant variance when the action space is large or when some parts of the context-action space are underexplored. Recently introduced Marginalized IPS (MIPS) estimators mitigate this issue by leveraging action embeddings. However, these embeddings do not minimize the mean squared error (MSE) of the estimators and do not consider context information. To address these limitations, we introduce Context-Action Embedding Learning for MIPS, or CAEL-MIPS, which learns context-action embeddings from offline data to minimize the MSE of the MIPS estimator. Building on the theoretical analysis of bias and variance of MIPS, we present an MSE-minimizing objective for CAEL-MIPS. In the empirical studies on a synthetic dataset and a real-world dataset, we demonstrate that our estimator outperforms baselines in terms of MSE.
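To make the contrast between IPS and MIPS in the abstract concrete, here is a minimal NumPy sketch of the two baseline estimators. It is not the paper's CAEL-MIPS method and does not learn embeddings. It assumes a fixed, deterministic, context-independent map from each action to a discrete embedding (the hypothetical `action_to_emb` array), and the function names and argument layout are illustrative choices rather than anything taken from the paper.

```python
import numpy as np


def ips_estimate(pi_e, pi_0, actions, rewards):
    """IPS: average of pi_e(a_i | x_i) / pi_0(a_i | x_i) * r_i.

    pi_e, pi_0: arrays of shape (n, n_actions) holding the target and
    logging policy probabilities for each logged context.
    actions: integer array of shape (n,) with the logged actions.
    rewards: float array of shape (n,) with the observed rewards.
    """
    idx = np.arange(len(actions))
    weights = pi_e[idx, actions] / pi_0[idx, actions]
    return float(np.mean(weights * rewards))


def mips_estimate(pi_e, pi_0, actions, rewards, action_to_emb, n_emb):
    """MIPS with a deterministic action-to-embedding map (an assumption).

    The importance weight is the ratio of marginal embedding
    probabilities, p(e_i | x_i, pi_e) / p(e_i | x_i, pi_0), where
    p(e | x, pi) sums pi(a | x) over all actions mapped to e.
    """
    # One-hot matrix of shape (n_actions, n_emb): row a marks the embedding of a.
    onehot = np.eye(n_emb)[action_to_emb]
    # Marginal embedding probabilities under each policy, shape (n, n_emb).
    p_emb_e = pi_e @ onehot
    p_emb_0 = pi_0 @ onehot
    idx = np.arange(len(actions))
    logged_emb = action_to_emb[actions]
    weights = p_emb_e[idx, logged_emb] / p_emb_0[idx, logged_emb]
    return float(np.mean(weights * rewards))


# Toy usage with made-up numbers: 2 logged rounds, 4 actions, 2 embeddings.
pi_0 = np.full((2, 4), 0.25)
pi_e = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.1, 0.1, 0.1, 0.7]])
actions = np.array([1, 2])
rewards = np.array([1.0, 2.0])
action_to_emb = np.array([0, 0, 1, 1])

print(ips_estimate(pi_e, pi_0, actions, rewards))
print(mips_estimate(pi_e, pi_0, actions, rewards, action_to_emb, n_emb=2))
```

Because MIPS pools probability mass over all actions that share an embedding, its weights are typically less extreme than the per-action IPS weights, which is the source of the variance reduction. When every action has its own embedding, the two estimators coincide.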

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00648/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00648/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/2509.00648/full.md

---
Source: https://tomesphere.com/paper/2509.00648