Bayesian Off-Policy Evaluation and Learning for Large Action Spaces

Imad Aouali; Victor-Emmanuel Brunel; David Rohde; Anna Korba

arXiv:2402.14664 · cs.LG · April 10, 2025 · 1 citation

TL;DR

This paper introduces a Bayesian framework for off-policy evaluation and learning in large action spaces. It uses structured priors to capture correlations between actions, which improves sample efficiency and performance.

Contribution

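The paper proposes sDM, a generic Bayesian approach to both off-policy evaluation (OPE) and off-policy learning (OPL) that leverages action correlations without sacrificing computational efficiency. It also introduces Bayesian metrics that assess an algorithm's average performance across problem instances rather than its worst case.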
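To illustrate how a structured prior lets actions share information, here is a minimal sketch of a Bayesian direct method in the spirit of sDM. It assumes a deliberately simplified, context-free model that is our own illustration, not the paper's exact model: a shared latent psi ~ N(mu0, sigma0^2), action means theta_a | psi ~ N(psi, tau^2), and Gaussian rewards with noise sigma. The function names `structured_posterior_means`, `ope_value`, and `greedy_policy` are hypothetical.

```python
def structured_posterior_means(logs, num_actions, mu0, sigma0, tau, sigma):
    """Posterior means E[theta_a | logs] for every action under the ASSUMED prior
    psi ~ N(mu0, sigma0^2), theta_a | psi ~ N(psi, tau^2), reward ~ N(theta_a, sigma^2)."""
    counts = [0] * num_actions
    sums = [0.0] * num_actions
    for a, r in logs:
        counts[a] += 1
        sums[a] += r

    # Posterior over the shared latent psi. Given psi, the sample mean of an
    # observed action is N(psi, tau^2 + sigma^2 / n_a), so every observed action
    # contributes precision-weighted evidence about psi.
    prec_psi = 1.0 / sigma0**2
    weighted = mu0 / sigma0**2
    for a in range(num_actions):
        if counts[a] > 0:
            var = tau**2 + sigma**2 / counts[a]
            prec_psi += 1.0 / var
            weighted += (sums[a] / counts[a]) / var
    psi_mean = weighted / prec_psi

    # E[theta_a | psi, logs] is linear in psi, so substituting E[psi | logs]
    # gives the exact posterior mean. Unlogged actions fall back to psi_mean,
    # which is informed by the rewards of all other actions.
    means = []
    for a in range(num_actions):
        prec = 1.0 / tau**2 + counts[a] / sigma**2
        means.append((psi_mean / tau**2 + sums[a] / sigma**2) / prec)
    return means


def ope_value(policy_probs, means):
    """Direct-method estimate of a target policy's value: sum_a pi(a) * E[theta_a | logs]."""
    return sum(p * m for p, m in zip(policy_probs, means))


def greedy_policy(means):
    """Hypothetical OPL rule: deterministically pick the highest posterior mean."""
    best = max(range(len(means)), key=lambda a: means[a])
    return [1.0 if a == best else 0.0 for a in range(len(means))]


if __name__ == "__main__":
    logs = [(0, 1.0), (0, 1.2), (1, 0.9)]  # action 2 never appears in the logs
    means = structured_posterior_means(logs, num_actions=3, mu0=0.0,
                                       sigma0=1.0, tau=1.0, sigma=1.0)
    print(means)  # action 2 gets E[psi | logs] = 71/130, about 0.546
    print(ope_value([1 / 3, 1 / 3, 1 / 3], means))
    print(greedy_policy(means))  # [1.0, 0.0, 0.0]
```

In this toy model, an independent per-action prior would estimate the unlogged action 2 at the prior mean mu0 = 0. The structured posterior instead borrows strength from the rewards observed for actions 0 and 1, which is the kind of benefit that matters when most actions in a large action space are rarely logged.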

Findings

1. sDM outperforms existing methods in empirical tests
2. Leveraging action correlations improves evaluation accuracy
3. Bayesian metrics capture average performance across problem instances rather than only worst-case behavior

Abstract

In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.
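A rough sketch can make the Bayesian metric concrete. Instead of bounding error on the hardest instance, it averages an estimator's squared error over problem instances drawn from a prior. Everything below is an illustrative assumption, not the paper's experimental setup: the hierarchical Gaussian prior, the uniform logging policy, and the hypothetical baseline `independent_dm`, which ignores action correlations. The maximum over sampled instances is reported only as a crude stand-in for a worst-case criterion, since a true worst case is a supremum over all instances.

```python
import random

# ASSUMED hyperparameters of the prior over problem instances (illustrative only).
MU0, SIGMA0, TAU, SIGMA = 0.0, 1.0, 0.1, 1.0


def sample_instance(rng, num_actions):
    """Draw one problem instance: a shared latent psi, then correlated action means."""
    psi = rng.gauss(MU0, SIGMA0)
    return [rng.gauss(psi, TAU) for _ in range(num_actions)]


def log_data(rng, thetas, n):
    """Logged (action, reward) pairs under an assumed uniform logging policy."""
    logs = []
    for _ in range(n):
        a = rng.randrange(len(thetas))
        logs.append((a, rng.gauss(thetas[a], SIGMA)))
    return logs


def independent_dm(logs, num_actions):
    """Hypothetical baseline: per-action conjugate Gaussian posterior means with an
    independent N(MU0, SIGMA0^2 + TAU^2) prior, ignoring the shared latent psi."""
    prior_var = SIGMA0**2 + TAU**2
    counts = [0] * num_actions
    sums = [0.0] * num_actions
    for a, r in logs:
        counts[a] += 1
        sums[a] += r
    return [(MU0 / prior_var + sums[a] / SIGMA**2)
            / (1.0 / prior_var + counts[a] / SIGMA**2)
            for a in range(num_actions)]


def bayesian_and_max_error(estimator, policy, num_instances, n_logs, seed=0):
    """Return (average squared error over sampled instances, max over the same sample)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(num_instances):
        thetas = sample_instance(rng, len(policy))
        logs = log_data(rng, thetas, n_logs)
        means = estimator(logs, len(policy))
        estimate = sum(p * m for p, m in zip(policy, means))
        truth = sum(p * t for p, t in zip(policy, thetas))
        errors.append((estimate - truth) ** 2)
    return sum(errors) / len(errors), max(errors)


if __name__ == "__main__":
    K = 100  # many actions, few logs: most actions are never observed
    uniform = [1.0 / K] * K
    bayes_mse, max_se = bayesian_and_max_error(independent_dm, uniform, 500, 50)
    print(f"Bayesian MSE: {bayes_mse:.4f}, max squared error over samples: {max_se:.4f}")
```

Any estimator with the signature `estimator(logs, num_actions) -> list of per-action means` can be passed in, so a structured estimator and a baseline can be compared under the same prior and the same random seed.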

Taxonomy

Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Context-Aware Activity Recognition Systems