Optimal Policy Learning with Observational Data in Multi-Action Scenarios: Estimation, Risk Preference, and Potential Failures

Giovanni Cerulli

arXiv:2403.20250 · stat.ML · April 1, 2024 · 1 citation

TL;DR

This paper explores optimal policy learning from observational data in multi-action scenarios, focusing on estimation methods, the influence of risk attitudes, and potential pitfalls when key assumptions fail.

Contribution

It provides a comprehensive review of estimation techniques, analyzes how risk preferences affect decision-making, and discusses conditions leading to failures in data-driven policy optimization.

Findings

01

The optimal choice depends on the decision maker's attitude toward risk, specifically the trade-off between the reward's conditional mean and its conditional variance.

02

Identification assumptions like overlap and unconfoundedness are critical for reliable policy learning.

03

A real-data application shows how risk preferences affect average regret when treatments are multi-valued.
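
The mean-variance trade-off named in the first finding can be made concrete with a small sketch. The code below is not the paper's estimator: it assumes unconfoundedness and overlap hold, uses plain linear regression adjustment per action for the conditional mean, approximates the conditional variance by regressing squared residuals on the same covariates, and scores each action as mean minus `risk_aversion` times variance. The function names and the linear form of the score are illustrative assumptions.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta


def predict_linear(beta, X):
    """Predictions from coefficients produced by fit_linear."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ beta


def risk_adjusted_policy(X, actions, rewards, X_new, n_actions, risk_aversion=0.0):
    """Pick, for each row of X_new, the action with the best mean-variance score.

    Hypothetical scoring rule (an assumption, not taken from the paper):
        score(a, x) = mean_hat(a, x) - risk_aversion * var_hat(a, x)
    """
    scores = np.empty((len(X_new), n_actions))
    for a in range(n_actions):
        mask = actions == a
        # Regression adjustment: model the reward among units that took action a.
        beta_mean = fit_linear(X[mask], rewards[mask])
        # Conditional variance proxy: regress squared residuals on covariates.
        resid_sq = (rewards[mask] - predict_linear(beta_mean, X[mask])) ** 2
        beta_var = fit_linear(X[mask], resid_sq)
        mean_hat = predict_linear(beta_mean, X_new)
        var_hat = np.clip(predict_linear(beta_var, X_new), 0.0, None)
        scores[:, a] = mean_hat - risk_aversion * var_hat
    return scores.argmax(axis=1), scores
```

With `risk_aversion=0.0` the rule reduces to choosing the action with the highest estimated conditional mean; raising it shifts the choice toward actions with less variable rewards, which is the sense in which the optimal policy depends on risk attitude.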

Abstract

This paper deals with optimal policy learning (OPL) with observational data, i.e. data-driven optimal decision-making, in multi-action (or multi-arm) settings, where a finite set of decision options is available. It is organized in three parts, where I discuss respectively: estimation, risk preference, and potential failures. The first part provides a brief review of the key approaches to estimating the reward (or value) function and optimal policy within this context of analysis. Here, I delineate the identification assumptions and statistical properties related to offline optimal policy learning estimators. In the second part, I delve into the analysis of decision risk. This analysis reveals that the optimal choice can be influenced by the decision maker's attitude towards risks, specifically in terms of the trade-off between reward conditional mean and conditional variance. Here, I…

Taxonomy

Topics: Water resources management and optimization · Advanced Causal Inference Techniques

Methods: Sparse Evolutionary Training