# Experimental Evaluation of Individualized Treatment Rules

**Authors:** Kosuke Imai, Michael Lingzhi Li

arXiv: 1905.05389 · 2021-05-06

## TL;DR

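This paper introduces two metrics for evaluating individualized treatment rules with experimental data: the population average prescriptive effect (PAPE) and the area under the prescriptive effect curve (AUPEC). Their exact finite-sample variances can be estimated without modeling assumptions, so the approach applies even to rules built with complex algorithms.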
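The two metrics can be written compactly, following the abstract's description. The notation here is assumed for illustration and is not copied from the paper: $Y_i(t)$ is unit $i$'s potential outcome under treatment $t \in \{0, 1\}$, $f(X_i) \in \{0, 1\}$ is the rule's recommendation given covariates $X_i$, $p_f = \Pr(f(X_i) = 1)$ is the proportion of units the rule treats, and $f_p$ is a scoring rule that treats the fraction $p$ of units with the highest scores. The PAPE is written $\tau_f$ and the AUPEC $\Gamma$.

$$
\tau_f = \mathbb{E}\bigl[Y_i(f(X_i))\bigr] - \mathbb{E}\bigl[p_f\,Y_i(1) + (1 - p_f)\,Y_i(0)\bigr]
$$

$$
\Gamma = \int_0^1 \tau_{f_p}\,dp = \int_0^1 \mathbb{E}\bigl[Y_i(f_p(X_i))\bigr]\,dp - \frac{\mathbb{E}[Y_i(1)] + \mathbb{E}[Y_i(0)]}{2}
$$

The second equality uses $p_{f_p} = p$ and $\int_0^1 p\,dp = \int_0^1 (1 - p)\,dp = 1/2$. This simplified form assumes the rule can treat any proportion $p \in [0, 1]$; the paper's exact definition may differ in such details.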

## Contribution

It proposes two evaluation metrics for individualized treatment rules (ITRs), PAPE and AUPEC, together with methods for estimating their variances, including in the common setting where the same experimental data are used both to estimate and to evaluate an ITR.
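The setting in which the same data both estimate and evaluate an ITR can be sketched with cross-fitting: fit the rule on all folds but one, evaluate it on the held-out fold, and average. This is a minimal illustration under stated assumptions, not the paper's estimator. `pape_plugin` is a simple plug-in PAPE estimate (the paper derives an unbiased estimator with an exact variance, which is not reproduced here), `fit_stratum_rule` is a hypothetical toy learner for one binary covariate, and the data are simulated.

```python
import random
from statistics import mean


def pape_plugin(y, t, rec):
    """Plug-in PAPE estimate: value of the ITR minus the value of randomly
    treating the same proportion of units (simplified; not the paper's
    unbiased estimator)."""
    y1_bar = mean(yi for yi, ti in zip(y, t) if ti == 1)
    y0_bar = mean(yi for yi, ti in zip(y, t) if ti == 0)
    p_f = mean(rec)  # proportion of units the ITR treats
    # Mean outcome under the ITR, estimated from units whose randomized
    # assignment matches the recommendation.
    itr_value = (mean(yi * ri for yi, ti, ri in zip(y, t, rec) if ti == 1)
                 + mean(yi * (1 - ri) for yi, ti, ri in zip(y, t, rec) if ti == 0))
    return itr_value - (p_f * y1_bar + (1 - p_f) * y0_bar)


def fit_stratum_rule(x, y, t):
    """Hypothetical toy learner: treat a level of the binary covariate x
    when its difference in means on the training data is positive."""
    rule = {}
    for s in (0, 1):
        y1 = [yi for xi, yi, ti in zip(x, y, t) if xi == s and ti == 1]
        y0 = [yi for xi, yi, ti in zip(x, y, t) if xi == s and ti == 0]
        rule[s] = int(bool(y1) and bool(y0) and mean(y1) - mean(y0) > 0)
    return lambda xi: rule[xi]


def cross_fitted_pape(x, y, t, k=5, seed=0):
    """Fit the ITR on k - 1 folds, evaluate it on the held-out fold,
    and average the k fold-level PAPE estimates."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    estimates = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        itr = fit_stratum_rule([x[i] for i in train], [y[i] for i in train],
                               [t[i] for i in train])
        rec = [itr(x[i]) for i in held_out]
        estimates.append(pape_plugin([y[i] for i in held_out],
                                     [t[i] for i in held_out], rec))
    return mean(estimates)


rng = random.Random(1)
n = 4000
x = [rng.randint(0, 1) for _ in range(n)]
t = [1] * (n // 2) + [0] * (n - n // 2)
rng.shuffle(t)  # complete randomization of treatment
# Simulated effect: +1.0 when x == 1, -0.5 when x == 0.
y = [(1.0 if xi else -0.5) * ti + rng.gauss(0, 1) for xi, ti in zip(x, t)]
print(round(cross_fitted_pape(x, y, t), 3))
```

In this simulation the rule that treats only `x == 1` has a population PAPE of 0.375, so the printed estimate should land near that value. The paper's variance formula also accounts for the randomness of the fold split, which this sketch ignores.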

## Key findings

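- The PAPE measures how much an ITR improves on a non-individualized rule that randomly treats the same proportion of units; the AUPEC averages the PAPE over a range of budget constraints.
- Exact finite-sample variance formulas, derived in Neyman's repeated sampling framework, support inference without asymptotic approximation or resampling.
- Because no modeling assumptions are required, the method applies to any ITR, including those based on complex machine learning algorithms.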
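The paper's variance results build on Neyman's repeated sampling framework. As an illustration of that framework only, and not of the paper's PAPE or AUPEC variance formulas, the sketch below computes the classic Neyman standard error of a difference in means. When units are randomly sampled from a larger population and treatment is completely randomized, the variance of the difference in means is $\sigma_1^2/n_1 + \sigma_0^2/n_0$; here it is estimated by plugging in sample variances. The function name `neyman_diff_in_means` is hypothetical.

```python
import math
from statistics import mean, variance


def neyman_diff_in_means(y, t):
    """Difference in means and its Neyman standard error.

    Assumes units were randomly sampled and treatment t (0/1) was
    completely randomized; sample variances estimate sigma_1^2, sigma_0^2.
    """
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    estimate = mean(y1) - mean(y0)
    se = math.sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
    return estimate, se


# Toy data: treated outcomes 3, 5, 4; control outcomes 1, 2, 0.
est, se = neyman_diff_in_means([3.0, 5.0, 4.0, 1.0, 2.0, 0.0], [1, 1, 1, 0, 0, 0])
# est = 3.0, se = sqrt(1/3 + 1/3) ≈ 0.816
```

The standard error comes from a closed-form expression, with no bootstrap or other resampling step.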

## Abstract

The increasing availability of individual-level data has led to numerous applications of individualized (or personalized) treatment rules (ITRs). Policy makers often wish to empirically evaluate ITRs and compare their relative performance before implementing them in a target population. We propose a new evaluation metric, the population average prescriptive effect (PAPE). The PAPE compares the performance of an ITR with that of a non-individualized treatment rule, which randomly treats the same proportion of units. Averaging the PAPE over a range of budget constraints yields our second evaluation metric, the area under the prescriptive effect curve (AUPEC). The AUPEC represents an overall performance measure for evaluation, as the area under the receiver operating characteristic curve (AUROC) does for classification, and is a generalization of the QINI coefficient used in uplift modeling. We use Neyman's repeated sampling framework to estimate the PAPE and AUPEC and derive their exact finite-sample variances based on random sampling of units and random assignment of treatment. We extend our methodology to a common setting in which the same experimental data are used both to estimate and to evaluate ITRs. In this case, our variance calculation incorporates the additional uncertainty due to random splits of data used for cross-validation. The proposed evaluation metrics can be estimated without requiring modeling assumptions, asymptotic approximation, or resampling methods. As a result, they are applicable to any ITR, including those based on complex machine learning algorithms. An open-source software package implementing the proposed methodology is available.
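Averaging the PAPE over a range of budget constraints can be approximated numerically by evaluating it on a grid of budgets. The sketch below rests on several assumptions and is not the paper's estimator. `aupec_sketch` is a hypothetical name. At budget `p`, the ITR is a scoring rule that treats the `round(p * n)` highest-scoring units. Each PAPE is a simple plug-in estimate. The data are simulated with a known effect $2(x - 0.5)$, so `x` itself gives the true effect ordering.

```python
import random
from statistics import mean


def aupec_sketch(y, t, scores, grid_size=100):
    """Average plug-in PAPE estimates over a midpoint grid of budgets p.

    At budget p the hypothetical scoring rule treats the round(p * n)
    units with the highest scores.
    """
    n = len(y)
    n1 = sum(t)
    n0 = n - n1
    y1_bar = mean(yi for yi, ti in zip(y, t) if ti == 1)
    y0_bar = mean(yi for yi, ti in zip(y, t) if ti == 0)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    papes = []
    for j in range(grid_size):
        p = (j + 0.5) / grid_size
        k = round(p * n)
        rec = [0] * n
        for i in order[:k]:
            rec[i] = 1
        p_f = k / n  # proportion actually treated at this budget
        # Estimated mean outcome under the rule, from units whose
        # random assignment matches the recommendation.
        itr_value = (sum(yi * ri for yi, ti, ri in zip(y, t, rec) if ti == 1) / n1
                     + sum(yi * (1 - ri) for yi, ti, ri in zip(y, t, rec) if ti == 0) / n0)
        # Subtract the value of randomly treating the same proportion p_f.
        papes.append(itr_value - (p_f * y1_bar + (1 - p_f) * y0_bar))
    return mean(papes)


rng = random.Random(2)
n = 4000
x = [rng.random() for _ in range(n)]
t = [1] * (n // 2) + [0] * (n - n // 2)
rng.shuffle(t)  # complete randomization of treatment
# Simulated effect 2 * (x - 0.5): positive for x > 0.5, negative below.
y = [2 * (xi - 0.5) * ti + rng.gauss(0, 0.5) for xi, ti in zip(x, t)]
good = aupec_sketch(y, t, scores=x)                 # ranks by true effect
bad = aupec_sketch(y, t, scores=[-xi for xi in x])  # ranks backwards
print(round(good, 3), round(bad, 3))
```

With the true ordering, the estimate should be close to the simulated population value of $1/6$. Reversing the ranking flips its sign on this symmetric grid. This mirrors how AUROC rewards a classifier for ordering units well rather than for any single threshold.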

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.05389/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1905.05389/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/1905.05389/full.md

---
Source: https://tomesphere.com/paper/1905.05389