APRIL: Annotations for Policy evaluation with Reliable Inference from LLMs
Aishwarya Mandyam, Kalyani Limaye, Barbara E. Engelhardt, Emily Alsentzer

TL;DR
This paper introduces APRIL, a method leveraging large language models to generate counterfactual annotations for off-policy evaluation in healthcare, improving safety and scalability in policy deployment.
Contribution
It proposes a novel approach using LLMs to produce counterfactual annotations guided by domain knowledge, enhancing OPE in high-stakes medical applications.
Findings
LLMs achieve comparable performance in predicting clinical features.
LLM-based counterfactual annotations improve OPE estimates.
The method helps identify when additional annotations are no longer beneficial.
Abstract
Off-policy evaluation (OPE) estimates the value of a contextual bandit policy prior to deployment. As such, OPE plays a critical role in ensuring safety in high-stakes domains such as healthcare. However, standard OPE approaches are limited by the size and coverage of the behavior dataset. While previous work has explored using expert-labeled counterfactual annotations to enhance dataset coverage, obtaining such annotations is expensive, limiting the scalability of prior approaches. We propose leveraging large language models (LLMs) to generate counterfactual annotations for OPE in medical domains. Our method uses domain knowledge to guide LLMs in predicting how key clinical features evolve under alternate treatments. These predicted features can then be transformed using known reward functions to create counterfactual annotations. We first evaluate the ability of several LLMs to…
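The two ingredients the abstract names can be sketched roughly as follows: a standard inverse propensity scoring (IPS) estimate of a contextual bandit policy's value, and the step that turns a predicted clinical feature into a counterfactual annotation through a known reward function. This is a minimal illustration, not the paper's estimator. The function names, the in-range reward rule, and all numbers are assumptions made up for the example.

```python
import numpy as np

def ips_estimate(rewards, behavior_probs, target_probs):
    """Standard IPS value estimate from logged bandit data.

    rewards[i]        : observed reward for the logged action in sample i
    behavior_probs[i] : probability the behavior policy gave that action
    target_probs[i]   : probability the target policy gives that action
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(behavior_probs, dtype=float)
    return float(np.mean(weights * rewards))

def reward_from_feature(predicted_value, low, high):
    """Hypothetical known reward function (an assumption for this sketch):
    reward 1.0 if the predicted feature falls inside a target range, else 0.0."""
    return 1.0 if low <= predicted_value <= high else 0.0

def make_annotation(context_id, action, predicted_value, low, high):
    """Package a predicted feature for an untaken action as a
    counterfactual annotation: (context, action, reward)."""
    return {
        "context": context_id,
        "action": action,
        "reward": reward_from_feature(predicted_value, low, high),
    }

# Toy logged data (made-up values).
logged_rewards = [1.0, 0.0, 1.0, 0.0]
behavior_probs = [0.5, 0.5, 0.5, 0.5]
target_probs = [0.8, 0.2, 0.8, 0.2]
value = ips_estimate(logged_rewards, behavior_probs, target_probs)

# A predicted feature value for an action that was not taken in context 1,
# standing in for an LLM prediction; the range [3.5, 5.0] is illustrative only.
annotation = make_annotation(context_id=1, action=0, predicted_value=4.1, low=3.5, high=5.0)
```

How annotations such as `annotation` are combined with the logged data inside the estimator is left out here, since the truncated abstract does not specify it.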
Taxonomy
Topics: Digital Mental Health Interventions · Artificial Intelligence in Healthcare and Education · Advanced Bandit Algorithms Research
