# Evaluating Generative AI for Deprescribing: Accuracy, Safety, and Clinical Utility

**Authors:** Juliessa Pavon, Cara McDermott, Marc Pepin, William Bryan, Ivuoma Igwe, Cathleen Colon-Emeric

PMC · DOI: 10.1093/geroni/igaf122.1185 · Innovation in Aging · 2025-12-31

## TL;DR

This study evaluates how well generative AI can support medication deprescribing for older Veterans by comparing AI-generated recommendations with those of an interprofessional team of pharmacists, geriatricians, and nurses.

## Contribution

The study introduces an evaluation of generative AI for deprescribing that applies the HELM (Holistic Evaluation of Language Models) criteria to de-identified case scenarios from a VA deprescribing program.

## Key findings

- AI-generated deprescribing recommendations are compared with those of an interprofessional team of pharmacists, geriatricians, and nurses across 100 de-identified VA case scenarios.
- AI performance is assessed through content analysis and the HELM criteria, which cover accuracy, uncertainty, efficiency, fairness, and bias.
- The abstract reports the study design rather than results; findings are intended to guide safe AI integration into deprescribing programs for older Veterans.

## Abstract

Limited geriatrics and pharmacy resources in VA Medical Centers (VAMCs) necessitate innovative strategies to enhance deprescribing efforts. Generative AI platforms, such as OpenAI, have the potential to generate deprescribing recommendations, tapering schedules, and patient education materials by synthesizing information from medical literature and drug interaction databases. When integrated into deprescribing programs, these platforms could enhance scalability and sustainability by providing real-time, context-aware decision support. However, before implementation, it is essential to assess their safety, accuracy, and potential risks, including errors, omissions, and confabulations. Using the VA LLM platform TryOpen AI 3.5, this project assesses AI-generated deprescribing recommendations compared to those generated by an interprofessional team of pharmacists, geriatricians, and nurses, using de-identified case scenarios (N = 100) from our VA deprescribing program. We assess AI performance through content analysis, identifying recurring themes (e.g., medication selection, tapering regimens, side effects, and patient education) using the HELM criteria (Holistic Evaluation of Language Models), which assesses accuracy, uncertainty, efficiency, fairness, and bias. Findings will inform safe AI integration in deprescribing programs, identifying appropriate applications and potential safety risks to enhance medication management for older Veterans.
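The abstract does not say how AI and team recommendations are scored against each other. As a purely illustrative sketch, one way to quantify agreement on medication selection (one of the themes named above) is a per-case set overlap averaged across cases. The `CaseComparison` type, the `medication_overlap` and `summarize` helpers, the Jaccard measure, and the example medications are all assumptions for illustration, not the authors' method or data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CaseComparison:
    """Hypothetical record: medications each source proposes to deprescribe."""
    case_id: str
    ai_meds: frozenset
    team_meds: frozenset


def medication_overlap(case: CaseComparison) -> float:
    """Jaccard overlap between AI and team selections (assumed metric)."""
    union = case.ai_meds | case.team_meds
    if not union:
        # Neither source recommends a change: treat as full agreement.
        return 1.0
    return len(case.ai_meds & case.team_meds) / len(union)


def summarize(cases: list) -> dict:
    """Average the per-case overlap across all case scenarios."""
    if not cases:
        raise ValueError("at least one case is required")
    scores = [medication_overlap(c) for c in cases]
    return {"n_cases": len(scores), "mean_overlap": mean(scores)}


# Made-up example cases, not drawn from the study.
cases = [
    CaseComparison("case-001",
                   frozenset({"lorazepam", "omeprazole"}),
                   frozenset({"lorazepam"})),
    CaseComparison("case-002",
                   frozenset({"oxybutynin"}),
                   frozenset({"oxybutynin"})),
]
summary = summarize(cases)
print(summary)
```

An overlap score like this would capture only which medications are selected. Tapering regimens, side-effect counseling, and patient education would still need the qualitative content analysis the abstract describes.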

---
Source: https://tomesphere.com/paper/PMC12763179