# AI-Generated Exercise Prescriptions for At-Risk Populations: Safety and Feasibility of a Large Language Model Assessed by Expert Evaluation

**Authors:** Minkyung Choi, Jaeyong Park, Myeounggon Lee, Jaewon Beom, Se Young Jung, Kihyuk Lee

PMC · DOI: 10.3390/jcm15062457 · 2026-03-23

## TL;DR

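This study tests whether a large language model (Gemini 2.5) can produce safe, feasible exercise prescriptions for at-risk people. The outputs were structurally complete, but experts scoring them agreed poorly with one another (ICC(2,3) = 0.139), so the prescriptions are best treated as decision support under expert oversight.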

## Contribution

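Experts scored AI-generated exercise prescriptions for complex clinical cases on safety, feasibility, guideline alignment, and personalization, across three levels of prompt structuring, to evaluate their practical utility under expert supervision.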

## Key findings

- AI-generated exercise prescriptions showed a degree of structural completeness.
- Moving from Stage 1 to Stage 2 prompt structuring was associated with higher mean scores for safety and guideline alignment; additional structuring did not consistently yield further improvement.
- Expert-specific internal consistency was high (Cronbach's alpha > 0.92), but inter-expert agreement was low (ICC(2,3) = 0.139), reflecting the expert-dependent nature of exercise prescription.

## Abstract

Background/Objectives: In exercise science and sports medicine, the potential use of large language models for generating personalized exercise programs is being explored. However, the practical applicability of AI-generated exercise prescriptions has not yet been sufficiently validated, particularly in complex clinical contexts. This study aimed to evaluate their practical utility under expert supervision. Methods: Exercise prescription outputs generated by a large language model (Gemini 2.5, Google LLC) were analyzed using clinical cases incorporating complex exercise-related considerations. Three levels of prompt structuring were applied. Experts evaluated the outputs using a structured rubric assessing safety, feasibility, guideline alignment, and personalization. Inter-expert agreement was assessed using intraclass correlation coefficients (ICC), and expert-specific internal consistency was evaluated using Cronbach’s alpha. Results: AI-generated exercise prescriptions demonstrated a certain level of structural completeness. However, inter-expert agreement was low (ICC (2,3) = 0.139), whereas expert-specific internal consistency was high (Cronbach’s alpha > 0.92). Prompt structuring from Stage 1 to Stage 2 was associated with improved mean scores in safety and guideline alignment. Additional structuring did not consistently yield further improvements. Conclusions: AI-generated exercise prescriptions may have practical potential as supportive decision-making tools when expert involvement is assumed. Nonetheless, expert judgments did not converge toward a single evaluative standard, reflecting the inherently expert-dependent nature of exercise prescription.
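The two reliability statistics named in the abstract can be illustrated with a minimal sketch. It assumes that ICC(2,3) follows the Shrout and Fleiss convention (two-way random effects, absolute agreement, mean of k = 3 raters) and that Cronbach's alpha is computed per expert across rubric items; the abstract does not spell out either detail. The function names and the small score tables are hypothetical and are not data from the study.

```python
from statistics import mean, variance


def icc2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    ratings: list of rows, one per rated output; each row holds k rater scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(col) for col in zip(*ratings)]

    # Two-way ANOVA sums of squares: rated outputs, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-output mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (msc - mse) / n)


def cronbach_alpha(items):
    """Cronbach's alpha; rows are rated outputs, columns are rubric items."""
    k = len(items[0])
    item_vars = [variance(col) for col in zip(*items)]
    total_var = variance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 5 outputs rated by 3 experts (not study data).
by_expert = [
    [4.0, 2.0, 5.0],
    [3.0, 4.0, 2.0],
    [5.0, 3.0, 3.0],
    [2.0, 4.0, 4.0],
    [4.0, 3.0, 5.0],
]

# Hypothetical scores: one expert's ratings on 4 rubric items (not study data).
one_expert_items = [
    [4.0, 4.0, 5.0, 4.0],
    [2.0, 3.0, 2.0, 2.0],
    [5.0, 5.0, 4.0, 5.0],
    [3.0, 3.0, 3.0, 2.0],
    [1.0, 2.0, 1.0, 1.0],
]

print("ICC(2,k):", round(icc2k(by_expert), 3))
print("Cronbach's alpha:", round(cronbach_alpha(one_expert_items), 3))
```

Under these assumptions, the pattern reported in the abstract corresponds to each expert's rubric scores moving together across outputs (high alpha within an expert) while different experts rank the same outputs differently (low ICC across experts).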

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13026971/full.md

---
Source: https://tomesphere.com/paper/PMC13026971