# Evaluating the appropriateness and safety of generative AI in delivering lifestyle guidance for atrial fibrillation patients

**Authors:** Masahiro Makino, Wan Jou She, Panote Siriaraya, Satoaki Matoba, Keitaro Senoo

PMC · DOI: 10.1038/s41598-025-34079-z · 2025-12-29

## TL;DR

This study evaluates how well three large language models answer lifestyle questions from atrial fibrillation patients, with electrophysiologists rating the answers alongside physician-provided counseling.

## Contribution

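The study presents a comparative evaluation of GPT-4o and two retrieval-augmented variants (DB GPT, built on a curated Q&A database, and PubMed GPT, which retrieves evidence from PubMed) for personalized lifestyle guidance in atrial fibrillation management.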

## Key findings

- GPT-4o matched electrophysiologists in scientific consensus, with a lower error rate and significantly higher specialized content, empathy, and helpfulness.
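- DB GPT and PubMed GPT were similar to electrophysiologists in error rate, specialized content, empathy, and helpfulness, with strengths in specialized content for exercise-related questions and in accuracy for diet-related questions.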
- Combining AI models could enhance the safety and reliability of medical AI systems for lifestyle counseling.

## Abstract

Lifestyle factors play a major role in atrial fibrillation (AF) incidence, but the effectiveness of lifestyle counseling varies among individuals. Due to limited consultation time, physicians often provide only brief guidance, leaving patients to manage changes on their own. This study assessed the clinical utility of three large language models (LLMs) for delivering accurate and personalized lifestyle guidance: (1) GPT-4o, (2) a retrieval-augmented model using a curated Q&A database (DB GPT), and (3) a modular RAG model retrieving evidence from PubMed (PubMed GPT). Sixty-six questions from 16 AF patients were categorized into exercise, diet, lifestyle, and other domains. Five experienced electrophysiologists independently evaluated LLM-generated lifestyle guidance and physician-provided counseling responses using ten dimensions. GPT-4o demonstrated a comparable level of scientific consensus to electrophysiologists, while achieving a lower error rate and significantly higher levels of specialized content, empathy, and helpfulness. DB GPT and PubMed GPT showed error rates, proportions of specialized content, empathy, and helpfulness similar to those of electrophysiologists, but exhibited strengths in specialized content for exercise-related questions and in accuracy for diet-related questions. These findings suggest that integrating complementary model strengths may help develop safer and more reliable medical AI systems.

The online version contains supplementary material available at 10.1038/s41598-025-34079-z.
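
The evaluation design in the abstract (several raters scoring each responder on multiple dimensions) can be illustrated with a minimal sketch of how such ratings might be tabulated. This is not the authors' analysis code: the record layout, the numeric scale, the function name `mean_by_responder_and_dimension`, and every score below are hypothetical placeholders chosen only to show the aggregation step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (responder, rater_id, dimension, score).
# Scores are placeholders on an assumed numeric scale, NOT study data.
ratings = [
    ("GPT-4o", 1, "empathy", 4),
    ("GPT-4o", 2, "empathy", 5),
    ("Physician", 1, "empathy", 3),
    ("Physician", 2, "empathy", 4),
    ("GPT-4o", 1, "helpfulness", 5),
    ("GPT-4o", 2, "helpfulness", 4),
]


def mean_by_responder_and_dimension(records):
    """Average scores across raters for each (responder, dimension) pair."""
    grouped = defaultdict(list)
    for responder, _rater, dimension, score in records:
        grouped[(responder, dimension)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


summary = mean_by_responder_and_dimension(ratings)
for (responder, dimension), avg in sorted(summary.items()):
    print(f"{responder:10s} {dimension:12s} {avg:.2f}")
```

Averaging across raters is only one possible summary. The abstract does not state which statistics or significance tests the study used, so any comparison built on a table like this would need the methods from the full paper.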

## Linked entities

- **Diseases:** atrial fibrillation (MONDO:0004981)

## Full-text entities

- **Diseases:** AF (MESH:D001281)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12855944/full.md

---
Source: https://tomesphere.com/paper/PMC12855944