# Leveraging ChatGPT for thematic analysis of medical best practice advisory data

**Authors:** Yejin Jeong, Margaret Smith, Robert J Gallo, Lisa Marie Knowlton, Steven Lin, Lisa Shieh

PMC · DOI: 10.1093/jamiaopen/ooaf126 · JAMIA Open · 2025-10-27

## TL;DR

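When guided by structured prompts and calibration, ChatGPT reached substantial agreement with human coders (κ = 0.76 and 0.78) in thematic analysis of 778 medical Best Practice Advisory (BPA) free-text comments.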

## Contribution

The study introduces a structured prompt engineering approach to optimize ChatGPT for clinical thematic analysis.

## Key findings

- ChatGPT achieved substantial agreement with human coding (κ = 0.76 for Category 1, reasons for deterioration; κ = 0.78 for Category 2, care team actions).
- Inductive analysis revealed 9 themes closely aligned with human coding.
- Prompt engineering strategies like role specification and calibration improved performance.
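
The agreement statistic reported above, Cohen's kappa, compares observed agreement between two coders with the agreement expected by chance. The following is a minimal sketch of that calculation, not the authors' code. The label names and the four example comments are invented for illustration, and the interpretation bands follow the commonly used Landis and Koch scale, which is an assumption here because this summary does not state which scale the paper applied.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


def agreement_band(kappa):
    """Landis and Koch interpretation bands (assumed scale, see lead-in)."""
    if kappa < 0:
        return "poor"
    for upper, name in [(0.20, "slight"), (0.40, "fair"),
                        (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return name
    return "almost perfect"


# Hypothetical codes for four comments; not data from the study.
human_codes = ["infection", "infection", "respiratory", "respiratory"]
model_codes = ["infection", "infection", "respiratory", "infection"]

kappa = cohens_kappa(human_codes, model_codes)
print(kappa, agreement_band(kappa))  # 0.5 moderate
```

Under the assumed scale, the reported values of 0.76 and 0.78 both fall in the 0.61 to 0.80 band, which matches the "substantial agreement" wording.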

## Abstract

**Objective:** To evaluate ChatGPT’s ability to perform thematic analysis of medical Best Practice Advisory (BPA) free-text comments and identify prompt engineering strategies that optimize performance.

**Methods:** We analyzed 778 BPA comments from a pilot AI-enabled clinical deterioration intervention at Stanford Hospital, categorized as reasons for deterioration (Category 1) and care team actions (Category 2). Prompt engineering strategies (role, context specification, stepwise instructions, few-shot prompting, and dialogue-based calibration) were tested on a 20% random subsample to determine the best-performing prompt. Using that prompt, ChatGPT conducted deductive coding on the full dataset followed by inductive analysis. Agreement with human coding was assessed as inter-rater reliability (IRR) using Cohen’s Kappa (κ).

**Results:** With structured prompts and calibration, ChatGPT achieved substantial agreement with human coding (κ = 0.76 for Category 1; κ = 0.78 for Category 2). Baseline agreement was higher for Category 1 than Category 2, reflecting differences in comment type and complexity, but calibration improved both. Inductive analysis yielded 9 themes, with ChatGPT-generated themes closely aligning with human coding.

**Discussion:** ChatGPT can accelerate qualitative analysis, but its rigor depends heavily on prompt engineering. Key strategies included role and context specification, pulse-check calibration, and safeguard techniques, which enhanced reliability and reproducibility.

**Conclusion:** This study demonstrates the feasibility of ChatGPT-assisted thematic analysis and introduces a structured approach for applying LLMs to qualitative analysis of clinical free-text data, underscoring prompt engineering as a methodological lever.
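
The abstract names a 20% random subsample and four prompt components (role, context specification, stepwise instructions, few-shot examples). The sketch below shows one way these pieces could be assembled. It is an illustration under stated assumptions, not the prompt or sampling procedure used in the study: the function names, the rounding rule, the seed, and every example string are hypothetical, and no model API is called. Dialogue-based calibration is interactive and is not modeled here.

```python
import random


def draw_subsample(comments, fraction=0.2, seed=0):
    """Return a reproducible random subsample (rounding rule is an assumption)."""
    k = max(1, round(len(comments) * fraction))
    return random.Random(seed).sample(comments, k)


def build_prompt(role, context, steps, examples, comment):
    """Assemble role, context, stepwise instructions, and few-shot examples."""
    lines = [f"Role: {role}", f"Context: {context}", "Instructions:"]
    # Numbered steps give the model an explicit order of operations.
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines.append("Examples:")
    # Few-shot pairs: each is (comment text, code assigned by a human coder).
    for text, code in examples:
        lines.append(f'Comment: "{text}" -> Code: {code}')
    # The comment to be coded comes last, leaving the code for the model to fill.
    lines.append(f'Comment: "{comment}" -> Code:')
    return "\n".join(lines)


# All strings below are invented placeholders, not text from the study.
prompt = build_prompt(
    role="You are a qualitative researcher coding clinical free-text comments.",
    context="Comments come from a Best Practice Advisory about patient deterioration.",
    steps=["Read the comment.", "Pick exactly one code from the codebook.",
           "Reply with the code only."],
    examples=[("suspected infection, cultures sent", "infection")],
    comment="increasing oxygen requirement overnight",
)
print(prompt)
```

With 778 items and this rounding rule, `draw_subsample` returns 156 items; the exact subsample size used in the paper is not stated in this summary.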

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12757007/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12757007/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12757007/full.md

---
Source: https://tomesphere.com/paper/PMC12757007