# Advancing the science of qualitative patient preference assessment using large language models

**Authors:** Ted Grover, Emanuel Krebs, Deirdre Weymann, Morgan Ehman, Dean A. Regier

PMC · DOI: 10.1371/journal.pdig.0001263 · 2026-03-12

## TL;DR

This paper explores using large language models to analyze patient preferences from focus group transcripts, showing they can generate themes similar to those identified by humans.

## Contribution

The study introduces optimized prompt frameworks for LLMs to perform inductive thematic analysis in patient preference assessments, a novel application of LLMs in healthcare.

## Key findings

- LLMs generated themes with median Jaccard similarity coefficients of 0.46–0.64 compared to human-analyzed themes.
- The best-performing framework showed 12% higher semantic overlap with human themes than published benchmarks.
- LLMs can produce patient preference themes similar in content and style to human analysis when given sufficient domain context.
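
The Jaccard coefficients above compare two sets of theme sentences whose "overlap" is decided by thresholding cosine similarity between sentence embeddings. The summary does not spell out the exact matching rule, so the following is only a minimal sketch of one plausible formulation: sentences are paired one-to-one, greedily by descending cosine similarity, and each pair at or above the threshold counts toward the intersection. The function names (`cosine`, `soft_jaccard`, `jaccard_sweep`), the greedy matching rule, and the toy 2-D vectors are all assumptions for illustration. The study itself embeds sentences with Sentence-T5-XXL.

```python
import math
from statistics import median


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def soft_jaccard(llm_vecs, human_vecs, threshold):
    """Jaccard coefficient where 'shared' sentences are one-to-one pairs
    whose cosine similarity is >= threshold (assumed matching rule)."""
    pairs = sorted(
        (
            (cosine(u, v), i, j)
            for i, u in enumerate(llm_vecs)
            for j, v in enumerate(human_vecs)
        ),
        reverse=True,
    )
    used_llm, used_human = set(), set()
    matches = 0
    for sim, i, j in pairs:
        if sim < threshold:
            break  # remaining pairs are all below the threshold
        if i not in used_llm and j not in used_human:
            used_llm.add(i)
            used_human.add(j)
            matches += 1
    union = len(llm_vecs) + len(human_vecs) - matches
    return matches / union if union else 0.0


def jaccard_sweep(llm_vecs, human_vecs, thresholds):
    """Coefficient at each threshold, plus the median across thresholds."""
    scores = {t: soft_jaccard(llm_vecs, human_vecs, t) for t in thresholds}
    return scores, median(scores.values())


# Toy stand-ins for sentence embeddings (hypothetical values).
llm_sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
human_sentences = [[1.0, 0.0], [0.6, 0.8]]

scores, med = jaccard_sweep(llm_sentences, human_sentences, [0.5, 0.7, 0.9])
print(scores, med)
```

Sweeping the threshold matters because a single cutoff can make two theme sets look nearly identical or nearly disjoint. Reporting a median over a range, as the findings do, is less sensitive to that choice.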

## Abstract

Patient experiences and perspectives are essential for shaping patient-centered healthcare. While large language models (LLMs) in healthcare are typically applied to specific clinical or patient-facing tasks, they have not been used for qualitative patient preference assessment, which often relies on thematic analysis to understand patient views expressed in interviews or focus groups. LLMs show initial promise for performing inductive thematic analysis of healthcare interview or focus group transcripts, yet no empirical studies have investigated LLMs to facilitate qualitative patient preference assessment. We employed the open-source Hermes-3-Llama-3.1-70B LLM to perform inductive thematic analysis on focus group transcripts from a previously published qualitative patient preference assessment study using three optimized prompt frameworks, and evaluated the semantic similarity of LLM-generated themes against human-analyzed themes using the Sentence-T5-XXL language embedding model. Sentence-level theme similarity was assessed using Jaccard similarity coefficients (0–1 range), computing coefficient scores across a broad range of discrete cosine similarity thresholds. We further evaluated LLM themes for similarity in lexical diversity and reading grade-level metrics, and benchmarked semantic similarity results against published similarity thresholds previously used with qualitative healthcare data. All prompt frameworks generated themes with median Jaccard similarity coefficients against human-analyzed themes between 0.46 and 0.64, indicating moderate semantic overlap. Our best-performing framework, which was instructed to pursue thematic saturation, scored closest to human-analyzed themes on all reading grade-level metrics and demonstrated 12% higher semantic overlap with human-analyzed themes compared to published benchmarks. Our worst-performing framework produced themes with moderate semantic overlap and hallucinated findings unidentified in human-analyzed themes.
We demonstrate that LLMs can perform inductive thematic analysis of qualitative patient preference data, producing themes substantively similar in content and style to human-analyzed themes when augmented with sufficient domain-specific context. While LLMs may augment thematic analysis, the contextual nature of qualitative analysis remains a challenge requiring collaborative LLM frameworks integrating human expertise.
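
The abstract also compares LLM and human themes on lexical diversity and reading grade level, without naming the specific metrics here. As a hedged illustration only, the sketch below uses two common choices: type-token ratio for lexical diversity and the Flesch-Kincaid grade level for readability. Both metric choices, the function names, and the vowel-group syllable heuristic are assumptions rather than the authors' implementation, and the syllable count in particular is approximate.

```python
import re


def tokenize(text):
    """Lowercased word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, drop a silent final 'e'."""
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def type_token_ratio(text):
    """Unique words divided by total words; higher means more diverse."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    tokens = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not tokens or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in tokens)
    return 0.39 * len(tokens) / len(sentences) + 11.8 * syllables / len(tokens) - 15.59


# Hypothetical theme statements, written only for this example.
human_theme = "Patients value clear information about treatment risks."
llm_theme = "Participants emphasized the importance of transparent risk communication."

for label, text in [("human", human_theme), ("llm", llm_theme)]:
    print(label, round(type_token_ratio(text), 2), round(flesch_kincaid_grade(text), 2))
```

A framework whose themes sit close to the human themes on such metrics reads at a similar level of complexity, which is the sense in which the best-performing framework "scored closest" on reading grade level.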

The experiences and preferences of patients provide valuable insights for evaluating the risks and benefits of new health products, services, and technologies, and can help guide appropriate decision-making throughout the development process. Researchers commonly use patient interviews or focus groups to develop a deep understanding of patient perspectives and of the perceived benefits or risks of a new healthcare product, service, or technology. While this approach is effective, uncovering themes from the transcripts of these interviews or focus groups requires considerable manual effort and time. In this study, we demonstrate that applying prompt optimization to open-source large language models can effectively and rapidly generate themes on patient preferences similar in content and style to human-analyzed themes. Our study can inform best practices for large language model use in thematic evidence generation of patient preferences to improve healthcare decision-making and accelerate patient-centered healthcare.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12981457/full.md

---
Source: https://tomesphere.com/paper/PMC12981457