# Enhancing Preoperative Orthopaedic Communication: A Comparative Analysis of Large Language Model- and Clinician-Generated Clinic Letters

**Authors:** Wilfred C Saunders, Alexander C Glendenning, Charles Gamble, Richard Roberts

PMC · DOI: 10.7759/cureus.101413 · 2026-01-13

## TL;DR

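This study compares preoperative orthopaedic clinic letters generated by four large language models (LLMs) with letters written by clinicians. LLM letters were easier to read, matched or exceeded clinician letters on understandability, and, for the best-performing model, included most of a gold-standard complication profile.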

## Contribution

The study demonstrates that large language models can generate preoperative clinic letters that are more readable than clinician-written letters and at least as understandable, while incorporating gold-standard complication profiles.

## Key findings

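- Three of the four LLMs (DeepSeek, Copilot, and Gemini) scored higher on PEMAT-P understandability (0.81 to 0.83) than clinician letters (0.72); OpenAI o1 matched the clinicians at 0.72.
- All LLM letters read at a lower grade level than clinician letters (Flesch-Kincaid 6.850-8.517 versus 10.6 ± 0.94).
- OpenAI o1 achieved the highest complication profile compliance among the tested models (0.923 ± 0.104), followed by Gemini (0.860 ± 0.079).
- The authors suggest that embedding LLM workflows in outpatient practice could reduce administrative burden and support better-informed, shared decision-making; these outcomes were not measured directly.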

## Abstract

Background

Clear, effective communication is fundamental to orthopaedic practice, particularly when securing informed consent. Escalating NHS workforce and time constraints necessitate tools that streamline, yet enhance, patient‑clinician dialogue. By analysing understandability, readability, and complication profile inclusion, this study aims to determine the feasibility of large language model (LLM)‑assisted correspondence to support equitable, patient‑centred consent and decision‑making.

Methods

Six frequently performed orthopaedic operations were chosen. Standardised, clinic‑friendly prompts were given to four LLMs (OpenAI o1, DeepSeek, Gemini, and Copilot), each of which produced two letters per procedure. An identical prompt was provided to two clinicians, who produced letters for the same operations to serve as a human benchmark. Three outcomes were recorded: understandability (Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P)), readability (Flesch-Kincaid, Gunning Fog, and Simple Measure of Gobbledygook (SMOG) indices), and inclusion of a gold-standard complication profile.
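The three readability indices named in the Methods are closed-form formulas over sentence, word, and syllable counts. The sketch below shows how they might be computed, using the standard published coefficients. It is not the authors' tooling: the `count_syllables` heuristic and the `readability` function are illustrative assumptions, the Gunning Fog "complex word" rule is simplified to "three or more syllables", and the SMOG formula is applied to the whole text rather than to a 30-sentence sample.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count vowel groups and
    drop a silent trailing 'e'. Dictionary-based counters are more accurate."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch-Kincaid grade, Gunning Fog, and SMOG for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    n_sent, n_words = len(sentences), len(words)
    n_poly = sum(1 for s in syllables if s >= 3)  # words with 3+ syllables
    words_per_sentence = n_words / n_sent

    return {
        # Flesch-Kincaid Grade Level
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * sum(syllables) / n_words
        - 15.59,
        # Gunning Fog Index (simplified complex-word definition)
        "gunning_fog": 0.4 * (words_per_sentence + 100 * n_poly / n_words),
        # SMOG grade, normalised to 30 sentences
        "smog": 1.0430 * math.sqrt(n_poly * 30 / n_sent) + 3.1291,
    }
```

All three indices approximate a US school grade, so lower values indicate simpler text. Because syllable counting and sentence splitting differ between tools, scores from this sketch would not necessarily match the values reported in the paper.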

Results

PEMAT-P understandability scores for each LLM were as follows: OpenAI o1 0.72 (±0.07), DeepSeek 0.81 (±0.09), Copilot 0.81 (±0.08), Gemini 0.83 (±0.05). Human letters scored 0.72 (±0.03). All LLMs produced text at a seventh- to eighth-grade level (Flesch‑Kincaid 6.850-8.517), markedly simpler than human letters (10.6 ± 0.94). OpenAI o1’s outputs were easiest to read according to the Gunning Fog and SMOG scales (8.8833 ± 0.5702 and 9.9833 ± 0.4569, respectively), whereas clinician letters were harder to read (14.1333 ± 1.1 and 13.3333 ± 0.55). OpenAI o1 achieved the greatest complication profile compliance (0.923 ± 0.104, P < 0.001), followed by Gemini (0.860 ± 0.079).
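For readers unfamiliar with PEMAT-P, the proportions above can be read as the share of applicable checklist items that a rater marked "Agree". The following is a minimal sketch of that calculation, assuming the usual PEMAT convention (Agree = 1, Disagree = 0, "Not applicable" items excluded from the denominator). The function name `pemat_score` and the example ratings are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence


def pemat_score(ratings: Sequence[Optional[int]]) -> float:
    """Fraction of applicable items rated Agree.

    ratings: 1 = Agree, 0 = Disagree, None = Not applicable.
    Returns a value on a 0-1 scale, matching how the abstract reports scores
    (PEMAT scores are often quoted as percentages instead).
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    if any(r not in (0, 1) for r in applicable):
        raise ValueError("ratings must be 0, 1, or None")
    return sum(applicable) / len(applicable)


# Hypothetical ratings for one letter: three Agree, one Disagree, one N/A.
example = pemat_score([1, 1, 0, None, 1])
```

In this illustration the single "Not applicable" item is ignored, so the score is three of four applicable items.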

Conclusion

LLMs can outperform traditional clinician correspondence in readability and understandability, while simultaneously incorporating gold‑standard complication profiles into clinic letters. Embedding optimised LLM workflows within outpatient practice could markedly reduce administrative burden, minimise transcription delays, and empower patients to make better‑informed, shared decisions. Future research must refine LLM search capability, evaluate cost‑effectiveness, ensure ethical and medico‑legal oversight, integrate outputs with electronic health records, and establish rigorously validated pathways for safe clinical deployment.

## Full-text entities

- **Diseases:** complication (MESH:D008107)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12895212/full.md

---
Source: https://tomesphere.com/paper/PMC12895212