# Evaluating Large Language Model–Generated Clinical Summaries Through a Dual-Perspective Framework: Retrospective Observational Study

**Authors:** Brian Han, Traci Barnes, Charitha D Reddy, Andrew Y Shin

PMC · DOI: 10.2196/85221 · JMIR AI · 2026-02-10

## TL;DR

This study evaluates GPT-4o mini summaries of 50 pediatric cardiac intensive care unit notes, rated by both physicians and parents, and finds that the two groups differ in what they consider helpful.

## Contribution

The study introduces a dual-perspective framework to assess LLM-generated clinical summaries from both clinician and parent viewpoints.

## Key findings

- Parents and clinicians differed in their ratings of summary helpfulness.
- Clinicians emphasized clinical accuracy, while parents prioritized readability.
- The study advocates for balanced frameworks that consider both clinical precision and patient understanding.

## Abstract

Large language models (LLMs) are increasingly used by patients and families to interpret complex medical documentation, yet most evaluations focus only on clinician-judged accuracy. In this study, 50 pediatric cardiac intensive care unit notes were summarized using GPT-4o mini and reviewed by both physicians and parents, who rated readability, clinical fidelity, and helpfulness. Parents and clinicians differed notably in their helpfulness ratings, and each group contributed distinct insights: clinicians on clinical accuracy and parents on readability. This study highlights the need for dual-perspective frameworks that balance clinical precision with patient understanding.
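As a rough illustration of what a dual-perspective comparison could look like in practice, the sketch below averages ratings per rater group on the three dimensions named in the abstract and reports the parent-minus-clinician gap. It is not the authors' analysis: the 1 to 5 scale, the record layout, the function names `mean_by_group` and `perspective_gap`, and every number in `example_ratings` are assumptions invented for the example.

```python
from statistics import mean

# The three rating dimensions named in the abstract.
DIMENSIONS = ("readability", "clinical_fidelity", "helpfulness")


def mean_by_group(ratings):
    """Average each dimension's scores separately for each rater group."""
    scores = {}
    for record in ratings:
        for dim in DIMENSIONS:
            scores.setdefault((record["group"], dim), []).append(record[dim])
    return {key: mean(values) for key, values in scores.items()}


def perspective_gap(ratings, group_a="parent", group_b="clinician"):
    """Return mean(group_a) - mean(group_b) for every dimension."""
    means = mean_by_group(ratings)
    return {
        dim: means[(group_a, dim)] - means[(group_b, dim)]
        for dim in DIMENSIONS
    }


# Placeholder ratings on an assumed 1-5 scale, invented purely to
# exercise the code. These are NOT data from the study.
example_ratings = [
    {"note_id": 1, "group": "parent", "readability": 4, "clinical_fidelity": 4, "helpfulness": 5},
    {"note_id": 1, "group": "clinician", "readability": 4, "clinical_fidelity": 3, "helpfulness": 3},
    {"note_id": 2, "group": "parent", "readability": 5, "clinical_fidelity": 4, "helpfulness": 4},
    {"note_id": 2, "group": "clinician", "readability": 4, "clinical_fidelity": 3, "helpfulness": 2},
]

gaps = perspective_gap(example_ratings)
for dim in DIMENSIONS:
    print(f"{dim}: parent minus clinician = {gaps[dim]}")
```

A positive gap means parents rated that dimension higher than clinicians did. A real analysis would also need a paired statistical test and a measure of inter-rater agreement, which this sketch leaves out.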

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12933168/full.md

---
Source: https://tomesphere.com/paper/PMC12933168