# Comparative Analysis of Japanese Clinical Note Styles Between Physicians and Large Language Models Using Identical Psychiatric Cases: Quantitative Text Analysis

**Authors:** Wataru Arihisa, Tomohiro Nishiyama, Shoko Wakamiya, Eiji Aramaki

PMC · DOI: 10.2196/85671 · 2026-03-27

## TL;DR

This paper compares psychiatric notes written by Japanese physicians with notes generated by LLMs for identical cases, finding that LLM notes are longer, more repetitive, less lexically diverse, and lack the flexible, context-sensitive style of expert notes.

## Contribution

The study introduces a systematic comparison of narrative styles between physicians and LLMs in Japanese psychiatric documentation.

## Key findings

- LLM-generated notes were longer, more repetitive, and less lexically diverse than human-authored notes.
- LLM notes showed a uniform, template-like style, unlike the flexible style of physicians.
- LLM notes relied on more abstract, generalized expressions and varied less in which clinical information they documented and emphasized.

## Abstract

With the rapid adoption of large language models (LLMs) in clinical documentation, it is unclear whether LLMs can faithfully reproduce specialty-specific writing styles and clinically meaningful documentation patterns observed in expert notes, particularly in psychiatry.

This study aims to systematically compare the narrative styles of human physicians and LLMs when documenting identical psychiatric cases and to evaluate the extent to which LLMs replicate specialty-specific documentation patterns.

We constructed 2 standardized outpatient scenarios in Japanese (major depressive disorder and schizophrenia) and collected 134 initial notes in Japanese authored by psychiatrists and internists, alongside notes generated by 4 LLMs simulating each specialty. We conducted lexical, syntactic, semantic, and topic-level analyses using Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation–Longest Common Subsequence (ROUGE-L), BERTScore, and Translation Edit Rate (TER), complemented by redundancy metrics and medical term variation analyses.
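To make one of the overlap metrics named above concrete, the following is a minimal sketch of ROUGE-L computed as an F1 score over the longest common subsequence (LCS) of two token sequences. It is an illustration only, not the authors' pipeline: the function names `lcs_length` and `rouge_l_f1` are hypothetical, the example sentences are invented, and character-level tokens are an assumption standing in for whatever Japanese tokenizer the study used, which this summary does not state.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as the harmonic mean of LCS-based precision and recall."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


# Invented example sentences; character-level tokens are an assumption,
# not the tokenization used in the paper.
physician_note = list("不眠と抑うつ気分が続く")
llm_note = list("抑うつ気分と不眠が持続している")
score = rouge_l_f1(llm_note, physician_note)
print(f"ROUGE-L F1: {score:.3f}")
```

Because the LCS only requires tokens to appear in the same order, not contiguously, ROUGE-L rewards shared sentence structure even when wording is interleaved with other material.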

LLM-generated notes were significantly longer, more repetitive, and lexically less diverse than human-authored notes. TER-based clustering revealed a uniform, template-like writing style in LLMs, diverging from the flexible, context-sensitive style of physicians. Topic modeling suggested that LLM-generated notes tended to rely on more abstract and generalized expressions, with less variation in the distribution and emphasis of documented clinical information.
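As a rough illustration of what "more repetitive" and "lexically less diverse" can mean quantitatively, the sketch below computes two common proxies: a type-token ratio and the fraction of n-grams that occur more than once. These are assumptions chosen for illustration; the summary does not specify which redundancy metrics the study used, and the helper names `type_token_ratio` and `repeated_ngram_fraction` are hypothetical.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Distinct tokens divided by total tokens; lower means less lexical diversity."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def repeated_ngram_fraction(tokens, n=2):
    """Share of n-gram occurrences whose n-gram appears more than once in the note."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# Toy token lists (invented); a real analysis would tokenize Japanese text first.
varied = ["insomnia", "low", "mood", "appetite", "loss"]
templated = ["patient", "reports", "insomnia", "patient", "reports", "fatigue"]

print(type_token_ratio(varied), type_token_ratio(templated))
print(repeated_ngram_fraction(varied), repeated_ngram_fraction(templated))
```

Note that the raw type-token ratio tends to fall as texts get longer, so comparing notes of very different lengths would usually call for a length-normalized variant.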

LLMs can mimic surface-level stylistic features but fall short in reproducing nuanced, context-dependent, diagnostically relevant content typical of expert clinical documentation. Future clinical use will require careful prompt design or fine-tuning to ensure narrative depth, lexical diversity, and clinical relevance.

## Linked entities

- **Diseases:** major depressive disorder (MONDO:0002009), schizophrenia (MONDO:0005090)

## Full-text entities

- **Genes:** F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, NINL (ninein like) [NCBI Gene 22981] {aka NLP}, TECR (trans-2,3-enoyl-CoA reductase) [NCBI Gene 9524] {aka GPSN2, MRT14, SC2, TER}
- **Diseases:** delusions (MESH:D063726), insomnia (MESH:D007319), HPI (OMIM:239500), LLMs (MESH:D007806), suicidal ideation (MESH:D001072), MDD (MESH:D003865), auditory hallucination (MESH:D006212), Schizophrenia (MESH:D012559), Depression (MESH:D003866), Psychiatric (MESH:D001523)
- **Chemicals:** risperidone (MESH:D018967), lead (MESH:D007854), BERT (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13027678/full.md

---
Source: https://tomesphere.com/paper/PMC13027678