# AI-generated documentation of psychiatric interviews: a proof-of-concept study

**Authors:** Bengican Gülegen, Raoul Haaf, Emanuel Schlüßler, Stephan Köhler

PMC12932583 · DOI: 10.3389/fpsyt.2026.1621532 · Frontiers in Psychiatry · 2026-02-11

## TL;DR

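In six simulated psychiatric interviews, an AI model transcribed speech accurately (mean word error rate 9.44%), but its structured reports agreed less with the gold standard than human-written reports (mean accuracy 0.78 vs. 0.94) and contained clinically relevant errors, so a final clinical review remains necessary.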

## Contribution

A proof-of-concept evaluation of an AI model's transcription and summarization of six simulated psychiatric interviews, scored against a gold standard and human-written reports using a predefined codebook of binary items.

## Key findings

- AI transcription was accurate (mean word error rate 9.44%), but AI reports captured structured content less accurately than human reports (mean accuracy 0.78 vs. 0.94; mean F1 0.55 vs. 0.89).
- AI reports occasionally provided more detailed context but introduced clinically relevant inaccuracies.
- Human reports showed significantly higher agreement with the gold standard across all categories.

## Abstract

The documentation process in psychiatric interviews is laborious and often compromises the quality of patient care. Addressing this challenge, we explored the potential of artificial intelligence (AI) to automate documentation tasks and improve efficiency in psychiatric practice.

Six simulated psychiatric interviews were transcribed and summarized using an AI model and compared to a gold standard, together with reports written by humans. Reports were decomposed into binary items using a predefined codebook covering patient information, current complaints, psychiatric history, medical history, medication, substance use, social history, family history, vegetative symptoms, psychopathology, and preliminary diagnoses. Transcription accuracy, performance, and inter-rater reliability were evaluated.
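The item-level comparison described above can be illustrated with a small sketch. The abstract does not say how the authors computed their metrics, so this is only an assumption-laden example: the function name `score_binary_items` and the example codes are hypothetical, each codebook item is assumed to be coded 1 (documented) or 0 (not documented), and accuracy and F1 follow their standard definitions.

```python
from typing import Dict, Sequence


def score_binary_items(gold: Sequence[int], report: Sequence[int]) -> Dict[str, float]:
    """Score a report's binary item codes against the gold standard.

    Hypothetical helper: both sequences hold one 0/1 code per codebook item,
    in the same item order.
    """
    if len(gold) != len(report):
        raise ValueError("gold and report must code the same items")
    if not gold:
        raise ValueError("at least one item is required")

    # Confusion-matrix counts, treating 1 as the positive class.
    tp = sum(1 for g, r in zip(gold, report) if g == 1 and r == 1)
    tn = sum(1 for g, r in zip(gold, report) if g == 0 and r == 0)
    fp = sum(1 for g, r in zip(gold, report) if g == 0 and r == 1)
    fn = sum(1 for g, r in zip(gold, report) if g == 1 and r == 0)

    accuracy = (tp + tn) / len(gold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up codes for ten items (not data from the study).
gold = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
report = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]

result = score_binary_items(gold, report)
print(f"accuracy={result['accuracy']:.2f}, f1={result['f1']:.2f}")
# accuracy=0.70, f1=0.57
```

The example shows why the two metrics can diverge: when most items are absent in the gold standard, correct rejections keep accuracy high, while F1 ignores them and falls with every missed or invented item.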

The AI achieved a high transcription accuracy with a mean word error rate of 9.44% and a Levenshtein score of 0.996, aligning with current voice-to-text transcription standards. Inter-rater reliability was high overall. The mean Cohen’s κ was 0.80 (SD = 0.33), the mean percent agreement was 0.96 (SD = 0.07), and the mean Gwet’s AC1 was 0.93 (SD = 0.12). Across all categories, human reports showed substantially higher agreement with the gold standard than AI reports. The mean accuracy was 0.94 (SD = 0.01) for human reports and 0.78 (SD = 0.08) for AI reports, t(5) = 6.33, p = .003. The mean F1 scores were also higher for human reports (M = 0.89, SD = 0.02) than for AI reports (M = 0.55, SD = 0.13), t(5) = 7.38, p = .001. Occasionally, AI reports provided more detailed contextual information than human reports. However, AI reports also introduced clinically relevant inaccuracies and struggled in complex domains such as psychopathology.
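For readers unfamiliar with the statistics reported above, the following sketch computes a word error rate and the three agreement measures using their textbook definitions. It is not the authors' analysis code: the function names and all example data are hypothetical, the word error rate is assumed to be the word-level edit distance divided by the reference length, and the agreement measures assume two raters and binary items.

```python
from typing import Dict, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def agreement_stats(rater_a: Sequence[int], rater_b: Sequence[int]) -> Dict[str, float]:
    """Percent agreement, Cohen's kappa and Gwet's AC1 for two raters, 0/1 codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same non-empty set of items")
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    p_a, p_b = sum(rater_a) / n, sum(rater_b) / n  # share of 1-codes per rater

    # Cohen's kappa: chance agreement from each rater's own marginals.
    chance_kappa = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = ((observed - chance_kappa) / (1 - chance_kappa)
             if chance_kappa < 1 else float("nan"))

    # Gwet's AC1: chance agreement from the mean prevalence of the 1-code.
    prevalence = (p_a + p_b) / 2
    chance_ac1 = 2 * prevalence * (1 - prevalence)
    ac1 = (observed - chance_ac1) / (1 - chance_ac1)
    return {"percent_agreement": observed, "kappa": kappa, "ac1": ac1}


# Made-up transcript pair: one substituted word out of five.
print(word_error_rate("the patient reports poor sleep",
                      "the patient report poor sleep"))  # 0.2

# Made-up ratings for 20 items where the 1-code is rare.
rater_a = [0] * 17 + [1, 1, 0]
rater_b = [0] * 17 + [1, 0, 1]
stats = agreement_stats(rater_a, rater_b)
print({name: round(value, 2) for name, value in stats.items()})
# {'percent_agreement': 0.9, 'kappa': 0.44, 'ac1': 0.88}
```

In the made-up ratings, the raters agree on 90% of items, yet Cohen's κ is far lower than Gwet's AC1 because the positive code is rare. This well-known property of κ is one possible reason the reported κ is lower and more variable than AC1, although the abstract does not discuss the cause.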

While our findings suggest promising prospects for AI-driven documentation in psychiatry, further development is essential to enhance the model’s ability to comprehensively assess and document psychopathological features. Importantly, some AI-generated inaccuracies were clinically significant, underscoring the necessity of a final clinical review by a qualified professional. These findings are limited by the very small number of highly controlled simulated interviews. Larger studies with real patients, diverse clinicians, and routine clinical workflows will be required. Nonetheless, AI-supported documentation has the potential to considerably reduce time demands and alleviate the documentation burden in psychiatric care.

## Full-text entities

- **Diseases:** Psychovegetative abnormalities (MESH:D000014), hallucinations (MESH:D006212), BPAD (MESH:C564108), hypothermia (MESH:D007035), burnout (MESH:D002055), psychomotor disorders (MESH:D011596), delusions (MESH:D063726), death (MESH:D003643), hypertension (MESH:D006973), ADHD (MESH:D001289), hay fever (MESH:D006255), hypertonia (MESH:D009122), weight loss (MESH:D015431), gastrointestinal side effects (MESH:D064420), OCD (MESH:D009771), DEP (MESH:D003866), SP (MESH:D000072861), stomach problems (MESH:D013272), claustrophobia (MESH:D010698), allergies (MESH:D004342), Eating disorder (MESH:D001068), amnesia (MESH:D000647), PD (MESH:D016584), Compulsive (MESH:D000073932), Impairments in concentration or memory (MESH:D008569), medication overdose (MESH:D062787), functional impairment (MESH:D003072), sleep disturbances (MESH:D012893), acrophobia (MESH:C000719188), Psychiatric (MESH:D001523), SCHIZ (MESH:D012559), weight gain (MESH:D015430), AI (MESH:C538142), leg fracture (MESH:D010264), impairment of concentration (MESH:C567712)
- **Chemicals:** Zuprexa (-), Zyprexa (MESH:D000077152), penicillin (MESH:D010406), Mirtazapine (MESH:D000078785), cocaine (MESH:D003042), nicotine (MESH:D009538), sertraline (MESH:D020280), Ramipril (MESH:D017257), alcohol (MESH:D000438)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12932583/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12932583/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12932583/full.md

---
Source: https://tomesphere.com/paper/PMC12932583