# Improving medication error classification using a reasoning large language model

**Authors:** Anders Krifors, Theodor Beskow, Magnus Jonsson, Karl-Johan Lindner, Jenny Calås, Veronica Arwsbo, Kristian Sandström, Christer Norström

PMC · DOI: 10.1093/jamiaopen/ooag004 · JAMIA Open · 2026-01-24

## TL;DR

A reasoning large language model (OpenAI's o4-mini), adapted through prompt engineering, identified medication-related incident reports with 96.0% concordance with pharmacist classifications, which could improve patient safety monitoring.

## Contribution

A reasoning large language model was adapted and validated for medication error classification in medical incident reports with expert-level performance.

## Key findings

- The LLM achieved 96.0% concordance with expert classifications on 200 incident reports.
- Disagreements were mainly due to linguistic ambiguity or context-dependent interpretation.
- Subcategorization accuracy was 76.5%.
- The authors report that the LLM outperformed existing automated methods.

## Abstract

This study assessed the performance of a reasoning large language model (LLM) in identifying medication errors in medical incident reports.

OpenAI’s o4-mini LLM was adapted using prompt engineering on 75 000 anonymized incident reports from the Västmanland region of Sweden (2019-2024). To guide the prompt design, we used a subset of 2434 reports, which were manually reclassified by pharmacists as medication-related or not. For validation, 200 reports (January 2024-March 2024) were independently classified by 2 pharmacists to establish a reference classification. The LLM then performed binary classification of the same reports, and concordance rates were measured against the expert consensus.
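The validation workflow described above (prompt the model, read back a binary label, compare with the pharmacists' reference) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the prompt wording, the function names, and the `ask_model` callable are all hypothetical, and the authors' actual prompt is not given in this summary. In practice `ask_model` would wrap a call to the chosen model.

```python
from typing import Callable, Sequence

# Hypothetical instruction text; the authors' real prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "You are reviewing a healthcare incident report.\n"
    "Answer with exactly one word, YES or NO: "
    "does the report describe a medication error?\n\n"
    "Report:\n{report}"
)


def parse_label(answer: str) -> bool:
    """Map a free-text model answer to True (medication-related) or False."""
    stripped = answer.strip()
    if not stripped:
        raise ValueError("empty model answer")
    first_word = stripped.split()[0].strip(".,:;!").upper()
    if first_word == "YES":
        return True
    if first_word == "NO":
        return False
    raise ValueError(f"unparseable model answer: {answer!r}")


def classify_reports(
    reports: Sequence[str], ask_model: Callable[[str], str]
) -> list[bool]:
    """Send each report through the prompt and parse the binary label."""
    return [parse_label(ask_model(PROMPT_TEMPLATE.format(report=r))) for r in reports]


def concordance(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """Fraction of reports where the model label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label lists differ in length")
    if not reference:
        raise ValueError("no labels to compare")
    agree = sum(p == r for p, r in zip(predicted, reference))
    return agree / len(reference)
```

Passing the model in as a callable keeps the parsing and scoring logic testable without network access, and makes it straightforward to swap in a different model later.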

The LLM achieved a concordance rate of 96.0% (192/200; 95% CI, 92.3-98.3) with expert classification. Eight cases (4.0%) showed disagreements, primarily due to linguistic ambiguity or context-dependent interpretation. In 5 cases, the pharmacists classified the report as non-medication-related while the LLM classified it as medication-related; the reverse occurred in 3 cases. Subcategorization accuracy was 76.5%.
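The arithmetic behind these figures can be checked with a short script. The sketch below recomputes the concordance rate from the reported disagreement counts and derives a 95% confidence interval. The summary does not state which interval method the authors used; an exact (Clopper-Pearson) binomial interval is assumed here, and the helper names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); adequate for moderate n such as 200."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo: float, hi: float, iters: int = 100) -> float:
    """Locate the sign change of a monotone function f on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: 1 - binom_cdf(k - 1, n, p) - alpha / 2, 0.0, 1.0
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2, 0.0, 1.0
    )
    return lower, upper


# Counts taken from the results paragraph above.
n_reports = 200
llm_pos_expert_neg = 5  # LLM: medication-related, pharmacists: not
llm_neg_expert_pos = 3  # LLM: not medication-related, pharmacists: yes
n_agree = n_reports - llm_pos_expert_neg - llm_neg_expert_pos

rate = n_agree / n_reports
lower, upper = clopper_pearson(n_agree, n_reports)
print(f"concordance = {rate:.1%}, 95% CI = {lower:.1%} to {upper:.1%}")
```

Other interval methods (for example Wilson) give slightly different bounds, so a small mismatch with a published interval usually reflects the choice of method rather than an error in the counts.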

The LLM showed expert-level performance, outperforming existing automated methods. Thus, its integration into incident reporting systems might improve the efficiency, accuracy, and consistency of patient safety monitoring.

This validated AI-driven method can be integrated directly into clinical informatics workflows, enabling healthcare organizations to rapidly and consistently identify medication errors, ultimately enhancing patient safety outcomes.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12832951/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12832951/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12832951/full.md

---
Source: https://tomesphere.com/paper/PMC12832951