# Human vs. artificial intelligence: Physicians outperform ChatGPT in real‐world pharmacotherapy counselling

**Authors:** Benjamin Krichevsky, Stefan Engeli, Stefanie M. Bode‐Böger, Felix Koop, Martin Schulze Westhoff, Sebastian Schröder, Carsten Schumacher, Thorben Pape, Dirk O. Stichtenoth, Johannes Heck

PMC · DOI: 10.1002/bcp.70321 · British Journal of Clinical Pharmacology · 2025-10-25

## TL;DR

In a blinded comparison on 70 real-world drug-related queries, physicians gave better and more accurate responses than the AI chatbot ChatGPT (version 3.5).

## Contribution

The study empirically compares ChatGPT's responses with those of physicians in pharmacotherapy counselling, using real-world queries submitted to a drug information centre.

## Key findings

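- All three evaluators rated the quality of information higher for physician-generated responses than for ChatGPT's, and preferred the physician-generated responses overall.
- All evaluators detected factually wrong information more often in ChatGPT's responses than in physicians' responses.
- The beginner and expert evaluators rated the quality of language higher for physicians' responses; the advanced evaluator found no significant difference.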

## Abstract

This study assessed the utility of the artificial intelligence (AI) chatbot ChatGPT (openly available version 3.5) in responding to real‐world pharmacotherapeutic queries from healthcare professionals.

Three independent and blinded evaluators with different levels of medical expertise and professional experience (beginner, advanced, and expert) compared AI chatbot‐ and physician‐generated responses to 70 real‐world pharmacotherapeutic queries submitted to the clinical‐pharmacological drug information centre of Hannover Medical School between June and October 2023 with regard to quality of information, answer preference, answer correctness and quality of language. Inter‐rater reliability was assessed with Krippendorff's alpha. Two separate investigators not otherwise involved in the conduct or analysis of the study selected the top three clinically relevant errors in chatbot‐ and physician‐generated responses.
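For readers unfamiliar with Krippendorff's alpha, the following is a minimal sketch of how the coefficient can be computed from a set of ratings. It assumes nominal-level ratings for simplicity; the abstract does not state which level of measurement or which software the authors used, so this is not a reproduction of their analysis. The function name `krippendorff_alpha_nominal` and the toy ratings are hypothetical.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data (illustrative sketch).

    units: list of lists; each inner list holds the ratings that the
    evaluators gave to one item, with None marking a missing rating.
    """
    # Coincidence matrix: every ordered pair of ratings from different
    # raters within one item contributes 1 / (m - 1), where m is the
    # number of ratings available for that item.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # items with fewer than two ratings carry no pairs
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("not enough pairable ratings")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")

    return 1.0 - observed / expected


# Hypothetical toy data: three evaluators rating four responses.
toy_ratings = [
    ["good", "good", "good"],
    ["poor", "poor", "good"],
    ["poor", "poor", "poor"],
    ["good", None, "good"],
]
alpha = krippendorff_alpha_nominal(toy_ratings)
```

An alpha of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. For ordinal rating scales, a different distance function would replace the simple "equal or not equal" comparison used here.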

All three evaluators rated the quality of information of physician‐generated responses higher than the quality of information of AI chatbot‐generated responses and, accordingly, thought that the physician‐generated responses were better than the chatbot‐generated responses (answer preference). All evaluators detected factually wrong information more frequently in chatbot‐generated responses than in physician‐generated responses. Although the beginner and expert evaluators rated the quality of language of physician‐generated responses higher than the quality of language of chatbot‐generated responses, there was no significant difference according to the advanced evaluator.

ChatGPT's responses to real‐world pharmacotherapeutic queries were substantially inferior to conventional physician‐generated responses with regard to quality of information and factual correctness. Our study suggests that, to date, the use of ChatGPT in pharmacotherapy counselling must be strongly cautioned against.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12930016/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC12930016/full.md

---
Source: https://tomesphere.com/paper/PMC12930016