Detection of Medical Misinformation in Hemangioma Patient Education: Comparative Study of ChatGPT-4o and DeepSeek-R1 Large Language Models
Guoyong Wang, Ye Zhang, Weixin Wang, Yingjie Zhu, Wei Lu, Chaonan Wang, Hui Bi, Xiaonan Yang

TL;DR
This study compares ChatGPT-4o and DeepSeek-R1 in detecting medical misinformation about hemangiomas and finds DeepSeek-R1 the more accurate of the two, with a significant advantage in expert ratings.
Contribution
The study provides empirical evidence on the performance of ChatGPT-4o and DeepSeek-R1 in identifying medical rumors related to hemangiomas.
Findings
DeepSeek-R1 outperformed ChatGPT-4o in accuracy, precision, and recall for classifying medical information.
Expert evaluations showed DeepSeek-R1 had a significant advantage in detecting medical rumors.
Both models showed similar semantic stability in their outputs.
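The accuracy, precision, and recall named above can be computed from paired labels and predictions. The following is a minimal sketch, assuming "rumor" is treated as the positive class; the function name and the five-item toy data are hypothetical and are not taken from the study.

```python
def classification_metrics(y_true, y_pred, positive="rumor"):
    """Return accuracy, precision, and recall for a binary labeling task.

    `positive` is the class treated as the detection target (an assumption here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # rumors caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # rumors missed
    tn = sum(t != positive and p != positive for t, p in pairs)  # correct passes

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy data, for illustration only (not the study's 82 items).
y_true = ["rumor", "rumor", "rumor", "accurate", "accurate"]
y_pred = ["rumor", "rumor", "accurate", "rumor", "accurate"]

example = classification_metrics(y_true, y_pred)
print(example)
```

Precision here answers "of the texts flagged as rumors, how many really were rumors," while recall answers "of the true rumors, how many were flagged."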
Abstract
This study examines how well large language models (LLMs) detect medical rumors, using hemangioma-related information as an example. It aimed to evaluate and compare the accuracy, stability, and expert-rated reliability of 2 LLMs, ChatGPT-4o and DeepSeek-R1, in classifying medical information related to hemangiomas as either “rumors” or “accurate information.” We collected 82 publicly available texts from social media platforms, medical education websites, international guidelines, and journals. Of these, 47 (57%) were labeled as “rumors” and 35 (43%) as “accurate information.” Three vascular anomaly specialists with extensive clinical experience independently annotated the texts in a double-blinded manner, and disagreements were resolved by arbitration to ensure labeling…
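The label split reported in the abstract can be checked with a few lines of arithmetic. This sketch rebuilds only the counts (47 and 35 of 82); the majority-class baseline it prints is an added point of reference, not a figure reported by the study.

```python
from collections import Counter

# Counts as stated in the abstract; the individual texts are not reproduced here.
labels = ["rumor"] * 47 + ["accurate"] * 35

counts = Counter(labels)
total = sum(counts.values())

# Share of each class, as a fraction of all labeled items.
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {counts[label]}/{total} ({share:.0%})")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Because the classes are moderately imbalanced, any reported accuracy is best read against this roughly 57% floor.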
Taxonomy
Topics: Misinformation and Its Impacts · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
