Evaluation of ChatGPT-Generated Medical Responses: A Systematic Review and Meta-Analysis
Qiuhong Wei, Zhengxiong Yao, Ying Cui, Bo Wei, Zhezhen Jin, and Ximing Xu

TL;DR
This systematic review and meta-analysis assesses ChatGPT's accuracy in medical responses, highlighting its potential in healthcare but also emphasizing the need for standardized evaluation methods and better reporting practices.
Contribution
The paper provides a comprehensive summary and meta-analysis of existing studies on ChatGPT's medical performance, identifying methodological inconsistencies and guiding future research directions.
Findings
ChatGPT achieved a pooled accuracy of 56% (95% CI: 51%-60%) on medical queries across the 17 studies included in the meta-analysis.
Study heterogeneity and incomplete reporting of methods limit the reliability of the results.
Potential for healthcare applications is promising despite current limitations.
Abstract
Large language models such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in medicine and provide direction for future research. We searched ten medical literature databases on June 15, 2023, using the keyword "ChatGPT". A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. The analysis showed that ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%, I² = 87%) in addressing medical queries. However, the studies varied in question resource, question-asking process, and evaluation metrics. Moreover, many studies failed to report methodological details, including the version…
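To make the pooled accuracy, confidence interval, and I² statistic in the abstract more concrete, here is a minimal sketch of one common way such figures are computed: a DerSimonian-Laird random-effects model applied to raw proportions. This is an assumption chosen for illustration; the summary above does not say which pooling model or transformation the authors used. The function name `pool_accuracy` and the input counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def pool_accuracy(studies):
    """Pool per-study accuracies with a DerSimonian-Laird random-effects model.

    studies: list of (correct, total) pairs, one per study.
    Returns (pooled, ci_low, ci_high, i2_percent).
    Assumes every study has an accuracy strictly between 0 and 1,
    otherwise the within-study variance below would be zero.
    """
    props = [c / n for c, n in studies]
    # Within-study variance of a raw proportion (illustrative choice;
    # logit or arcsine transforms are also common in practice).
    variances = [p * (1 - p) / n for p, (_, n) in zip(props, studies)]

    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sw

    # Cochran's Q and the derived heterogeneity statistics.
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))
    df = len(studies) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add the between-study variance tau2.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * pi for wi, pi in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, i2


# Hypothetical counts for illustration only, not data from the review.
example = [(30, 100), (70, 100), (50, 100)]
pooled, low, high, i2 = pool_accuracy(example)
print(f"pooled={pooled:.2f}, 95% CI=({low:.2f}, {high:.2f}), I2={i2:.0f}%")
```

A high I², such as the 87% reported above, means that most of the observed variation between studies reflects real differences among them rather than sampling error, which is why the pooled figure should be read with caution.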
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
