Evaluating Large Language Models in Ophthalmology

Jason Holmes; Shuyuan Ye; Yiwei Li; Shi-Nan Wu; Zhengliang Liu; Zihao Wu; Jinyu Hu; Huan Zhao; Xi Jiang; Wei Liu; Hong Wei; Jie Zou; Tianming Liu; Yi Shao

arXiv:2311.04933 · cs.CL · November 10, 2023 · 6 cites

Open Access

TL;DR

This study evaluates the performance of three large language models in ophthalmology, finding GPT-4 performs at a level comparable to experienced physicians and could enhance medical education and clinical decision-making.

Contribution

It provides a comparative analysis of LLMs' ophthalmology knowledge against medical professionals, highlighting GPT-4's superior performance.

Findings

01. GPT-4 performs at the level of attending physicians.

02. LLMs outperform medical undergraduates.

03. GPT-4 shows higher stability and confidence.

Abstract

Purpose: The performance of three large language models (LLMs) (GPT-3.5, GPT-4, and PaLM2) in answering professional ophthalmology questions was evaluated and compared with that of three professional populations (medical undergraduates, medical masters, and attending physicians). Methods: A 100-item ophthalmology single-choice test was administered to the three LLMs and to the three human groups. The performance of the LLMs was evaluated and compared with that of the human groups in terms of average score, stability, and confidence. Results: Each LLM outperformed undergraduates in general, with GPT-3.5 and PaLM2 being slightly below the master's level, while GPT-4 showed a level comparable to that of attending physicians. In addition, GPT-4…
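
As a rough illustration of the scoring described in the Methods, the sketch below grades repeated runs of a single-choice test against an answer key and reports the average score and its spread across runs. The abstract does not say how stability or confidence were computed, so treating stability as the population standard deviation of run scores is an assumption made here, and confidence is left out entirely. The function names (`score_run`, `summarize`) and the toy 4-item data are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def score_run(answers, key):
    """Return the percentage of items answered correctly in one run."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)


def summarize(runs, key):
    """Average score and spread across repeated runs of the same test.

    ASSUMPTION: spread (population standard deviation of run scores) is
    used as a stand-in for "stability"; the paper may define it differently.
    """
    scores = [score_run(run, key) for run in runs]
    return {"mean": mean(scores), "spread": pstdev(scores)}


# Toy data: a 4-item key and three hypothetical runs by one test taker.
key = ["A", "C", "B", "D"]
runs = [
    ["A", "C", "B", "D"],  # all four correct
    ["A", "C", "B", "A"],  # three correct
    ["A", "B", "B", "D"],  # three correct
]
summary = summarize(runs, key)
print(summary)
```

A lower spread means the test taker gives more consistent results across runs; the same two functions apply unchanged to a 100-item key.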

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills

Methods: Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · Linear Layer · Cosine Annealing · Dense Connections