Evaluating multiple large language models on orbital diseases
Qi-Chen Yang, Yan-Mei Zeng, Hong Wei, Cheng Chen, Qian Ling, Xiao-Yu Wang, Xu Chen, Yi Shao

TL;DR
This study evaluates how well large language models (LLMs) such as GPT-4 answer questions about orbital diseases, compared with medical students and ophthalmologists.
Contribution
The study introduces a dataset of orbital disease questions and compares the performance of LLMs with that of medical students and ophthalmologists.
Findings
GPT-4 and PaLM2 showed the highest average correlation with correct answers.
GPT-4 outperformed medical students but did not match ophthalmologists' accuracy.
With further refinement, LLMs such as GPT-4 could serve as educational tools in ophthalmology.
Abstract
Humans learn to avoid mistakes through continuous learning, error correction, and accumulated experience, a process that is time-consuming, laborious, and often full of detours. To assist human learning, ChatGPT (Chat Generative Pre-trained Transformer), which is built on large language models (LLMs), has been developed to generate human-like responses to a wide range of problems. In this study, we assessed the potential of LLMs as assistants for answering questions about orbital diseases. To this end, we compiled a dataset of 100 orbital disease questions, with their corresponding answers, drawn from examinations administered to ophthalmology residents and medical students. Five LLMs were tested and compared, namely,…
Figures 1–5 (images not reproduced)
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · COVID-19 diagnosis using AI · Retinal Imaging and Analysis
