Can OpenAI o1 Reason Well in Ophthalmology? A 6,990-Question Head-to-Head Evaluation Study
Sahana Srinivasan, Xuguang Ai, Minjie Zou, Ke Zou, Hyunjae Kim, Thaddaeus Wai Soon Lo, Krithi Pushpanathan, Yiming Kong, Anran Li, Maxwell Singer, Kai Jin, Fares Antaki, David Ziyou Chen, Dianbo Liu, Ron A. Adelman, Qingyu Chen, Yih Chung Tham

TL;DR
This study evaluates OpenAI o1's accuracy and reasoning on ophthalmology questions. It finds that o1 achieves the highest accuracy (0.88) among the models compared but ranks third on text-generation metrics used to assess reasoning, highlighting the need for domain-specific improvements.
Contribution
It provides a head-to-head comparison of OpenAI o1 with five other LLMs on 6,990 ophthalmology questions from MedMCQA, revealing strengths and areas for improvement.
Findings
O1 achieved the highest accuracy (0.88) and macro-F1 score.
O1 ranked first in Lens and Glaucoma subtopics.
O1 performed better on queries with longer ground truth explanations.
Abstract
Question: What is the performance and reasoning ability of OpenAI o1 compared to other large language models in addressing ophthalmology-specific questions?
Findings: This study evaluated OpenAI o1 and five LLMs using 6,990 ophthalmological questions from MedMCQA. O1 achieved the highest accuracy (0.88) and macro-F1 score but ranked third in reasoning capabilities based on text-generation metrics. Across subtopics, o1 ranked first in "Lens" and "Glaucoma" but second to GPT-4o in "Corneal and External Diseases", "Vitreous and Retina" and "Oculoplastic and Orbital Diseases". Subgroup analyses showed o1 performed better on queries with longer ground truth explanations.
Meaning: O1's reasoning enhancements may not fully extend to ophthalmology, underscoring the need for domain-specific refinements to optimize performance in specialized fields like ophthalmology.
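To make the two headline metrics concrete, the following is a minimal sketch of how accuracy and macro-F1 can be computed for multiple-choice answers. It assumes four answer options labelled A to D; the function names and the toy answer lists are illustrative and do not come from the study, whose actual evaluation code is not shown here.

```python
def accuracy(gold, pred):
    """Fraction of questions where the predicted option matches the answer key."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels=("A", "B", "C", "D")):
    """Unweighted mean of per-option F1 scores (assumed label set: A to D)."""
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the option never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy data (hypothetical, not from the paper).
gold = ["A", "B", "C", "D", "A", "B"]
pred = ["A", "B", "C", "A", "A", "C"]
print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro-F1 = {macro_f1(gold, pred):.3f}")
```

Macro-F1 weights every answer option equally, so a model that does well only on the most frequent option scores lower on macro-F1 than on accuracy.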
Taxonomy
Topics: Digital Imaging in Medicine · Artificial Intelligence in Healthcare and Education · Ophthalmology and Visual Health Research
