DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning
Pusheng Xu, Yue Wu, Kai Jin, Xiaolan Chen, Mingguang He, Danli Shi

TL;DR
DeepSeek-R1 outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in bilingual complex ophthalmology reasoning, with higher accuracy and stronger reasoning ability. This suggests it could support clinical diagnosis and decision-making.
Contribution
This study evaluates DeepSeek-R1, a recently released LLM, against Gemini 2.0 Pro, OpenAI o1, and o3-mini on bilingual ophthalmology reasoning tasks. The evaluation covers both Chinese and English MCQs.
Findings
DeepSeek-R1 achieved the highest accuracy in both Chinese and English MCQs.
DeepSeek-R1 significantly outperformed Gemini 2.0 Pro, OpenAI o1, and o3-mini.
Common reasoning errors include ignoring key history and misinterpreting data.
Abstract
Purpose: To evaluate the accuracy and reasoning ability of DeepSeek-R1 and three other recently released large language models (LLMs) in bilingual complex ophthalmology cases. Methods: A total of 130 multiple-choice questions (MCQs) related to diagnosis (n = 39) and management (n = 91) were collected from the Chinese ophthalmology senior professional title examination and categorized into six topics. These MCQs were translated into English using DeepSeek-R1. The responses of DeepSeek-R1, Gemini 2.0 Pro, OpenAI o1, and o3-mini were generated under default configurations between February 15 and February 20, 2025. Accuracy was calculated as the proportion of correctly answered questions, with omissions and extra answers considered incorrect. Reasoning ability was evaluated by analyzing the reasoning logic and the causes of reasoning errors. Results: DeepSeek-R1 demonstrated the highest…
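The accuracy rule in the Methods can be illustrated with a short example. This is a minimal sketch under stated assumptions and is not the authors' code. It assumes each question stores its answer key as a set of option letters and that each model response has already been parsed into a set of selected letters. The `Item` class, the function names, and the toy questions are hypothetical. A response counts as correct only when the selected set matches the key exactly. An omitted correct option and an added wrong option therefore both score zero.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One MCQ. Hypothetical fields for illustration, not the paper's data format."""
    qid: str
    category: str           # e.g. "diagnosis" or "management"
    key: frozenset[str]     # correct option letters (assumed non-empty)


def is_correct(key: frozenset[str], selected: set[str]) -> bool:
    # Exact-set match: a missing correct option (omission) or an extra
    # option not in the key both make the response incorrect.
    return frozenset(selected) == key


def accuracy(items: list[Item], responses: dict[str, set[str]]) -> float:
    # Proportion of questions answered exactly right. A question with no
    # parsed response is treated as an empty selection, i.e. incorrect.
    if not items:
        raise ValueError("no items to score")
    correct = sum(is_correct(it.key, responses.get(it.qid, set())) for it in items)
    return correct / len(items)


def accuracy_by_category(items: list[Item], responses: dict[str, set[str]]) -> dict[str, float]:
    # Same rule, reported separately for each category label.
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for it in items:
        totals[it.category] = totals.get(it.category, 0) + 1
        hit = is_correct(it.key, responses.get(it.qid, set()))
        hits[it.category] = hits.get(it.category, 0) + int(hit)
    return {c: hits[c] / totals[c] for c in totals}


# Toy data (invented, not from the study).
items = [
    Item("q1", "diagnosis", frozenset({"B"})),
    Item("q2", "management", frozenset({"A", "C"})),
    Item("q3", "management", frozenset({"D"})),
    Item("q4", "management", frozenset({"A", "B"})),
]
responses = {
    "q1": {"B"},        # exact match: correct
    "q2": {"A"},        # omission: incorrect
    "q3": {"D", "E"},   # extra answer: incorrect
    "q4": {"B", "A"},   # option order does not matter: correct
}

print(accuracy(items, responses))  # 0.5
print(accuracy_by_category(items, responses))
```

Set equality is used because it applies the "omissions and extra answers are incorrect" rule in one comparison, and it ignores the order in which a model lists its options. Partial-credit schemes would need a different scoring function. The sketch assumes the study did not use partial credit, which is consistent with the rule as stated above.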
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Retinal Imaging and Analysis · Medical Imaging and Analysis
