DeepSeek-R1 Outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in Bilingual Complex Ophthalmology Reasoning

Pusheng Xu; Yue Wu; Kai Jin; Xiaolan Chen; Mingguang He; Danli Shi

arXiv:2502.17947·cs.CL·February 26, 2025

Open Access

TL;DR

DeepSeek-R1 outperforms Gemini 2.0 Pro, OpenAI o1, and o3-mini in bilingual complex ophthalmology reasoning, with higher accuracy and stronger reasoning ability, which suggests it could support clinical diagnosis and decision-making.

Contribution

This study evaluates DeepSeek-R1 against three other recently released LLMs (Gemini 2.0 Pro, OpenAI o1, and o3-mini) on bilingual ophthalmology reasoning, using 130 MCQs from the Chinese ophthalmology senior professional title examination and their English translations.

Findings

01

DeepSeek-R1 achieved the highest accuracy in both Chinese and English MCQs.

02

DeepSeek-R1 outperformed Gemini 2.0 Pro, OpenAI o1, and o3-mini significantly.

03

Common reasoning errors include ignoring key history and misinterpreting data.

Abstract

Purpose: To evaluate the accuracy and reasoning ability of DeepSeek-R1 and three other recently released large language models (LLMs) in bilingual complex ophthalmology cases. Methods: A total of 130 multiple-choice questions (MCQs) related to diagnosis (n = 39) and management (n = 91) were collected from the Chinese ophthalmology senior professional title examination and categorized into six topics. These MCQs were translated into English using DeepSeek-R1. The responses of DeepSeek-R1, Gemini 2.0 Pro, OpenAI o1 and o3-mini were generated under default configurations between February 15 and February 20, 2025. Accuracy was calculated as the proportion of correctly answered questions, with omissions and extra answers considered incorrect. Reasoning ability was evaluated through analyzing reasoning logic and the causes of reasoning error. Results: DeepSeek-R1 demonstrated the highest…
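The scoring rule described in the Methods (accuracy as the proportion of correctly answered questions, with omissions and extra answers counted as incorrect) can be sketched as an exact-match comparison of option sets. This is a minimal illustration, not the authors' code: the function names `is_correct` and `accuracy`, the option letters, and the sample answers are all hypothetical, and the sketch assumes each answer can be represented as a set of option letters.

```python
from typing import Iterable, Sequence


def is_correct(predicted: Iterable[str], key: Iterable[str]) -> bool:
    """Return True only when the chosen options match the key exactly.

    An omitted option or an extra option both make the answer incorrect.
    """
    return set(predicted) == set(key)


def accuracy(predictions: Sequence[Iterable[str]],
             keys: Sequence[Iterable[str]]) -> float:
    """Proportion of questions answered exactly right."""
    if len(predictions) != len(keys):
        raise ValueError("predictions and keys must have the same length")
    if not keys:
        raise ValueError("at least one question is required")
    correct = sum(is_correct(p, k) for p, k in zip(predictions, keys))
    return correct / len(keys)


# Hypothetical answer keys and model responses (not data from the paper).
keys = [{"A"}, {"B", "C"}, {"A", "D"}, {"E"}]
responses = [
    {"A"},            # exact match: correct
    {"B"},            # omits "C": incorrect
    {"A", "C", "D"},  # extra "C": incorrect
    {"E"},            # exact match: correct
]

print(accuracy(responses, keys))
```

Under this rule a partially right answer earns no credit, so the reported accuracy is a strict measure rather than a partial-credit score.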

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Retinal Imaging and Analysis · Medical Imaging and Analysis