# Comparative Evaluation of Diagnostic and Management Capabilities of Infiniti AI and ChatGPT-4o in Corneal Diseases

**Authors:** Abdulaziz Mohammad, Ali Bulbanat, Faisal Aljassar

PMC · DOI: 10.7759/cureus.95163 · Cureus · 2025-10-22

## TL;DR

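This study compares ChatGPT-4o, a general-purpose LLM, with Infiniti AI, a retrieval-augmented ophthalmology model, on 20 corneal disease cases scored against American Academy of Ophthalmology guidelines. ChatGPT-4o was significantly more accurate in both diagnosis and management.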

## Contribution

The study provides a direct empirical comparison of a general-purpose LLM (ChatGPT-4o) and a domain-specific, retrieval-augmented system (Infiniti AI) on corneal disease case scenarios.

## Key findings

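- ChatGPT-4o outperformed Infiniti AI in diagnostic accuracy (mean 2.37 vs 1.13 on a 0-3 scale, p < 0.001).
- Management scores were also higher for ChatGPT-4o (mean 2.65 vs 1.98, p < 0.001).
- Mean total score was 5.00 for ChatGPT-4o vs 3.10 for Infiniti AI (p < 0.001).
- Both models remain prone to hallucinations and should serve as adjuncts to, not replacements for, expert judgment.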

## Abstract

Background: Artificial intelligence (AI), particularly large language models (LLMs), is rapidly transforming medical education and clinical decision support. Ophthalmology, a specialty heavily reliant on pattern recognition, presents a promising domain for LLM integration. While general-purpose models like ChatGPT-4o have demonstrated strong performance in ophthalmic tasks, domain-specific systems such as Infiniti AI, built with a retrieval-augmented generation (RAG) framework, claim advantages by grounding responses in peer-reviewed ophthalmic literature. This study compares ChatGPT-4o (OpenAI, San Francisco, CA, USA) and Infiniti AI (Sinjab Academy, UAE) in corneal disease case scenarios.

Materials and methods: Twenty corneal cases were selected from the University of Iowa EyeRounds database, covering infectious, inflammatory, degenerative, developmental, and systemic associations. ChatGPT-4o, Infiniti AI, and a fellowship-trained cornea specialist independently evaluated each case. Diagnostic and management responses were scored against American Academy of Ophthalmology preferred practice pattern guidelines using a four-point scale (0-3). Statistical comparisons were performed using paired t-tests and Wilcoxon signed-rank tests.
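The paired comparison described above can be sketched with the standard library alone. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the score lists are made-up placeholders on the 0-3 scale, not data from the study. It computes the paired t statistic and the Wilcoxon signed-rank sums; p-values would normally come from a statistics package.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a paired t-test on equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus): rank sums of positive and negative differences.

    Zero differences are dropped; tied absolute differences share the average rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical placeholder scores on the 0-3 scale (NOT data from the study).
model_a = [3, 2, 3, 1, 2, 3, 2, 3]
model_b = [1, 2, 1, 0, 2, 1, 1, 2]

t_stat, dof = paired_t_statistic(model_a, model_b)
w_plus, w_minus = wilcoxon_signed_rank(model_a, model_b)
print(f"t = {t_stat:.2f} (df = {dof}), W+ = {w_plus}, W- = {w_minus}")
```

Both tests operate on per-case differences, which suits a design where the same cases are scored for each model. The signed-rank test is the usual companion to the t-test when scores are ordinal, as with a four-point scale.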

Results: ChatGPT-4o significantly outperformed Infiniti AI across all categories. Diagnostic accuracy was higher for ChatGPT-4o (2.37 ± 0.81) than Infiniti AI (1.13 ± 0.71, p < 0.001, Cohen’s d = 1.35). Management scores were also superior (2.65 ± 0.65 vs 1.98 ± 0.65, p < 0.001, d = 1.37). Overall, ChatGPT-4o achieved a mean total score of 5.00 ± 1.22 compared with 3.10 ± 1.10 for Infiniti AI (p < 0.001, d = 1.75).
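As a worked illustration of the effect sizes, the sketch below computes Cohen's d from the reported means and standard deviations using the pooled-SD formula for equal group sizes. This is an assumption: the abstract does not state which variant of Cohen's d was used. In paired designs d is often computed from the standard deviation of the per-case differences, which cannot be recovered from summary statistics, so the values produced here need not match the reported ones. The function name is hypothetical.

```python
import math


def cohens_d_pooled(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root-mean-square of the two SDs (equal group sizes)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Means and SDs as reported in the Results paragraph:
# (ChatGPT-4o mean, ChatGPT-4o SD, Infiniti AI mean, Infiniti AI SD)
reported = {
    "diagnosis": (2.37, 0.81, 1.13, 0.71),
    "management": (2.65, 0.65, 1.98, 0.65),
    "total": (5.00, 1.22, 3.10, 1.10),
}

pooled_d = {name: cohens_d_pooled(*stats) for name, stats in reported.items()}
for name, d in pooled_d.items():
    print(f"{name}: pooled-SD d = {d:.2f}")
```

Under either variant the differences are large by the conventional threshold of d = 0.8, which is consistent with the abstract's conclusion.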

Conclusions: ChatGPT-4o demonstrated greater diagnostic and management accuracy than Infiniti AI in corneal disease scenarios, highlighting the current strength of general-purpose LLMs over specialized retrieval-based systems. Nonetheless, both models remain prone to hallucinations and should serve as adjuncts to, rather than replacements for, expert judgment. Further refinement of ophthalmology-specific models is warranted to improve safety and clinical utility.

## Full-text entities

- **Diseases:** Corneal Diseases (MESH:D003316), Inflammation (MESH:D007249)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12543008/full.md

## References

13 references. The full list is in the complete paper: https://tomesphere.com/paper/PMC12543008/full.md

---
Source: https://tomesphere.com/paper/PMC12543008