Study of comparative performance of general-purpose LLM-based systems in predicting IVF outcomes
Can Dinç, Ömer Faruk Öz, Saltuk Buğra Arıkan, Selen Doğan, Murat Özekinci, Nasuh Utku Doğan, İnanç Mendilcioğlu

TL;DR
This study compares how well three general-purpose LLM-based systems (ChatGPT, DeepSeek, and Gemini) predict IVF treatment outcomes and finds that none is yet reliable enough for clinical use.
Contribution
The first comparative evaluation of general-purpose LLMs for IVF outcome prediction using standardized clinical vignettes.
Findings
Gemini performed best in predicting stimulation protocols and embryo counts, but all models showed suboptimal accuracy.
Clinical pregnancy prediction had the lowest performance; Gemini achieved the highest AUC for this outcome, 0.711.
No model reached sufficient reliability for clinical use, highlighting the need for further validation and task-specific development.
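For readers unfamiliar with the AUC figure above, the following is a minimal Python sketch of how an AUC can be computed from binary outcomes and predicted probabilities. It uses the pairwise-ranking definition (the chance that a randomly chosen positive case is scored above a randomly chosen negative one). The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not code or data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for the outcome of interest (e.g. clinical pregnancy), 0 otherwise.
    scores: predicted probabilities, in the same order as labels.
    Tied scores count as half a correct ranking.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (NOT from the study): six cycles, three pregnancies.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.8, 0.3, 0.4, 0.5, 0.7, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so a value near 0.7 indicates only moderate discrimination.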
Abstract
Artificial intelligence (AI) has emerged as a promising tool for clinical decision support in reproductive medicine, yet the performance of general-purpose large language models (LLMs) in predicting in vitro fertilization (IVF) outcomes remains insufficiently characterized. This exploratory proof-of-concept study aimed to evaluate and compare the out-of-the-box performance of three widely accessible LLM-based systems (ChatGPT, DeepSeek, and Gemini) in forecasting key clinical and laboratory outcomes of IVF treatments. This retrospective single-center study used data from 1473 autologous IVF/ICSI cycles, each representing a unique patient. For each cycle, relevant clinical and laboratory variables were incorporated into a standardized anonymized patient-level vignette and submitted via the publicly available web interfaces of three LLMs (ChatGPT, DeepSeek, Gemini) without any…
Taxonomy
Topics: Ovarian function and disorders · Reproductive Biology and Fertility · Reproductive Health and Technologies
