InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language

Nicklas Neu; Thomas Ebner; Jasmin Primus; Raphael Zefferer; Bernhard Schenkenfelder; Mathias Brunbauer; Florian Kromp

arXiv:2604.21061·cs.AI·April 24, 2026

TL;DR

This paper introduces InVitroVision, a multi-modal vision-language model fine-tuned from PaliGemma-2 on only 1,000 captioned embryo images to generate natural language descriptions of embryo development. It outperforms ChatGPT 5.2 and the base models in overall metrics.

Contribution

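The paper demonstrates that a foundational vision-language model can be fine-tuned on a small, publicly available embryo time-lapse dataset to describe embryo morphology, embryonic cell cycle and developmental stage in natural language.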

Findings

01 InVitroVision outperforms ChatGPT 5.2 and the base models in overall metrics for embryo description.

02 Performance improves as the training dataset grows.

03 The approach produces natural language descriptions of embryo morphology, embryonic cell cycle and developmental stage from only 1,000 captioned images.

Abstract

The application of artificial intelligence (AI) in IVF has shown promise in improving consistency and standardization of decisions, but often relies on annotated data and does not make use of the multimodal nature of IVF data. We investigated whether foundational vision-language models can be fine-tuned to predict natural language descriptions of embryo morphology and development. Using a publicly available embryo time-lapse dataset, we fine-tuned PaliGemma-2, a multi-modal vision-language model, with only 1,000 images and corresponding captions, describing embryo morphology, embryonic cell cycle and developmental stage. Our results show that the fine-tuned model, InVitroVision, outperformed a commercial model, ChatGPT 5.2, and base models in overall metrics, with performance improving with larger training datasets. This study demonstrates the potential of foundational vision-language…
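The abstract reports that performance improves with larger training datasets. One common way to set up such a comparison is to draw nested training subsets, so that each smaller set is contained in the next larger one. The sketch below illustrates that idea only and is not the authors' pipeline: the `CaptionedFrame` record, the file paths, the placeholder captions and the subset sizes 100 and 500 are all hypothetical. Only the figure of 1,000 image-caption pairs comes from the abstract.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionedFrame:
    """One training example (hypothetical record layout)."""
    image_path: str  # hypothetical path to a single time-lapse frame
    caption: str     # free-text description of morphology, cell cycle and stage


def nested_subsets(records, sizes, seed=0):
    """Return {size: subset}, where every smaller subset is a prefix of the larger ones.

    Shuffling once with a fixed seed keeps the subsets reproducible and nested,
    so differences between runs reflect dataset size rather than which samples were drawn.
    """
    if max(sizes) > len(records):
        raise ValueError("requested subset is larger than the dataset")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return {n: shuffled[:n] for n in sorted(sizes)}


# Toy stand-in for the 1,000 image-caption pairs; paths and captions are placeholders.
records = [
    CaptionedFrame(f"frames/embryo_{i:04d}.png", f"placeholder caption {i}")
    for i in range(1000)
]

# Subset sizes 100 and 500 are assumptions chosen for illustration.
subsets = nested_subsets(records, sizes=[100, 500, 1000])
```

Each entry of `subsets` could then be passed to a separate fine-tuning run and evaluated on the same held-out captions.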
