# Using Large Language Models for In Silico Development and Simulation of a Patient-Reported Outcome Questionnaire for Cataract Surgery with Various Intraocular Lenses: A Pre-Validation Study

**Authors:** Ewelina Trojacka, Joanna Przybek-Skrzypecka, Justyna Izdebska, Jacek P. Szaflik, Musa Aamir Qazi, Abdullah Azhar, Janusz Skrzypecki

PMC · DOI: 10.3390/jcm15010283 · Journal of Clinical Medicine · 2025-12-30

## TL;DR

This study uses large language models to generate a patient-reported outcome questionnaire for cataract surgery and to test it on a simulated patient cohort, reducing patient burden during early validation before clinical deployment.

## Contribution

A novel in silico framework that uses LLMs to pre-validate PROMs in ophthalmology, stress-testing an instrument for structural failure points before clinical use.

## Key findings

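- Confirmatory factor analysis showed excellent fit (CFI = 0.962, TLI = 0.951, RMSEA = 0.048, SRMR = 0.063), and DIF analysis flagged 0 of 20 items for bias by age, sex, or IOL type.
- Internal consistency was robust (Cronbach's alpha > 0.80), stateless test-retest reliability was high (ICC > 0.90), and convergent validity was supported by significant correlations with NEI-VFQ-25 scores (Spearman's ρ from −0.425 to −0.652).
- The instrument was responsive to change, and known-groups validity reflected realistic clinical overlap; the authors conclude that simulating patient responses in silico reduces the ethical and logistical burden on real-world populations before clinical trials.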
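
As an illustration of the internal-consistency statistic reported above, here is a minimal sketch of Cronbach's alpha using only the Python standard library. The function name `cronbach_alpha` and the toy response matrix are assumptions made for this example; they are not taken from the paper, and the study's own analysis code is not shown in this summary.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data, invented for illustration only (5 respondents, 3 items).
toy_responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 5],
]
alpha = cronbach_alpha(toy_responses)
print(f"alpha = {alpha:.3f}")
```

Alpha rises toward 1 as the items vary together, which is why values above 0.80 are read as robust internal consistency.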

## Abstract

Background/Objectives: Development of Patient-Reported Outcome Measures (PROMs) in ophthalmology is limited by high patient burden during early validation. We propose an In Silico Pre-validation Framework using Large Language Models (LLMs) to stress-test instruments before clinical deployment. Methods: The LLM generated a PROM questionnaire and a synthetic cohort of 500 distinct patient profiles via a Python-based pipeline. Profiles were instantiated as structured JSON objects with detailed attributes for demographics, lifestyle, and health background, including specific clinical parameters like IOL type (Monofocal, Multifocal, EDOF) and dysphotopsia severity. To eliminate memory bias, a stateless simulation approach was used for test–retest reliability; AI agents were re-instantiated without access to prior conversation history. Psychometric validation included Confirmatory Factor Analysis (CFA) using WLSMV estimation and Differential Item Functioning (DIF). Results: The model demonstrated excellent fit (CFI = 0.962, TLI = 0.951, RMSEA = 0.048, SRMR = 0.063), confirming structural validity. DIF analysis detected no significant bias based on age, sex, or IOL type (0/20 items flagged). Internal consistency was robust (Cronbach’s alpha > 0.80) and stateless test–retest reliability was high (ICC > 0.90), indicating stability independent of algorithmic memory. Convergent validity was established via significant correlations with NEI-VFQ-25 scores (Spearman’s ρ: −0.425 to −0.652). While responsive to change, known-groups validity reflected realistic clinical overlap. Conclusions: LLM-based pre-validation effectively mirrors complex human response patterns through “algorithmic fidelity.” By identifying structural failure points in silico, this framework ensures PROMs are robust and unbiased before clinical trials, reducing the ethical and logistical burden on real-world populations.
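
The profile and stateless-simulation steps described in the Methods can be pictured with a short sketch. Everything below is an assumption made for illustration: the field names, value ranges, item texts, and the `dummy_model` stand-in for an LLM call are hypothetical, and only the three IOL types and the cohort size of 500 come from the abstract. It is not the authors' pipeline.

```python
import json
import random

IOL_TYPES = ["Monofocal", "Multifocal", "EDOF"]  # IOL types named in the abstract


def make_profile(patient_id, rng):
    # Hypothetical schema: field names and value ranges are placeholders.
    return {
        "patient_id": patient_id,
        "demographics": {"age": rng.randint(50, 90), "sex": rng.choice(["F", "M"])},
        "clinical": {
            "iol_type": rng.choice(IOL_TYPES),
            "dysphotopsia_severity": rng.choice(["none", "mild", "moderate", "severe"]),
        },
    }


def build_prompt(profile, items):
    # Stateless by construction: the prompt depends only on the profile and
    # the questionnaire items, never on earlier answers or chat history.
    return json.dumps({"profile": profile, "items": items}, sort_keys=True)


def simulate_administration(profile, items, ask_model):
    # Each administration is a fresh call; ask_model stands in for an LLM client.
    return ask_model(build_prompt(profile, items))


def dummy_model(prompt):
    # Placeholder for a real LLM call: returns one 1-5 score per item.
    n_items = len(json.loads(prompt)["items"])
    return [random.randint(1, 5) for _ in range(n_items)]


rng = random.Random(0)  # fixed seed so the synthetic cohort is reproducible
cohort = [make_profile(i, rng) for i in range(500)]
items = ["Hypothetical item 1", "Hypothetical item 2"]

first = simulate_administration(cohort[0], items, dummy_model)
second = simulate_administration(cohort[0], items, dummy_model)
```

Because the test and retest prompts are identical and carry no history, any agreement between `first` and `second` in a real run would come from the profile itself rather than from the model remembering its earlier answers, which is the point of the stateless design.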

## Linked entities

- **Diseases:** cataract (MONDO:0005129)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12786829/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12786829/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12786829/full.md

---
Source: https://tomesphere.com/paper/PMC12786829