# Development and internal validation of a prescriptive multi-task learning model for horizontal strabismus surgery planning

**Authors:** Jieyue Wang, Xiaoying Wu, Sheng Ou

PMC · DOI: 10.1186/s12886-026-04628-9 · 2026-01-21

## TL;DR

A multi-task neural network reproduces surgeons' decisions on which horizontal eye muscles to operate on and the dose for each, and could help standardise strabismus surgery planning.

## Contribution

A novel prescriptive multi-task learning model that jointly predicts muscle selection and surgical dose for horizontal strabismus surgery.

## Key findings

- The model achieved a macro-AUC of 0.97 and macro-MCC of 0.83 for muscle selection.
- Surgical dose predictions were highly accurate with a MAE of 0.42 mm and 95% of estimates within ±0.30 mm of the surgeon’s plan.
- Exact match of the entire surgical plan reached 55%, far surpassing the majority baseline of 17%.
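As a reading aid for the macro-MCC figure, the sketch below computes the Matthews correlation coefficient per label from binary confusion counts and averages it over labels. It is an illustration only: the function names `mcc` and `macro_mcc` and the toy labels are assumptions, not the authors' code or data.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for one binary label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def macro_mcc(true_rows, pred_rows):
    """Unweighted mean of per-label MCC across all label columns."""
    n_labels = len(true_rows[0])
    scores = [
        mcc([row[j] for row in true_rows], [row[j] for row in pred_rows])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


# Toy example with two labels and four patients (synthetic values).
true_rows = [(1, 0), (1, 1), (0, 1), (0, 0)]
pred_rows = [(1, 0), (0, 1), (0, 1), (0, 0)]
print(round(macro_mcc(true_rows, pred_rows), 3))
```

Macro-averaging weights every muscle-procedure label equally, so rare procedures influence the score as much as common ones.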

## Abstract

Horizontal strabismus affects ≈ 1.9% of the global population. Traditional “1 mm ≈ 2 Δ” nomograms disregard patient heterogeneity, and re-operation rates after primary horizontal surgery remain at 7–8%. We aimed to develop a single prescriptive model that simultaneously predicts which horizontal extra-ocular muscles require surgery and the precise recession/resection dose for each, following the TRIPOD+AI reporting checklist.

In this retrospective single-centre study, 634 consecutive patients (2019–2024) undergoing primary horizontal-muscle surgery were analysed. Fourteen routinely recorded pre-operative variables (including age, prism-cover deviation, axial-length metrics, refractive error and visual acuity) fed a fully connected multi-task neural network with a shared trunk and two heads: (i) 8-label classification for muscle-procedure selection and (ii) 8-output regression for surgical dose. The sample size exceeded recommended heuristics for an expected AUC ≥ 0.90, and the model was internally validated with multilabel-stratified 10-fold cross-validation.
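The shared-trunk, two-head layout described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the hidden width (`HIDDEN = 16`), the single ReLU trunk layer, the random initialisation, and the loss (binary cross-entropy plus a squared dose error masked to the muscles actually operated on) are all hypothetical choices. Only the 14 inputs and the two 8-dimensional outputs come from the abstract.

```python
import math
import random

N_FEATURES = 14   # pre-operative variables (from the abstract)
N_OUTPUTS = 8     # muscle-procedure labels and dose outputs (from the abstract)
HIDDEN = 16       # hypothetical trunk width, not reported in the abstract


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map: one output per weight row."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x, trunk, cls_head, reg_head):
    """Shared ReLU trunk feeding a sigmoid label head and a linear dose head."""
    h = [max(0.0, z) for z in dense(x, trunk)]
    probs = [1.0 / (1.0 + math.exp(-z)) for z in dense(h, cls_head)]
    doses = dense(h, reg_head)
    return probs, doses


def joint_loss(probs, doses, y_labels, y_doses, dose_weight=1.0):
    """Assumed objective: mean BCE over labels + squared dose error on operated muscles."""
    eps = 1e-12
    bce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, y_labels)
    ) / len(probs)
    n_active = max(1, sum(y_labels))
    mse = sum(y * (d - t) ** 2 for y, d, t in zip(y_labels, doses, y_doses)) / n_active
    return bce + dose_weight * mse


rng = random.Random(0)
trunk = init_layer(N_FEATURES, HIDDEN, rng)
cls_head = init_layer(HIDDEN, N_OUTPUTS, rng)
reg_head = init_layer(HIDDEN, N_OUTPUTS, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]  # one synthetic, standardised patient
probs, doses = forward(x, trunk, cls_head, reg_head)

# Synthetic target: two procedures selected, with doses in mm.
y_labels = [1, 0, 0, 1, 0, 0, 0, 0]
y_doses = [5.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0, 0.0]
loss = joint_loss(probs, doses, y_labels, y_doses)
```

Because both heads read the same hidden representation, gradients from the dose error and the label error would update the same trunk weights, which is the usual motivation for multi-task learning.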

The model achieved excellent discrimination for muscle selection (macro-AUC 0.97 ± 0.01; macro-MCC 0.83) with near-perfect calibration (ECE 0.008). Dose predictions were highly accurate (MAE 0.42 ± 0.04 mm; RMSE 0.54 ± 0.07 mm; R² 0.86 ± 0.04); 95% of estimates lay within ± 0.30 mm of the surgeon’s plan. Exact match of the entire surgical plan reached 55%, far surpassing the majority baseline of 17%. These figures markedly outperform earlier regression-only approaches that reported MAE 0.5–0.8 mm and indication-level AUC 0.82.
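The reported dose and plan-level metrics can be illustrated with a short sketch. Two definitions here are assumptions, since the abstract does not spell them out: "exact match" is taken to mean that all eight muscle-procedure labels agree with the surgeon's plan, and ECE is computed with ten equal-width probability bins. All numbers below are synthetic.

```python
import math


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def fraction_within(pred, true, tol):
    """Share of predictions whose absolute error is at most tol."""
    return sum(1 for p, t in zip(pred, true) if abs(p - t) <= tol) / len(true)


def exact_match_rate(pred_plans, true_plans):
    """Share of patients whose full label vector matches (assumed definition)."""
    hits = sum(1 for p, t in zip(pred_plans, true_plans) if tuple(p) == tuple(t))
    return hits / len(true_plans)


def expected_calibration_error(probs, labels, n_bins=10):
    """Bin-weighted gap between mean predicted probability and observed frequency."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            mean_prob = sum(p for p, _ in members) / len(members)
            pos_rate = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(mean_prob - pos_rate)
    return ece


# Synthetic doses in mm: model prediction vs. surgeon's plan.
pred_mm = [5.0, 6.5, 4.0, 7.2]
true_mm = [5.0, 6.0, 4.5, 7.0]
print(mae(pred_mm, true_mm), rmse(pred_mm, true_mm), fraction_within(pred_mm, true_mm, 0.30))

# Synthetic 8-label plans for two patients.
pred_plans = [(1, 0, 0, 1, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0, 0, 0)]
true_plans = [(1, 0, 0, 1, 0, 0, 0, 0), (0, 1, 1, 0, 0, 0, 0, 0)]
print(exact_match_rate(pred_plans, true_plans))
```

Exact match is a strict criterion: a plan with seven of eight labels correct still counts as a miss, which helps explain why it sits well below the per-label discrimination figures.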

A transparent multi-task learning model can replicate expert, patient-specific surgical plans for horizontal strabismus with sub-millimetre precision. The tool could standardise planning and reduce inter-surgeon variability; multi-centre external validation remains essential.

The online version contains supplementary material available at 10.1186/s12886-026-04628-9.

Strengths of this study

- First integrated prescriptive model that jointly predicts which horizontal extra-ocular muscles to operate and their recession/resection dose, outperforming earlier regression-only SVM and decision-tree approaches (0.97 macro-AUC vs. 0.82; 0.42 mm MAE vs. 0.5–0.8 mm).
- Robust, adequately powered dataset of 634 consecutive cases; exceeds published sample-size guidance for developing an AUC ≥ 0.90 model and maintains a favourable events-per-parameter ratio.
- Methodological rigour: multilabel-stratified 10-fold cross-validation, bootstrap CIs, and probability calibration executed in line with TRIPOD+AI reporting standards, enhancing transparency and reproducibility.
- Ground-truth surgical plans were associated with uniformly positive outcomes, ensuring that the model learned from high-quality examples.
- High clinical fidelity: 55% exact plan match and 95% of dose errors within ±0.30 mm (well below the 0.5 mm threshold deemed clinically perceptible) suggest potential to standardise surgical planning.

Limitations of this study

- Single-centre, retrospective design may encode local practice patterns; optimism bias is possible without external validation in diverse settings.
- Class imbalance effects: rare procedures (e.g., left lateral-rectus resection) have fewer training examples and slightly lower accuracy despite stratified sampling.


## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** horizontal strabismus (MESH:D013285), atrophy (MESH:D001284), misalignment (MESH:D017760), anisometropia (MESH:D015858), esotropia (MESH:D004948), exotropia (MESH:D005099)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12828921/full.md

---
Source: https://tomesphere.com/paper/PMC12828921