Do segmentation metrics reflect clinical reality? A surgeon-centered evaluation in robot-assisted minimally invasive esophagectomy
Ronald de Jong, Yiping Li, Romy van Jaarsveld, Gino Kuiper, Richard van Hillegersberg, Jelle Ruurda, Josien Pluim, Marcel Breeuwer, Yasmina Al Khalil

TL;DR
This study evaluates how well quantitative segmentation metrics align with surgeons' assessments of deep learning-generated anatomical overlays in robot-assisted minimally invasive esophagectomy (RAMIE).
Contribution
The study introduces a surgeon-centered evaluation of segmentation metrics in RAMIE, revealing gaps between what quantitative metrics measure and what surgeons consider clinically relevant.
Findings
Overlap and temporal metrics best align with surgeon assessments of anatomical overlays.
Novice surgeons show weaker correlations and tend to rate overlays more leniently.
Qualitative feedback highlights hallucinations and instability missed by current metrics.
Abstract
Deep learning-based anatomy segmentation holds promise for improving real-time guidance in complex surgeries such as robot-assisted minimally invasive esophagectomy (RAMIE). However, the clinical relevance of commonly used metrics for evaluating segmentation quality remains unclear, as previous assessments have lacked direct input from surgeons. This study aims to assess how well quantitative segmentation metrics reflect surgeons’ assessments of anatomical overlay accuracy and clinical usefulness during RAMIE. We conducted a survey involving 26 upper gastrointestinal surgeons, including both trainee and attending surgeons, who assessed video clips of RAMIE procedures featuring deep learning-generated anatomical overlays. We correlated the surgeons’ qualitative evaluations of annotation accuracy and clinical usefulness with a comprehensive set of quantitative metrics, including overlap,…
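The correlation analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The abstract excerpt does not name the correlation coefficient or the exact overlap metric, so the use of Spearman rank correlation and the Dice coefficient here is an assumption, and the function names (`dice`, `average_ranks`, `spearman`) and all numbers are hypothetical.

```python
import math
from typing import List, Sequence


def dice(pred: Sequence[Sequence[int]], ref: Sequence[Sequence[int]]) -> float:
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    inter = pred_sum = ref_sum = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            inter += p * r
            pred_sum += p
            ref_sum += r
    total = pred_sum + ref_sum
    # Convention (an assumption): two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * inter / total


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / math.sqrt(vx * vy)


# Made-up illustration data: one metric value and one surgeon rating per clip.
clip_dice = [0.91, 0.62, 0.78, 0.45, 0.85]
surgeon_rating = [5, 3, 4, 2, 4]  # hypothetical 1-5 Likert scores
rho = spearman(clip_dice, surgeon_rating)
```

A rank-based coefficient is a natural choice for this sketch because survey ratings are ordinal. Repeating the calculation per metric and per surgeon group would give the kind of comparison summarized in the findings.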
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Artificial Intelligence in Healthcare and Education · Advanced X-ray and CT Imaging
