A generalizable foundation model for intraoperative understanding across surgical procedures
Kanggil Park, Yongjun Jeon, Soyoung Lim, Seonmin Park, Jongmin Shin, Jung Yong Kim, Sehyeon An, Jinsoo Rhu, Jongman Kim, Gyu-Seong Choi, Namkee Oh, and Kyu-Hwan Jung

TL;DR
This paper introduces ZEN, a foundation model trained on a large, diverse dataset of more than 4 million surgical video frames. It generalizes across procedures and improves intraoperative understanding for AI-assisted surgical decision-making.
Contribution
The authors present ZEN, a self-supervised foundation model trained with multi-teacher distillation on more than 4 million frames from over 21 procedures, and report better generalization across procedures and tasks than existing models.
Findings
ZEN outperforms existing models on multiple tasks
ZEN shows robust cross-procedure generalization
ZEN improves intraoperative scene understanding
Abstract
In minimally invasive surgery, clinical decisions depend on real-time visual interpretation, yet intraoperative perception varies substantially across surgeons and procedures. This variability limits consistent assessment, training, and the development of reliable artificial intelligence systems, as most surgical AI models are designed for narrowly defined tasks and do not generalize across procedures or institutions. Here we introduce ZEN, a generalizable foundation model for intraoperative surgical video understanding trained on more than 4 million frames from over 21 procedures using a self-supervised multi-teacher distillation framework. We curated a large and diverse dataset and systematically evaluated multiple representation learning strategies within a unified benchmark. Across 20 downstream tasks and full fine-tuning, frozen-backbone, few-shot and zero-shot settings, ZEN…
Taxonomy
Topics: Surgical Simulation and Training · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
