A generalizable foundation model for intraoperative understanding across surgical procedures

Kanggil Park; Yongjun Jeon; Soyoung Lim; Seonmin Park; Jongmin Shin; Jung Yong Kim; Sehyeon An; Jinsoo Rhu; Jongman Kim; Gyu-Seong Choi; Namkee Oh; and Kyu-Hwan Jung

arXiv:2602.13633 · cs.CV · February 17, 2026

Open Access

TL;DR

This paper introduces ZEN, a foundation model for intraoperative surgical video understanding, trained on more than 4 million frames from a large, diverse dataset spanning over 21 procedures. It is designed to generalize across procedures and to support AI-assisted surgical decision-making.

Contribution

The authors present ZEN, a foundation model trained with a self-supervised multi-teacher distillation framework on a large, curated surgical video dataset. They report that it generalizes across procedures and tasks better than existing models.

Findings

01. ZEN outperforms existing models on multiple tasks

02. ZEN shows robust cross-procedure generalization

03. ZEN improves intraoperative scene understanding

Abstract

In minimally invasive surgery, clinical decisions depend on real-time visual interpretation, yet intraoperative perception varies substantially across surgeons and procedures. This variability limits consistent assessment, training, and the development of reliable artificial intelligence systems, as most surgical AI models are designed for narrowly defined tasks and do not generalize across procedures or institutions. Here we introduce ZEN, a generalizable foundation model for intraoperative surgical video understanding trained on more than 4 million frames from over 21 procedures using a self-supervised multi-teacher distillation framework. We curated a large and diverse dataset and systematically evaluated multiple representation learning strategies within a unified benchmark. Across 20 downstream tasks and full fine-tuning, frozen-backbone, few-shot and zero-shot settings, ZEN…

Taxonomy

Topics: Surgical Simulation and Training · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning