Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge

Tobias Rueckert; David Rauber; Raphaela Maerkl; Leonard Klausmann; Suemeyye R. Yildiran; Max Gutbrod; Danilo Weber Nunes; Alvaro Fernandez Moreno; Imanol Luengo; Danail Stoyanov; Nicolas Toussaint; Enki Cho; Hyeon Bae Kim; Oh Sung Choo; Ka Young Kim; Seong Tae Kim; Gonçalo Arantes; Kehan Song; Jianjun Zhu; Junchen Xiong; Tingyi Lin; Shunsuke Kikuchi; Hiroki Matsuzaki; Atsushi Kouno; João Renato Ribeiro Manesco; João Paulo Papa; Tae-Min Choi; Tae Kyeong Jeong; Juyoun Park; Oluwatosin Alabi; Meng Wei; Tom Vercauteren; Runzhi Wu; Mengya Xu; An Wang; Long Bai; Hongliang Ren; Amine Yamlahi; Jakob Hennighausen; Lena Maier-Hein; Satoshi Kondo; Satoshi Kasai; Kousuke Hirasawa; Shu Yang; Yihui Wang; Hao Chen; Santiago Rodríguez; Nicolás Aparicio; Leonardo Manrique; Juan Camilo Lyons; Olivia Hosie; Nicolás Ayobi; Pablo Arbeláez; Yiping Li; Yasmina Al Khalil; Sahar Nasirihaghighi; Stefanie Speidel; Daniel Rueckert; Hubertus Feussner; Dirk Wilhelm; Christoph Palm

arXiv:2507.16559·cs.CV·February 3, 2026

TL;DR

This paper presents a comprehensive benchmark for surgical scene understanding in endoscopy covering surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. The benchmark is built on a novel multi-center dataset and is motivated by the goal of using surgical context to improve robustness and interpretability.

Contribution

It introduces a new multi-center dataset with unified annotations for three interrelated tasks, enabling joint analysis and temporal context integration in surgical videos.
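The page does not describe the annotation format. As a purely hypothetical sketch, assuming a per-frame record that links one phase label to the keypoints and instance masks of every visible instrument, the Python below shows the kind of joint query that unified annotations make possible: counting instrument instances per phase. All class names, field names, phase labels, and instrument names here are invented for illustration and are not the PhaKIR dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class InstrumentInstance:
    """Hypothetical per-instrument label (not the PhaKIR file format)."""
    instrument_class: str                 # e.g. "grasper" (illustrative name)
    keypoints: list[tuple[float, float]]  # (x, y) pixel coordinates
    mask_id: int                          # id of this instance in a mask image (assumed)


@dataclass
class FrameAnnotation:
    """Hypothetical per-frame record joining all three tasks."""
    video_id: str
    frame_index: int
    phase: str  # surgical phase label for this frame (illustrative name)
    instruments: list[InstrumentInstance] = field(default_factory=list)


def instruments_per_phase(frames: list[FrameAnnotation]) -> dict[str, Counter]:
    """Count instrument instances per class within each phase.

    This joint query only works because phase and instrument labels
    refer to the same frames.
    """
    counts: dict[str, Counter] = {}
    for frame in frames:
        phase_counts = counts.setdefault(frame.phase, Counter())
        for inst in frame.instruments:
            phase_counts[inst.instrument_class] += 1
    return counts


# Toy data with made-up labels, for illustration only.
frames = [
    FrameAnnotation("video_01", 0, "preparation",
                    [InstrumentInstance("grasper", [(10.0, 20.0)], 1)]),
    FrameAnnotation("video_01", 1, "preparation",
                    [InstrumentInstance("grasper", [(12.0, 21.0)], 1),
                     InstrumentInstance("hook", [(30.0, 40.0)], 2)]),
    FrameAnnotation("video_01", 2, "clipping",
                    [InstrumentInstance("clipper", [(50.0, 60.0)], 1)]),
]

counts = instruments_per_phase(frames)
print(counts["preparation"]["grasper"])  # 2
```

Storing all three label types on the same frame record is only one possible design; a real dataset may instead keep phase labels, keypoints, and masks in separate files keyed by frame index.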

Findings

01. The paper reports benchmark results for all three tasks.
02. The dataset supports temporal and contextual analysis.
03. The results highlight open challenges and future directions in surgical scene understanding.

Abstract

Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context, such as the current procedural phase, has emerged as a promising strategy to improve robustness and interpretability. To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel, multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with…