XiCAD: Camera Activation Detection in the Da Vinci Xi User Interface
Alexander C. Jenke, Gregor Just, Claas de Boer, Martin Wagner, Sebastian Bodenstedt, Stefanie Speidel

TL;DR
This paper presents XiCAD, a deep learning-based system that accurately detects camera activation in DaVinci Xi surgical videos, enabling automated metadata extraction for surgical data analysis.
Contribution
We developed a lightweight ResNet18-based pipeline that reliably detects camera activation and localizes the camera tile in DaVinci Xi surgical videos, improving automation in surgical data science tasks.
Findings
Achieved F1-scores between 0.993 and 1.000 for camera activation detection.
Successfully localized camera tiles without false multiple detections.
Demonstrated real-time performance on over 70,000 frames.
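For readers less familiar with the metric reported above, the following is a minimal sketch of how an F1-score for a binary "camera active" label can be computed from per-frame predictions. The function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives.

    Labels are assumed to be 1 for "camera active" and 0 otherwise.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    # With no positives in either labels or predictions, F1 is undefined;
    # this sketch returns 0.0 in that case.
    return 2 * tp / denominator if denominator else 0.0


# Toy per-frame labels (illustrative only, not data from the paper).
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1]
toy_f1 = f1_score(*confusion_counts(toy_true, toy_pred))
```

Because F1 ignores true negatives, it stays informative even when frames with an inactive camera dominate a video.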
Abstract
Purpose: Robot-assisted minimally invasive surgery relies on endoscopic video as the sole intraoperative visual feedback. The DaVinci Xi system overlays a graphical user interface (UI) that indicates the state of each robotic arm, including the activation of the endoscope arm. Detecting this activation provides valuable metadata such as camera movement information, which can support downstream surgical data science tasks including tool tracking, skill assessment, or camera control automation.
Methods: We developed a lightweight pipeline based on a ResNet18 convolutional neural network to automatically identify the position of the camera tile and its activation state within the DaVinci Xi UI. The model was fine-tuned on manually annotated data from the SurgToolLoc dataset and evaluated across three public datasets comprising over 70,000 frames.
Results: The model achieved F1-scores…
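The abstract's Methods describe identifying both the camera tile position and its activation state. As a rough illustration only, the sketch below shows one way per-tile classifier outputs could be decoded into a single camera position plus an activation flag. The per-slot score layout, the names `TileScores`, `CameraState` and `decode_camera_state`, and the 0.5 thresholds are all assumptions made for this example; the paper's actual output format is not given here.

```python
from typing import NamedTuple, Optional, Sequence


class TileScores(NamedTuple):
    """Hypothetical classifier output for one UI tile slot."""
    camera: float  # assumed probability that this slot holds the camera tile
    active: float  # assumed probability that the tile is shown as active


class CameraState(NamedTuple):
    tile_index: Optional[int]  # index of the camera tile, or None if not found
    active: bool               # whether the camera tile is judged active


def decode_camera_state(
    scores: Sequence[TileScores],
    camera_threshold: float = 0.5,  # assumed value, not from the paper
    active_threshold: float = 0.5,  # assumed value, not from the paper
) -> CameraState:
    """Pick at most one camera tile and read its activation state."""
    if not scores:
        return CameraState(None, False)
    # Taking the argmax guarantees that no more than one slot is reported
    # as the camera tile, so multiple detections cannot occur by construction.
    best = max(range(len(scores)), key=lambda i: scores[i].camera)
    if scores[best].camera < camera_threshold:
        return CameraState(None, False)
    return CameraState(best, scores[best].active >= active_threshold)


# Toy scores for four tile slots (illustrative only).
toy_scores = [
    TileScores(0.10, 0.00),
    TileScores(0.90, 0.80),
    TileScores(0.60, 0.90),
    TileScores(0.05, 0.10),
]
toy_state = decode_camera_state(toy_scores)
```

Selecting a single best slot rather than thresholding every slot independently is one simple way to rule out several tiles being flagged as the camera at once.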
Taxonomy
Topics: Surgical Simulation and Training · Soft Robotics and Applications · Multimodal Machine Learning Applications
