Where It Moves, It Matters: Referring Surgical Instrument Segmentation via Motion
Meng Wei, Kun Yuan, Shi Li, Yue Zhou, Long Bai, Nassir Navab, Hongliang Ren, Hong Joo Lee, Tom Vercauteren, Nicolas Padoy

TL;DR
This paper introduces SurgRef, a motion-guided framework for referring surgical instrument segmentation in videos, leveraging instrument motion rather than appearance to improve robustness and generalization in language-driven surgical scene understanding.
Contribution
The paper presents SurgRef, a novel motion-based approach to referring segmentation that outperforms existing methods, and introduces the Ref-IMotion dataset for training and evaluation.
Findings
SurgRef achieves state-of-the-art accuracy in referring surgical instrument segmentation.
The motion-guided approach improves generalization to unseen instruments and scenarios.
The Ref-IMotion dataset enables robust training and evaluation of language-driven surgical segmentation.
Abstract
Enabling intuitive, language-driven interaction with surgical scenes is a critical step toward intelligent operating rooms and autonomous surgical robotic assistance. However, the task of referring segmentation, localizing surgical instruments based on natural language descriptions, remains underexplored in surgical videos, with existing approaches struggling to generalize due to reliance on static visual cues and predefined instrument names. In this work, we introduce SurgRef, a novel motion-guided framework that grounds free-form language expressions in instrument motion, capturing how tools move and interact across time, rather than what they look like. This allows models to understand and segment instruments even under occlusion, ambiguity, or unfamiliar terminology. To train and evaluate SurgRef, we present Ref-IMotion, a diverse, multi-institutional video dataset with dense…
Taxonomy
Topics: Multimodal Machine Learning Applications · Surgical Simulation and Training · Soft Robotics and Applications
