GroundedSurg: A Multi-Procedure Benchmark for Language-Conditioned Surgical Tool Segmentation
Tajamul Ashraf, Abrar Ul Riyaz, Wasif Tak, Tavaheed Tariq, Sonia Yadav, Moloud Abdar, and Janibul Bashir

TL;DR
GroundedSurg introduces a novel benchmark for evaluating language-conditioned, instance-level surgical tool segmentation across diverse procedures, emphasizing realistic clinical scenarios and the integration of vision-language reasoning.
Contribution
This work presents the first comprehensive dataset and benchmark for language-conditioned surgical grounding, enabling evaluation of models on instance-level localization with natural language descriptions.
Findings
Significant performance gaps in current models highlight the need for improved vision-language reasoning.
The benchmark covers diverse surgical procedures and instrument types, reflecting real-world complexity.
Evaluation reveals challenges in integrating linguistic and visual understanding in surgical AI.
Abstract
Clinically reliable perception of surgical scenes is essential for advancing intelligent, context-aware intraoperative assistance such as instrument handoff guidance, collision avoidance, and workflow-aware robotic support. Existing surgical tool benchmarks primarily evaluate category-level segmentation, requiring models to detect all instances of predefined instrument classes. However, real-world clinical decisions often require resolving references to a specific instrument instance based on its functional role, spatial relation, or anatomical interaction, capabilities not captured by current evaluation paradigms. We introduce GroundedSurg, the first language-conditioned, instance-level surgical grounding benchmark. Each instance pairs a surgical image with a natural-language description targeting a single instrument, accompanied by structured spatial grounding annotations including…
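To make the difference between category-level and instance-level evaluation concrete, here is a minimal Python sketch. It is not the benchmark's actual data format or metric: the `GroundingSample` class, its field names, the example query, and the choice of mask IoU as the score are all assumptions made for illustration, since the truncated abstract does not list the annotation types or the evaluation protocol.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record layout: one image, one description, one target instrument.
# Field names are illustrative assumptions, not the benchmark's schema.
@dataclass
class GroundingSample:
    image_id: str
    query: str                    # natural-language description of a single instrument
    target_mask: List[List[int]]  # binary mask of the referred instance (assumed annotation)

def mask_iou(pred: List[List[int]], target: List[List[int]]) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0

sample = GroundingSample(
    image_id="frame_0001",
    query="the grasper holding the tissue on the left",
    target_mask=[[1, 1, 0, 0],
                 [1, 1, 0, 0]],
)

# A prediction that isolates the referred instrument.
instance_pred = [[1, 1, 0, 0],
                 [1, 0, 0, 0]]

# A category-level prediction that marks every instrument in the frame.
category_pred = [[1, 1, 0, 1],
                 [1, 1, 0, 1]]

instance_score = mask_iou(instance_pred, sample.target_mask)
category_score = mask_iou(category_pred, sample.target_mask)
```

In this toy case the category-level prediction covers the target completely but is still penalized for including a second instrument, which is the behavior an instance-level grounding score is meant to capture.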
Taxonomy
Topics: Surgical Simulation and Training · Multimodal Machine Learning Applications · Soft Robotics and Applications
