# Surgical Instrument Segmentation via Segment-Then-Classify Framework with Instance-Level Spatiotemporal Consistency Modeling

**Authors:** Tiyao Zhang, Xue Yuan, Hongze Xu

PMC · DOI: 10.3390/jimaging11100364 · Journal of Imaging · 2025-10-15

## TL;DR

This paper introduces a Segment-Then-Classify framework that segments surgical instruments in endoscopic videos by separating mask generation from classification and enforcing instance-level spatial and temporal consistency.

## Contribution

The Segment-Then-Classify framework decouples mask generation from semantic classification and adds a bounding box-guided, instance-level spatiotemporal modeling module built on a lightweight transformer encoder.

## Key findings

- The framework improves mIoU by 3.06%, 2.99%, and 1.67% over previous state-of-the-art methods in experiments on the EndoVis 2017 and 2018 datasets.
- In the same experiments, it improves mcIoU by 2.36%, 2.85%, and 6.06%, respectively.
- The method maintains computational efficiency while improving robustness to occlusion and motion blur.
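
To make the IoU-based numbers above easier to read, here is a minimal sketch of a per-class IoU averaged over classes for a single frame. It is a generic illustration, not the paper's evaluation protocol: this summary does not say how mIoU and mcIoU are averaged (per frame, per class, or per dataset), so the averaging rule, the label-map encoding, and the function names `class_iou` and `mean_class_iou` are all assumptions.

```python
from typing import List, Optional, Sequence

# Assumed encoding: 2D label map, 0 = background, k > 0 = instrument class k.
LabelMap = List[List[int]]


def class_iou(pred: LabelMap, gt: LabelMap, cls: int) -> Optional[float]:
    """IoU of one class in one frame; None if the class is absent from both maps."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            in_pred, in_gt = p == cls, g == cls
            inter += in_pred and in_gt
            union += in_pred or in_gt
    return inter / union if union else None


def mean_class_iou(pred: LabelMap, gt: LabelMap, classes: Sequence[int]) -> float:
    """Average IoU over the classes that appear in the prediction or ground truth.

    Assumption: absent classes are skipped rather than counted as 0 or 1.
    """
    scores = [s for s in (class_iou(pred, gt, c) for c in classes) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


# Toy 2x3 frame with two instrument classes (hypothetical data).
gt = [[1, 1, 0],
      [2, 2, 0]]
pred = [[1, 0, 0],
        [2, 2, 2]]

iou_class_1 = class_iou(pred, gt, 1)            # 1 shared pixel out of 2 in the union
frame_score = mean_class_iou(pred, gt, [1, 2, 3])  # class 3 is absent and skipped
```

A misclassified but well-shaped mask lowers a class-averaged score like this one more than a class-agnostic overlap score, which is one reason separating mask quality from label quality can matter.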

## Abstract

Accurate segmentation of surgical instruments in endoscopic videos is crucial for robot-assisted surgery and intraoperative analysis. This paper presents a Segment-Then-Classify framework that decouples mask generation from semantic classification to enhance spatial completeness and temporal stability. First, a Mask2Former-based segmentation backbone generates class-agnostic instance masks and region features. Then, a bounding box-guided instance-level spatiotemporal modeling module fuses geometric priors and temporal consistency through a lightweight transformer encoder. This design improves interpretability and robustness under occlusion and motion blur. Experiments on the EndoVis 2017 and 2018 datasets demonstrate that our framework achieves mIoU improvements of 3.06%, 2.99%, and 1.67% and mcIoU gains of 2.36%, 2.85%, and 6.06%, respectively, over previous state-of-the-art methods, while maintaining computational efficiency.
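
The second stage described in the abstract can be pictured as building one token per frame for each instrument instance and handing the resulting sequence to a temporal encoder. The sketch below covers only the token construction, under stated assumptions: the abstract does not specify how the bounding box is encoded or fused with the region feature, so the normalized `(cx, cy, w, h)` encoding, the plain concatenation, the skipping of frames with an empty mask, and the helper names `mask_to_box` and `build_track_tokens` are all hypothetical. The transformer encoder and the classifier are not implemented here.

```python
from typing import List, Optional, Sequence, Tuple

Mask = List[List[int]]  # binary instance mask: 1 = instance pixel, 0 = elsewhere
Box = Tuple[float, float, float, float]


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Tight bounding box as normalized (cx, cy, w, h); None for an empty mask."""
    height, width = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    if not rows:
        return None
    y0, y1 = rows[0], rows[-1] + 1  # half-open pixel ranges
    x0, x1 = cols[0], cols[-1] + 1
    return ((x0 + x1) / 2 / width, (y0 + y1) / 2 / height,
            (x1 - x0) / width, (y1 - y0) / height)


def build_track_tokens(masks: Sequence[Mask],
                       features: Sequence[Sequence[float]]) -> List[List[float]]:
    """One token per visible frame: region feature followed by box geometry."""
    tokens = []
    for mask, feat in zip(masks, features):
        box = mask_to_box(mask)
        if box is None:  # assumption: drop frames where the instance is not visible
            continue
        tokens.append(list(feat) + list(box))
    return tokens


# Hypothetical 3-frame track of one instance on a 4x4 image; frame 1 is empty.
frame0 = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
frame1 = [[0, 0, 0, 0] for _ in range(4)]
frame2 = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
region_features = [[0.1, 0.2], [0.0, 0.0], [0.3, 0.4]]  # stand-ins for backbone features

tokens = build_track_tokens([frame0, frame1, frame2], region_features)
```

In a full system, a sequence like `tokens` would be the input to the temporal encoder, whose output would then be classified once per instance rather than once per pixel.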

## Full-text entities

- **Diseases:** injury (MESH:D014947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12565326/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12565326/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/PMC12565326/full.md

---
Source: https://tomesphere.com/paper/PMC12565326