# A fine-tuned foundational model SurgiSAM2 for surgical video anatomy segmentation and detection

**Authors:** Devanish N. Kamtam, Joseph B. Shrager, Satya Deepya Malla, Xiaohan Wang, Nicole Lin, Juan J. Cardona, Serena Yeung-Levy, Clarence Hu

PMC · DOI: 10.1038/s41598-025-11759-4 · Scientific Reports · 2025-10-15

## TL;DR

This paper introduces SurgiSAM2, a version of SAM 2 fine-tuned to segment anatomy in surgical videos and images. It improves segmentation accuracy over the baseline SAM 2 and generalizes to organ classes not seen during fine-tuning.

## Contribution

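The novel contribution is fine-tuning SAM 2 (image encoder and mask decoder) for surgical anatomy segmentation with limited training data (50 to 400 samples per class). The fine-tuned model achieves state-of-the-art performance on most tested classes, which points to its potential for automated or semi-automated annotation.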

## Key findings

- SurgiSAM2 achieved a 17.9% relative improvement in weighted mean Dice coefficient (WMDC) over the baseline SAM 2.
- On the test subset, the model outperformed prior state-of-the-art methods in 24 of 30 classes (80%), with a WMDC of 0.91 using 10-point prompts.
- SurgiSAM2 generalized well to unseen organ classes, achieving state-of-the-art results on 7 of 9 (77.8%) of them.
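The findings above rest on three quantities: the Dice coefficient per class, a weighted mean of it across classes (WMDC), and a relative gain over the baseline. The pure-Python sketch below shows how such numbers are typically computed. The weighting scheme is an assumption (caller-supplied per-class weights, for example per-class sample counts), since this summary does not state how the paper weights classes; the function names are hypothetical, and ties against SOTA are counted as losses by assumption.

```python
from typing import Dict, Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for two flattened binary masks."""
    inter = sum(1 for p, g in zip(pred, target) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in target if g)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (an assumption)
    return 2.0 * inter / total


def weighted_mean_dice(per_class_dice: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of per-class Dice; the weights (e.g. sample counts) are an assumption."""
    total_w = sum(weights[c] for c in per_class_dice)
    return sum(per_class_dice[c] * weights[c] for c in per_class_dice) / total_w


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement, e.g. 0.179 corresponds to a 17.9% relative gain."""
    return (new - baseline) / baseline


def fraction_beating_sota(ours: Dict[str, float], sota: Dict[str, float]) -> float:
    """Share of classes where our score strictly exceeds the prior SOTA score."""
    wins = sum(1 for c in ours if ours[c] > sota[c])
    return wins / len(ours)
```

For example, `dice([1, 1, 0, 0], [1, 0, 1, 0])` returns `0.5`. The class-level fractions in the findings follow the same arithmetic as `fraction_beating_sota`: 24/30 = 0.8 and 7/9 ≈ 0.778.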

## Abstract

The foundational segmentation models Segment Anything Model (SAM) and SAM 2 have transformed segmentation by enabling remarkable zero-shot performance across diverse domains. In this study, we evaluate SAM 2 for surgical scene understanding by examining its semantic segmentation capabilities for organs and tissues, both in zero-shot scenarios and after fine-tuning. We used five public datasets to evaluate and fine-tune SAM 2 for segmenting anatomical tissues in surgical videos and images. Fine-tuning was applied to the image encoder and mask decoder. We limited training subsets to between 50 and 400 samples per class to better model real-world constraints on data acquisition. The impact of dataset size on fine-tuning performance was evaluated with the weighted mean Dice coefficient (WMDC), and the results were also compared against previously reported state-of-the-art (SOTA) results. SurgiSAM 2, a fine-tuned SAM 2 model, demonstrated significant improvements in segmentation performance, achieving a 17.9% relative WMDC gain compared with the baseline SAM 2. Increasing the number of prompt points from 1 to 10 and the training data scale from 50 to 400 samples per class enhanced performance; the best WMDC of 0.92 on the validation subset was achieved with 10 prompt points and 400 samples per class. On the test subset, this model outperformed prior SOTA methods in 24 of 30 (80%) classes, with a WMDC of 0.91 using 10-point prompts. Notably, SurgiSAM 2 generalized effectively to unseen organ classes, achieving SOTA on 7 of 9 (77.8%). Heavily dissected tissues and similar-appearing organs, such as the small and large intestines, remained challenging. SAM 2 achieves remarkable zero-shot and fine-tuned performance for surgical scene segmentation, surpassing prior SOTA models across several organ classes from diverse datasets. This suggests immense potential for automated and semi-automated annotation pipelines, thereby decreasing the annotation burden and facilitating several surgical applications.
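The abstract varies two experimental settings: a per-class cap on training data (50 to 400 samples per class) and the number of point prompts (1 to 10). The sketch below illustrates both under stated assumptions. It uses seeded random subsampling within each class, and it draws positive point prompts uniformly from the ground-truth foreground pixels in (x, y) order. This summary does not confirm that the paper used either strategy, and both function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cap_per_class(
    samples: Sequence[Tuple[str, str]], cap: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Keep at most `cap` (sample_id, class_name) pairs per class, chosen at random.

    Random per-class subsampling is an assumed strategy, not the paper's confirmed one.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_class: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for sample_id, cls in samples:
        by_class[cls].append((sample_id, cls))
    subset: List[Tuple[str, str]] = []
    for cls in sorted(by_class):  # deterministic class order
        items = by_class[cls]
        subset.extend(rng.sample(items, min(cap, len(items))))
    return subset


def sample_point_prompts(
    mask: Sequence[Sequence[int]], k: int, seed: int = 0
) -> List[Tuple[int, int]]:
    """Draw up to k distinct (x, y) foreground pixels of a 2-D binary mask as positive prompts.

    Uniform sampling over the foreground is an assumption for illustration.
    """
    rng = random.Random(seed)
    foreground = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not foreground:
        return []  # empty mask: no valid positive prompt exists
    return rng.sample(foreground, min(k, len(foreground)))
```

Classes with fewer samples than the cap are kept whole, so a 400-per-class setting only trims the larger classes. Likewise, a 10-point setting degrades gracefully to fewer points on very small masks.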

## Full-text entities

- **Diseases:** renal cyst (MESH:D003560)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12528661/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12528661/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12528661/full.md

---
Source: https://tomesphere.com/paper/PMC12528661