# Morphology-Aware Deep Features and Frozen Filters for Surgical Instrument Segmentation with LLM-Based Scene Summarization

**Authors:** Adnan Haider, Muhammad Arsalan, Kyungeun Cho

PMC · DOI: 10.3390/jcm15062227 · Journal of Clinical Medicine · 2026-03-15

## TL;DR

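This paper introduces FFMS-Net, a lightweight segmentation network (about 1.5 million trainable parameters) that segments surgical instruments under adverse imaging conditions such as blood occlusion, smoke, and blur. It is paired with an open-source large language model that produces a non-clinical summary of the surgical scene.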

## Contribution

FFMS-Net combines frozen Sobel and Laplacian edge filters with learnable features to improve surgical instrument segmentation under adverse conditions.
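To make the idea of frozen edge filters concrete, here is a minimal NumPy sketch of fixed Sobel and Laplacian kernels applied to a grayscale frame. It is not the authors' implementation: the function names `correlate2d_same` and `frozen_edge_features` are hypothetical, zero padding is an assumption, and the way FFMS-Net fuses these responses with learnable features is not reproduced here.

```python
import numpy as np

# Fixed 3x3 kernels. In a network these would be convolution weights that
# are excluded from gradient updates, which is what "frozen" refers to.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)


def correlate2d_same(image, kernel):
    """Cross-correlate a 2D image with a 3x3 kernel (zero padding, same size)."""
    image = image.astype(np.float32)
    padded = np.pad(image, 1, mode="constant")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def frozen_edge_features(gray):
    """Return a (3, H, W) stack: Sobel-x, Sobel-y and Laplacian responses."""
    kernels = (SOBEL_X, SOBEL_Y, LAPLACIAN)
    return np.stack([correlate2d_same(gray, k) for k in kernels])


# Usage: a vertical step edge responds to Sobel-x but not to Sobel-y.
frame = np.zeros((8, 8), dtype=np.float32)
frame[:, 4:] = 1.0
features = frozen_edge_features(frame)
```

In a trainable model, a stack like `features` could be concatenated with the output of the first learnable convolution; that fusion step is an assumption of this sketch rather than a detail stated in the summary.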

## Key findings

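- FFMS-Net compares favorably with state-of-the-art methods on three surgical datasets (UW-Sinus-surgery-live, UW-Sinus-cadaveric, and CholecSeg8k) while using only 1.5 million trainable parameters.
- Frozen Sobel and Laplacian filters preserve edge and orientation information, and the tri-atrous blending (TAB) block improves robustness to blur, blood occlusion, and smoke.
- An open-source large language model produces a non-clinical summary of the surgical scene from the predicted instrument mask and deterministic descriptors derived from it.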
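The tri-atrous blending block mentioned above fuses context from several receptive fields. The sketch below illustrates the general technique with three dilated (atrous) 3x3 branches on a single-channel map. The dilation rates `(1, 2, 3)`, the averaging used for blending, and the function names are assumptions for illustration; the summary does not state the actual rates or fusion rule.

```python
import numpy as np


def dilated_correlate2d(image, kernel, rate):
    """3x3 cross-correlation whose taps are spaced `rate` pixels apart.

    With dilation rate r, a 3x3 kernel covers a (2r + 1) x (2r + 1) window
    without adding any weights. Zero padding keeps the output size equal
    to the input size.
    """
    image = image.astype(np.float32)
    padded = np.pad(image, rate, mode="constant")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float32)
    for i in range(3):
        for j in range(3):
            ys, xs = i * rate, j * rate
            out += kernel[i, j] * padded[ys:ys + h, xs:xs + w]
    return out


def tri_atrous_blend(image, kernels, rates=(1, 2, 3)):
    """Average three dilated branches (averaging is an assumed fusion rule)."""
    branches = [dilated_correlate2d(image, k, r) for k, r in zip(kernels, rates)]
    return np.mean(branches, axis=0)


# Usage: an all-ones kernel at rate 2 spreads an impulse onto a sparse grid.
impulse = np.zeros((9, 9), dtype=np.float32)
impulse[4, 4] = 1.0
ones = np.ones((3, 3), dtype=np.float32)
spread = dilated_correlate2d(impulse, ones, rate=2)
blended = tri_atrous_blend(impulse, [ones, ones, ones])
```

The impulse example shows why dilation enlarges the receptive field: at rate 2 the nine taps land two pixels apart, so the branch sees a 5x5 neighborhood with the same number of weights as a plain 3x3 kernel.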

## Abstract

Background/Objectives: Artificial intelligence is increasingly being adopted across the healthcare sector, including surgery. Vision-based intelligent systems that assist surgical procedures can significantly increase productivity, safety, and effectiveness during surgery. Surgical instruments are central to any surgical intervention, yet detecting and locating them during live surgery remains challenging because of adverse imaging conditions such as blood occlusion, smoke, blur, glare, low contrast, instrument scale variation, and other artifacts.

Methods: To address these challenges, we developed a segmentation architecture termed the frozen-filters-based morphology-aware segmentation network (FFMS-Net). Accurate surgical instrument segmentation depends strongly on edge and morphology information; however, in conventional neural networks, this spatial information is progressively degraded during spatial processing. FFMS-Net introduces a frozen and learnable feature pipeline (FLFP) that simultaneously exploits frozen edge representations and learnable features. Within FLFP, Sobel and Laplacian filters are frozen to preserve edge and orientation information, which is subsequently fused with learnable initial spatial features. Moreover, a tri-atrous blending (TAB) block is employed at the end of the encoder to fuse contextual information from multiple receptive fields, preserving instrument morphology and improving robustness under challenging conditions such as blur, blood occlusion, and smoke. Datasets focused on surgical instruments often suffer from severe class imbalance and poor instrument visibility. To mitigate these issues, FFMS-Net incorporates a progressively structure-preserving decoder (PSPD) that aggregates dilated and standard spatial information after each upsampling stage to maintain class structure. Multi-scale spatial features from different encoder levels are further fused using light skip paths (LSPs) to project channels with task-relevant patterns.

Results/Conclusions: FFMS-Net is extensively evaluated on three challenging datasets: UW-Sinus-surgery-live, UW-Sinus-cadaveric, and CholecSeg8k. The proposed method demonstrates promising performance compared with state-of-the-art approaches while requiring only 1.5 million trainable parameters. In addition, an open-source large language model is integrated for non-clinical summarization of the surgical scene based on the predicted mask and deterministic descriptors derived from it.
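The abstract describes summarization driven by the predicted mask and deterministic descriptors derived from it. As a rough illustration of what such descriptors could look like, the sketch below computes area fraction, bounding box, and centroid from a binary mask and formats them into a text prompt. The specific descriptors, the function names `mask_descriptors` and `build_prompt`, and the prompt wording are all hypothetical; the abstract does not list the descriptors used, and no language model is called here.

```python
import numpy as np


def mask_descriptors(mask):
    """Compute simple, reproducible descriptors from a binary instrument mask."""
    mask = np.asarray(mask).astype(bool)
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        # No instrument pixels predicted in this frame.
        return {"present": False, "area_fraction": 0.0}
    return {
        "present": True,
        "area_fraction": float(ys.size) / (h * w),
        # Bounding box as (x_min, y_min, x_max, y_max), inclusive pixel indices.
        "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
        # Centroid as (x, y) in pixel coordinates.
        "centroid": (float(xs.mean()), float(ys.mean())),
    }


def build_prompt(desc):
    """Turn descriptors into a plain-text prompt (hypothetical wording)."""
    if not desc["present"]:
        return "No surgical instrument is visible in the frame. Summarize the scene."
    x0, y0, x1, y1 = desc["bbox"]
    cx, cy = desc["centroid"]
    return (
        f"An instrument covers {desc['area_fraction']:.2f} of the frame, "
        f"spans x={x0}..{x1}, y={y0}..{y1}, and is centered at "
        f"({cx:.1f}, {cy:.1f}). Write a short non-clinical scene summary."
    )


# Usage: a 3x4 rectangular instrument region in a 10x10 frame.
predicted = np.zeros((10, 10), dtype=np.uint8)
predicted[2:5, 3:7] = 1
descriptors = mask_descriptors(predicted)
prompt = build_prompt(descriptors)
```

Because the descriptors are computed by fixed rules, the same mask always yields the same prompt, which keeps the text passed to the language model tied to measurable properties of the prediction.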

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13026850/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13026850/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC13026850/full.md

---
Source: https://tomesphere.com/paper/PMC13026850