Toward Real-Time Surgical Scene Segmentation via a Spike-Driven Video Transformer with Spike-Informed Pretraining
Shihao Zou, Jingjing Li, Wei Ji, Jincai Huang, Kai Wang, Guo Dan, Weixin Si, Yi Pan

TL;DR
This paper introduces SpikeSurgSeg, a spike-driven video Transformer for surgical scene segmentation that offers real-time, energy-efficient performance with competitive accuracy in data-scarce scenarios, leveraging novel pretraining and knowledge distillation techniques.
Contribution
The paper presents the first spike-driven video Transformer for surgical segmentation, incorporating spike-informed pretraining and multi-spectral knowledge distillation to enhance performance and efficiency.
Findings
Achieves mIoU comparable to state-of-the-art ANN models.
Reduces inference latency by at least 8x.
Delivers over 20x speedup compared to foundation models.
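The first finding is stated in terms of mIoU (mean intersection over union), the standard segmentation metric. As a minimal sketch of how such a score is commonly computed, the snippet below averages per-class IoU over flattened label maps. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both maps are illustrative assumptions; the paper's exact evaluation protocol is not given in this summary.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels, one per pixel.
    This is a common definition, not necessarily the paper's protocol.
    """
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (hypothetical labels)
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)
print(f"mIoU = {score:.3f}")
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Real benchmarks usually accumulate intersections and unions over a whole dataset before dividing, which can give a different number than averaging per-frame scores.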
Abstract
Modern surgical systems increasingly rely on intelligent scene understanding to improve intra-operative safety and situational awareness, with surgical scene segmentation playing a fundamental role in fine-grained surgical perception. Although recent artificial neural network (ANN) models, especially large foundation models, have achieved impressive accuracy, their high computational and energy demands often hinder deployment in resource-constrained operative environments. To address this challenge, we explore spiking neural networks (SNNs) as a highly efficient paradigm. However, their performance in surgical scene segmentation remains constrained by sparse spike representations and limited annotated surgical data. We therefore propose SpikeSurgSeg, the first spike-driven video Transformer for surgical scene segmentation. It preserves the real-time and energy-efficient advantages of SNNs while achieving competitive performance against most…
Taxonomy
Topics: Surgical Simulation and Training · Advanced Neural Network Applications · Multimodal Machine Learning Applications
