Multi-frame Feature Aggregation for Real-time Instrument Segmentation in Endoscopic Video
Shan Lin, Fangbo Qin, Haonan Peng, Randall A. Bly, Kris S. Moe, Blake Hannaford

TL;DR
This paper introduces a lightweight multi-frame feature aggregation method for real-time surgical instrument segmentation, reducing computation costs and improving accuracy in challenging surgical video conditions.
Contribution
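The paper proposes a novel Multi-frame Feature Aggregation (MFFA) module that aggregates video frame features temporally and spatially in a recurrent mode. Because deep feature extraction is distributed over sequential frames, a lightweight encoder suffices at each time step, which lowers the per-frame computation cost and supports real-time segmentation.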
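To make the idea of recurrent temporal and spatial aggregation concrete, here is a minimal sketch in plain Python. It is not the paper's MFFA module, whose internals are not given in this summary. Everything in it is an assumption chosen for illustration: the names `box_smooth`, `aggregate`, and `run_sequence`, the 3x3 box filter used as the spatial step, the fixed blending weight `alpha`, and the use of single-channel feature maps stored as nested lists.

```python
from typing import List, Optional

Grid = List[List[float]]  # one single-channel feature map, H x W (assumed layout)


def box_smooth(grid: Grid) -> Grid:
    """Spatial step (assumption): average each cell with its in-bounds 3x3 neighbours."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [
                grid[y][x]
                for y in range(max(0, i - 1), min(h, i + 2))
                for x in range(max(0, j - 1), min(w, j + 2))
            ]
            out[i][j] = sum(vals) / len(vals)
    return out


def aggregate(current: Grid, state: Optional[Grid], alpha: float = 0.5) -> Grid:
    """Temporal step (assumption): blend current features with the carried state."""
    if state is None:  # first frame: there is no history to aggregate
        return [row[:] for row in current]
    smoothed = box_smooth(state)
    return [
        [alpha * c + (1.0 - alpha) * s for c, s in zip(c_row, s_row)]
        for c_row, s_row in zip(current, smoothed)
    ]


def run_sequence(frames: List[Grid], alpha: float = 0.5) -> List[Grid]:
    """Process per-frame features in order, carrying one aggregated state forward."""
    state: Optional[Grid] = None
    outputs: List[Grid] = []
    for feat in frames:
        state = aggregate(feat, state, alpha)
        outputs.append(state)
    return outputs
```

The point of the recurrent form is that each time step only touches the current frame's features and one carried state, so the per-frame cost stays constant however long the video runs. A learned module would replace the fixed filter and fixed `alpha` with trained parameters.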
Findings
Outperforms deeper models on public datasets
Reduces computation costs with lightweight encoder
Effective in challenging lighting and blood conditions
Abstract
Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, the high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images such as various lighting conditions and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module to aggregate video frame features temporally and spatially in a recurrent mode. By distributing the computation load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation costs at each time step. Moreover, public surgical videos usually are not labeled frame by frame, so we develop a method that can randomly synthesize a surgical frame sequence from a…
