Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions
Xinpeng Ding, Ziwei Liu, Xiaomeng Li

TL;DR
This paper introduces a self-supervised learning framework for surgical video understanding that distills knowledge from publicly available models trained on large generic datasets, significantly enhancing performance, especially when training data is limited.
Contribution
It proposes a novel approach that distills knowledge from publicly available models to improve self-supervised learning for surgical videos, a direction not explored before.
Findings
Significant performance improvements on surgical phase recognition benchmarks.
Enhanced effectiveness in low-data scenarios.
The framework outperforms existing self-supervised methods.
Abstract
Self-supervised learning has witnessed great progress in vision and NLP; recently, it has also attracted much attention in various medical imaging modalities such as X-ray, CT, and MRI. Existing methods mostly focus on building new pretext self-supervision tasks such as reconstruction, orientation, and masking identification according to the properties of medical images. However, the publicly available self-supervision models are not fully exploited. In this paper, we present a powerful yet efficient self-supervision framework for surgical video understanding. Our key insight is to distill knowledge from publicly available models trained on large generic datasets to facilitate the self-supervised learning of surgical videos. To this end, we first introduce a semantic-preserving training scheme to obtain our teacher model, which not only contains semantics from the publicly available…
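The abstract outlines the distillation idea but does not state the loss used, so the following is only a hedged illustration of one common form of feature-level distillation, not the authors' method: a student network is trained so that its per-frame embeddings align with those of a frozen teacher. It assumes a cosine-based loss and represents embeddings as plain Python lists; the names `cosine_similarity` and `feature_distillation_loss` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine of the angle between two embeddings; eps guards against zero norms."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def feature_distillation_loss(
    student_feats: Sequence[Vector], teacher_feats: Sequence[Vector]
) -> float:
    """Mean (1 - cosine similarity) between student and frozen-teacher embeddings.

    Hypothetical sketch: each row is the embedding of one video frame.
    The loss is 0 when every student embedding points in the same
    direction as its teacher embedding, and at most 2 when they are opposite.
    """
    if len(student_feats) != len(teacher_feats) or not student_feats:
        raise ValueError("need the same non-zero number of student and teacher embeddings")
    losses = [1.0 - cosine_similarity(s, t) for s, t in zip(student_feats, teacher_feats)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    teacher = [[2.0, 0.0], [0.0, -3.0]]  # frozen teacher embeddings (toy values)
    student = [[1.0, 0.0], [0.0, 1.0]]   # student embeddings being trained
    print(feature_distillation_loss(student, teacher))
```

Because the loss compares directions only, it ignores differences in embedding scale between teacher and student, which is one reason cosine-style objectives are a frequent choice when the two networks differ in architecture.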
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection · Domain Adaptation and Few-Shot Learning
