Exploiting the potential of unlabeled endoscopic video data with self-supervised learning

Tobias Ross; David Zimmerer; Anant Vemuri; Fabian Isensee; Manuel Wiesenfarth; Sebastian Bodenstedt; Fabian Both; Philip Kessler; Martin Wagner; Beat Müller; Hannes Kenngott; Stefanie Speidel; Annette Kopp-Schneider; Klaus Maier-Hein; Lena Maier-Hein

arXiv:1711.09726·cs.CV·February 1, 2018

TL;DR

This paper demonstrates that self-supervised learning using unlabeled endoscopic videos, specifically through a GAN-based re-colorization auxiliary task, can significantly reduce manual annotation needs while maintaining high segmentation performance.

Contribution

It introduces a novel self-supervised pre-training method for CNNs in medical imaging using GAN-based re-colorization of unlabeled videos, reducing annotation effort.
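As a rough illustration of what a re-colorization auxiliary task needs as training data, the sketch below turns unlabeled RGB frames into (grayscale input, color target) pairs. It is not the authors' pipeline: the function names, the BT.601 luminance weights, the [0, 1] scaling, and the random stand-in frames are all assumptions made for this example, and the GAN that would be trained on these pairs is omitted.

```python
import numpy as np

# Assumed luminance weights (ITU-R BT.601); the paper's actual
# color-space handling is not specified in this summary.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def make_recolorization_pair(frame_rgb):
    """Return (grayscale_input, color_target) for one unlabeled RGB frame.

    frame_rgb: uint8 array of shape (H, W, 3).
    """
    frame = frame_rgb.astype(np.float32) / 255.0  # scale to [0, 1]
    gray = frame @ LUMA_WEIGHTS                   # weighted channel sum, shape (H, W)
    return gray[..., np.newaxis], frame           # input (H, W, 1), target (H, W, 3)


def make_batch(frames):
    """Stack per-frame pairs into input and target batches."""
    pairs = [make_recolorization_pair(f) for f in frames]
    inputs = np.stack([p[0] for p in pairs])
    targets = np.stack([p[1] for p in pairs])
    return inputs, targets


# Hypothetical stand-in for frames decoded from an unlabeled endoscopic video.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)
inputs, targets = make_batch(frames)
```

No manual labels are involved at this stage: the color frame itself serves as the target, which is what makes the task self-supervised. A network trained to map `inputs` to `targets` could then initialize the segmentation CNN before fine-tuning on the smaller labeled set.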

Findings

01 Reduces labeled data requirement by up to 75%.

02 Outperforms pre-training on external datasets.

03 Effective for instrument segmentation in endoscopy.

Abstract

Surgical data science is a new research field that aims to observe all aspects of the patient treatment process in order to provide the right assistance at the right time. Due to the breakthrough successes of deep learning-based solutions for automatic image annotation, the availability of reference annotations for algorithm training is becoming a major bottleneck in the field. The purpose of this paper was to investigate the concept of self-supervised learning to address this issue. Our approach is guided by the hypothesis that unlabeled video data can be used to learn a representation of the target domain that boosts the performance of state-of-the-art machine learning algorithms when used for pre-training. The core of the method is an auxiliary task based on raw endoscopic video data of the target domain that is used to initialize the convolutional neural network (CNN) for the target…
