Slot-BERT: Self-supervised Object Discovery in Surgical Video
Guiqiu Liao, Matjaz Jogan, Marcel Hussing, Kenta Nakahashi, Kazuhiro Yasufuku, Amin Madani, Eric Eaton, Daniel A. Hashimoto

TL;DR
Slot-BERT is a novel bidirectional model that learns object-centric representations in surgical videos, maintaining long-range temporal coherence efficiently and enabling unsupervised discovery and domain adaptation across diverse surgical procedures.
Contribution
The paper introduces Slot-BERT, a scalable, bidirectional, self-supervised model with a novel contrastive loss for object discovery in long surgical videos, and reports that it outperforms existing object-centric methods.
Findings
Outperforms state-of-the-art object-centric methods in surgical video analysis.
Achieves robust zero-shot domain adaptation across different surgical datasets.
Scales effectively to long, unconstrained surgical videos.
Abstract
Object-centric slot attention is a powerful framework for unsupervised learning of structured and explainable representations that can support reasoning about objects and actions, including in surgical videos. While conventional object-centric methods for videos leverage recurrent processing to achieve efficiency, they often struggle to maintain the long-range temporal coherence required for long videos in surgical applications. On the other hand, fully parallel processing of entire videos enhances temporal consistency but introduces significant computational overhead, making it impractical to deploy on the hardware available in medical facilities. We present Slot-BERT, a bidirectional long-range model that learns object-centric representations in a latent space while ensuring robust temporal coherence. Slot-BERT scales object discovery seamlessly to long videos of unconstrained lengths. A…
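For readers unfamiliar with slot attention, the following is a minimal NumPy sketch of the generic mechanism the abstract builds on, together with the recurrent per-frame processing it contrasts against. It is not the Slot-BERT architecture: the learned query/key/value projections, GRU, and MLP of standard slot attention are omitted, and the function names (`slot_attention_step`, `discover_slots`, `track_video`) and all parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention_step(slots, features, eps=1e-8):
    """One simplified slot-attention update (assumption: no learned layers).

    slots:    (K, D) current slot vectors
    features: (N, D) per-patch features of one frame
    """
    d = slots.shape[1]
    logits = features @ slots.T / np.sqrt(d)  # (N, K) similarity scores
    # Softmax over the slot axis: slots compete for each feature.
    attn = softmax(logits, axis=1)
    # Normalize per slot over features so each update is a weighted mean.
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    new_slots = weights.T @ features  # (K, D)
    return new_slots, attn


def discover_slots(features, slots, iters=3):
    # Iteratively refine the slots on a single frame.
    attn = None
    for _ in range(iters):
        slots, attn = slot_attention_step(slots, features)
    return slots, attn


def track_video(frames, num_slots=4, iters=3, seed=0):
    """Recurrent processing: slots from frame t initialize frame t + 1."""
    rng = np.random.default_rng(seed)
    slots = rng.normal(size=(num_slots, frames[0].shape[1]))
    masks = []
    for features in frames:
        slots, attn = discover_slots(features, slots, iters)
        masks.append(attn.argmax(axis=1))  # hard slot assignment per patch
    return slots, masks
```

In this recurrent scheme each frame only sees the slots carried over from the previous frame, which is cheap but gives no direct access to distant frames. That is the long-range coherence limitation the abstract describes for recurrent methods.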
Taxonomy
Topics: Medical Image Segmentation Techniques · Colorectal Cancer Screening and Detection
Methods: Softmax · Attention Is All You Need
