Future Slot Prediction for Unsupervised Object Discovery in Surgical Video

Guiqiu Liao; Matjaz Jogan; Marcel Hussing; Edward Zhang; Eric Eaton; Daniel A. Hashimoto

arXiv:2507.01882 · cs.CV · July 9, 2025

TL;DR

This paper introduces a dynamic temporal slot transformer that improves unsupervised object discovery in surgical video by predicting the slot initialization for future frames, with the aim of supporting real-time interpretation of complex surgical scenes.

Contribution

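The paper proposes a dynamic temporal slot transformer (DTST) module, trained both for temporal reasoning and for predicting the optimal future slot initialization, to address the difficulty of parsing heterogeneous surgical scenes into a meaningful set of slots.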

Findings

01  Achieves state-of-the-art results on surgical datasets

02  Effective in real-time surgical video interpretation

03  Improves unsupervised object discovery in complex scenes

Abstract

Object-centric slot attention is an emerging paradigm for unsupervised learning of structured, interpretable object-centric representations (slots). This enables effective reasoning about objects and events at a low computational cost and is thus applicable to critical healthcare applications, such as real-time interpretation of surgical video. The heterogeneous scenes in real-world applications like surgery are, however, difficult to parse into a meaningful set of slots. Current approaches with an adaptive slot count perform well on images, but their performance on surgical videos is low. To address this challenge, we propose a dynamic temporal slot transformer (DTST) module that is trained both for temporal reasoning and for predicting the optimal future slot initialization. The model achieves state-of-the-art performance on multiple surgical databases, demonstrating that unsupervised…
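
To make the idea of slots and slot initialization concrete, the following is a minimal sketch of a generic slot-attention update applied frame by frame. It is not the paper's DTST module: the learned projections, GRU, and MLP of standard slot attention are omitted, and the names `slot_attention_step`, `run_video`, and the identity hand-off of slots between frames are assumptions made for illustration only. In the paper, a learned module predicts the next frame's slot initialization; here that step is replaced by simply reusing the current slots.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified slot-attention update.

    slots:  (K, D) current slot vectors
    inputs: (N, D) per-frame feature vectors (e.g. patch embeddings)
    Returns updated slots of shape (K, D).
    """
    d = slots.shape[-1]
    logits = inputs @ slots.T / np.sqrt(d)  # (N, K)
    # Softmax over the slot axis: slots compete to explain each input.
    attn = softmax(logits, axis=1)
    # Normalise over inputs so each slot becomes a weighted mean of inputs.
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    return weights.T @ inputs  # (K, D)


def run_video(frames, num_slots, iters=3, seed=0):
    """Apply slot attention to each frame in turn (hypothetical helper).

    The slots from frame t are reused as the initialization for frame t+1.
    This identity hand-off is a placeholder assumption standing in for a
    learned predictor of the future slot initialization.
    """
    rng = np.random.default_rng(seed)
    d = frames[0].shape[-1]
    slots = rng.normal(size=(num_slots, d))  # random initial slots
    per_frame = []
    for feats in frames:
        for _ in range(iters):
            slots = slot_attention_step(slots, feats)
        per_frame.append(slots.copy())
    return per_frame


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    video = [rng.normal(size=(16, 8)) for _ in range(3)]  # 3 frames, 16 features each
    out = run_video(video, num_slots=4)
    print(len(out), out[0].shape)
```

The fixed `num_slots` here is a further simplification; the abstract refers to approaches with an adaptive slot count, which this sketch does not attempt to model.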

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning