SlotMatch: Distilling Object-Centric Representations for Unsupervised Video Segmentation

Diana-Nicoleta Grigore; Neelu Madan; Andreas Mogelmose; Thomas B. Moeslund; Radu Tudor Ionescu

arXiv:2508.03411 · cs.CV · November 19, 2025

TL;DR

SlotMatch is a simple yet effective knowledge distillation framework that transfers object-centric representations from a large teacher model to a lightweight student, achieving superior unsupervised video segmentation performance with fewer parameters and faster inference.

Contribution

This paper introduces SlotMatch, a knowledge distillation method that aligns corresponding teacher and student slots via cosine similarity, with no additional distillation losses or auxiliary supervision, enabling a lightweight student to outperform its larger teacher in unsupervised video segmentation.

Findings

01 The student model matches and surpasses the teacher's performance.

02 The distilled student uses 3.6x fewer parameters and is 2.7x faster.

03 SlotMatch outperforms existing state-of-the-art models.

Abstract

Unsupervised video segmentation is a challenging computer vision task, especially due to the lack of supervisory signals coupled with the complexity of visual scenes. To overcome this challenge, state-of-the-art models based on slot attention often have to rely on large and computationally expensive neural architectures. To this end, we propose a simple knowledge distillation framework that effectively transfers object-centric representations to a lightweight student. The proposed framework, called SlotMatch, aligns corresponding teacher and student slots via the cosine similarity, requiring no additional distillation objectives or auxiliary supervision. The simplicity of SlotMatch is confirmed via theoretical and empirical evidence, both indicating that integrating additional losses is redundant. We conduct experiments on three datasets to compare the state-of-the-art teacher model,…
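
The slot alignment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `slot_cosine_loss`, the (num_slots, dim) array layout, the assumption that slot i of the teacher corresponds to slot i of the student, the assumption that both models share the same slot dimensionality, and the choice of "1 minus cosine similarity, averaged over slots" as the objective are all assumptions made here for illustration.

```python
import numpy as np


def slot_cosine_loss(teacher_slots, student_slots, eps=1e-8):
    """Hypothetical slot-alignment loss: mean (1 - cosine similarity).

    Assumes both inputs have shape (num_slots, dim) and that row i of the
    teacher corresponds to row i of the student (an assumption, not a
    detail taken from the paper).
    """
    t = np.asarray(teacher_slots, dtype=np.float64)
    s = np.asarray(student_slots, dtype=np.float64)
    if t.shape != s.shape:
        raise ValueError("teacher and student slots must have the same shape")

    # Normalize each slot vector to unit length; eps guards against zero norms.
    t_unit = t / np.maximum(np.linalg.norm(t, axis=-1, keepdims=True), eps)
    s_unit = s / np.maximum(np.linalg.norm(s, axis=-1, keepdims=True), eps)

    # Cosine similarity per slot pair, then average the dissimilarity.
    cos = np.sum(t_unit * s_unit, axis=-1)
    return float(np.mean(1.0 - cos))


# Illustrative usage with random stand-in slots (7 slots of dimension 64).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(7, 64))
student = teacher + 0.1 * rng.normal(size=(7, 64))
loss = slot_cosine_loss(teacher, student)
```

In a distillation setup along these lines, the teacher slots would serve as fixed targets and the loss would be minimized with respect to the student's parameters. How the slots are actually put in correspondence is specified in the paper, not here.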
