MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
Roy Miles, Mehmet Kerim Yucel, Bruno Manganelli, Albert Saa-Garriga

TL;DR
This paper introduces a lightweight, real-time method for semi-supervised video object segmentation on mobile devices. It combines pixel-wise contrastive learning with knowledge distillation from a pre-trained teacher to remain competitive with the state of the art at a fraction of the computational cost.
Contribution
It presents a theoretically grounded framework that unifies knowledge distillation with supervised contrastive representation learning, aimed at efficient video object segmentation on resource-constrained devices.
Findings
Achieves J&F competitive with the state of the art on the standard DAVIS and YouTube benchmarks.
Runs at 32 milliseconds per frame (roughly 31 frames per second) on a Samsung Galaxy S22.
Runs up to 5x faster than state-of-the-art models while using 32x fewer parameters.
Abstract
This paper tackles the problem of semi-supervised video object segmentation on resource-constrained devices, such as mobile phones. We formulate this problem as a distillation task, whereby we demonstrate that small space-time-memory networks with finite memory can achieve competitive results with state of the art, but at a fraction of the computational cost (32 milliseconds per frame on a Samsung Galaxy S22). Specifically, we provide a theoretically grounded framework that unifies knowledge distillation with supervised contrastive representation learning. These models are able to jointly benefit from both pixel-wise contrastive learning and distillation from a pre-trained teacher. We validate this loss by achieving competitive J&F to state of the art on both the standard DAVIS and YouTube benchmarks, despite running up to 5x faster, and with 32x fewer parameters.
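To make the idea of jointly training with pixel-wise contrastive learning and teacher distillation more concrete, here is a minimal sketch in plain Python. It is not the paper's loss: it simply pairs the standard supervised contrastive loss over per-pixel embeddings with the standard temperature-scaled KL distillation loss, and mixes them with a fixed weight. The function names, the `weight` parameter, and the default temperatures are all assumptions made for illustration.

```python
import math


def _normalize(v):
    """L2-normalise a vector (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Standard supervised contrastive loss; one embedding and label per pixel.

    Pixels sharing a label are treated as positives for each other.
    Anchors with no positive are skipped.
    """
    z = [_normalize(f) for f in features]
    total, anchors = 0.0, 0
    for i in range(len(z)):
        others = [j for j in range(len(z)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in others]
        log_probs = dict(zip(others, _log_softmax(sims)))
        total += -sum(log_probs[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean per-pixel KL(teacher || student) on temperature-softened outputs."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        log_p_s = _log_softmax([x / temperature for x in s])
        log_p_t = _log_softmax([x / temperature for x in t])
        total += sum(math.exp(lt) * (lt - ls)
                     for lt, ls in zip(log_p_t, log_p_s))
    return (temperature ** 2) * total / len(student_logits)


def combined_loss(features, labels, student_logits, teacher_logits, weight=0.5):
    """Hypothetical fixed-weight mix of the two terms (an assumption)."""
    return (weight * supervised_contrastive_loss(features, labels)
            + (1.0 - weight) * distillation_loss(student_logits, teacher_logits))
```

In this sketch, the contrastive term is small when same-object pixels have similar embeddings, and the distillation term is zero when the student reproduces the teacher's per-pixel predictions exactly. A real implementation would compute both on batched tensors in a deep learning framework.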
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
Methods: Knowledge Distillation · Contrastive Learning
