Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models

Qian Wang; Abdelrahman Eldesokey; Mohit Mendiratta; Fangneng Zhan; Adam Kortylewski; Christian Theobalt; Peter Wonka

arXiv:2405.16947 · cs.CV · May 28, 2024

TL;DR

This paper presents a zero-shot video semantic segmentation method that leverages pre-trained diffusion models and incorporates temporal modeling and refinement strategies to achieve high-quality, temporally consistent segmentation without any training on video data.

Contribution

The authors introduce a zero-shot video semantic segmentation (VSS) framework built on pre-trained diffusion models, comprising a scene context model and temporal refinement. It outperforms existing zero-shot methods without requiring any training.

Findings

01. Outperforms existing zero-shot image segmentation methods on VSS benchmarks
02. Rivals supervised VSS approaches on the VSPW dataset
03. Achieves high-quality, temporally consistent segmentation without training

Abstract

We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. A growing research direction employs diffusion models for downstream vision tasks by exploiting their deep understanding of image semantics. Yet most of these approaches have focused on image tasks such as semantic correspondence and segmentation, with less emphasis on video tasks such as VSS. In principle, diffusion-based image semantic segmentation approaches could be applied to videos frame by frame. However, we find their performance on videos to be subpar because they do not model the temporal information inherent in video data. To address this, we introduce a framework tailored for VSS based on pre-trained image and video diffusion models. We propose building a scene context model based on the…
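The abstract attributes the subpar frame-by-frame results to the absence of temporal modeling. The sketch below is a hypothetical illustration of that failure mode only; it is not the authors' scene context model or their refinement. It labels each frame independently by argmax over per-pixel class probabilities, compares that with a generic exponential-moving-average (EMA) smoothing of the probabilities over time, and measures "flicker" as the fraction of pixels whose label changes between consecutive frames. The function names, the `alpha` parameter, the (T, C, H, W) array layout, and the toy probabilities are all assumptions made for illustration.

```python
import numpy as np


def frame_by_frame_labels(probs: np.ndarray) -> np.ndarray:
    """Per-frame argmax over classes. probs has shape (T, C, H, W).

    Each frame is labeled independently: no information flows between frames.
    """
    return probs.argmax(axis=1)


def ema_smoothed_labels(probs: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical temporal baseline (not the paper's method): an exponential
    moving average of class probabilities along time, followed by argmax.

    Assumes a static camera; there is no motion compensation, so moving
    objects would be blurred across frames in real footage.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = np.empty(probs.shape, dtype=float)
    smoothed[0] = probs[0]
    for t in range(1, probs.shape[0]):
        # Blend the current frame with the running history.
        smoothed[t] = alpha * probs[t] + (1.0 - alpha) * smoothed[t - 1]
    return smoothed.argmax(axis=1)


def flicker_rate(labels: np.ndarray) -> float:
    """Fraction of pixels whose label changes between consecutive frames."""
    changes = labels[1:] != labels[:-1]
    return float(changes.mean())


# Toy clip: 3 frames, 2 classes, a single pixel. Class 0 is correct throughout,
# but noise makes frame 1 slightly prefer class 1.
probs = np.array([[0.6, 0.4], [0.45, 0.55], [0.6, 0.4]]).reshape(3, 2, 1, 1)

print(flicker_rate(frame_by_frame_labels(probs)))  # → 1.0
print(flicker_rate(ema_smoothed_labels(probs)))  # → 0.0
```

On this toy clip, the noisy middle frame flips the independent per-frame label, while the smoothed labels stay stable. Naive averaging ignores object and camera motion, which is one reason a dedicated temporal model is needed for real videos.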

Code & Models

Repositories

QianWangX/VidSeg_diffusion
PyTorch · Official

Taxonomy

Topics: Video Analysis and Summarization · Ideological and Political Education · Multimodal Machine Learning Applications

Methods: Diffusion