A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke, Bert De Brabandere

TL;DR
This paper introduces a simple latent diffusion framework for panoptic segmentation and mask inpainting. It drops the specialized object detection modules, complex loss functions, and ad-hoc post-processing steps that traditional methods rely on, and reports strong results on the COCO and ADE20k datasets.
Contribution
The paper presents a latent diffusion approach, built on Stable Diffusion, that simplifies the panoptic segmentation architecture, enables mask inpainting, and adapts to multiple tasks through learnable task embeddings.
Findings
Strong segmentation results on COCO and ADE20k
Effective mask inpainting capabilities
Flexible multi-task learning with task embeddings
Abstract
Panoptic and instance segmentation networks are often trained with specialized object detection modules, complex loss functions, and ad-hoc post-processing steps to manage the permutation-invariance of the instance masks. This work builds upon Stable Diffusion and proposes a latent diffusion approach for panoptic segmentation, resulting in a simple architecture that omits these complexities. Our training consists of two steps: (1) training a shallow autoencoder to project the segmentation masks to latent space; (2) training a diffusion model to allow image-conditioned sampling in latent space. This generative approach unlocks the exploration of mask completion or inpainting. The experimental validation on COCO and ADE20k yields strong segmentation results. Finally, we demonstrate our model's adaptability to multi-tasking by introducing learnable task embeddings.
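The second training step described in the abstract can be sketched as a single denoising-loss computation. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine schedule, the noise-prediction objective, and the names `alpha_bar`, `add_noise`, `training_loss`, and `toy_denoiser` are all hypothetical, and latents are plain Python lists rather than tensors produced by the shallow autoencoder.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Cumulative signal level at step t (assumed cosine-style schedule)."""
    return math.cos(0.5 * math.pi * t / num_steps) ** 2


def add_noise(z0, t, num_steps, rng):
    """Forward diffusion: z_t = sqrt(a) * z0 + sqrt(1 - a) * eps."""
    a = alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]
    return zt, eps


def training_loss(denoiser, mask_latent, image_latent, num_steps, rng):
    """Noise the mask latent, then score the image-conditioned noise prediction."""
    t = rng.randint(1, num_steps - 1)
    zt, eps = add_noise(mask_latent, t, num_steps, rng)
    eps_hat = denoiser(zt, image_latent, t)
    # Mean squared error between predicted and true noise.
    return sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(eps)


def toy_denoiser(zt, image_latent, t):
    """Hypothetical stand-in for a trained network; always predicts zero noise."""
    return [0.0 for _ in zt]


rng = random.Random(0)
mask_latent = [0.5, -1.0, 0.25, 2.0]   # would come from the mask autoencoder (step 1)
image_latent = [0.1, 0.2, 0.3, 0.4]    # conditioning signal derived from the image
loss = training_loss(toy_denoiser, mask_latent, image_latent, num_steps=1000, rng=rng)
print(f"toy denoising loss: {loss:.4f}")
```

Because the loss is computed per latent element against sampled noise, it does not depend on how instance IDs are ordered, which is one way to see why a generative formulation can avoid matching-based losses.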
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
Methods: Diffusion
