Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

Ryota Yoshihashi; Yuya Otsuka; Kenji Doi; Tomohiro Tanaka; Hirokatsu Kataoka

arXiv:2309.01369 · cs.CV · April 16, 2024

Open Access

TL;DR

This paper enhances diffusion-synthetic training for semantic segmentation by introducing techniques that improve mask quality, scalability, and domain transfer, significantly narrowing the gap with real-data training.

Contribution

It proposes three techniques to advance diffusion-synthetic semantic segmentation: reliability-aware robust training, prompt augmentation, and LoRA-based domain adaptation.

Findings

01. Improved segmentation accuracy on PASCAL VOC, ImageNet-S, and Cityscapes.

02. Effective domain transfer to autonomous-driving images.

03. A narrower performance gap between training on synthetic data and training on real data.

Abstract

Advances in generative models for images have inspired various training techniques for image recognition that use synthetic images. In semantic segmentation, one promising approach is to extract pseudo-masks from attention maps in text-to-image diffusion models, which enables training without real images or annotations. However, the pioneering training method that uses diffusion-synthetic images and pseudo-masks, DiffuMask, has limitations in mask quality, scalability, and the range of applicable domains. To overcome these limitations, this work introduces three techniques for diffusion-synthetic semantic segmentation training. First, reliability-aware robust training, originally used in weakly supervised learning, helps segmentation when synthetic mask quality is insufficient. …
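As a rough illustration of the pseudo-mask idea in the abstract, the sketch below thresholds an attention map into foreground, background, and an "ignore" band that a training loss could skip. This is only a minimal sketch of a common weakly supervised convention, not the paper's procedure: the function names, the thresholds `lo` and `hi`, and the ignore label 255 are all assumptions made for the example.

```python
# Hypothetical sketch: names and thresholds are illustrative, not taken from the paper.
IGNORE = 255  # assumed "ignore" label; many segmentation pipelines exclude it from the loss


def attention_to_pseudo_mask(attn, class_id, lo=0.3, hi=0.7):
    """Map a 2-D attention map (values in [0, 1]) to a pseudo-mask.

    Pixels with attention >= hi get class_id, pixels with attention <= lo
    get background (0), and everything in between is marked IGNORE.
    """
    mask = []
    for row in attn:
        out_row = []
        for a in row:
            if a >= hi:
                out_row.append(class_id)
            elif a <= lo:
                out_row.append(0)
            else:
                out_row.append(IGNORE)
        mask.append(out_row)
    return mask


def reliable_fraction(mask):
    """Fraction of pixels that would still contribute to a loss that skips IGNORE."""
    total = sum(len(row) for row in mask)
    kept = sum(1 for row in mask for v in row if v != IGNORE)
    return kept / total if total else 0.0


# Toy 2x3 attention map for an arbitrary class id (12 is just an example value).
attn = [[0.05, 0.40, 0.90],
        [0.10, 0.65, 0.95]]
mask = attention_to_pseudo_mask(attn, class_id=12)
```

Widening the band between `lo` and `hi` trades label coverage for label reliability, which is the kind of trade-off a reliability-aware training scheme has to manage.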

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Diffusion