TL;DR
This paper introduces diffusion model-based style transfer techniques, CACTI and CACTIF, for improving synthetic-to-real domain adaptation in semantic segmentation, achieving better image quality and domain gap reduction.
Contribution
It proposes novel class-aware diffusion style transfer methods, CACTI and CACTIF, that preserve semantics and structure, enhancing synthetic-to-real domain adaptation.
Findings
Higher quality images with lower FID scores
Better preservation of semantic boundaries
Effective bridging of domain gap with minimal target data
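For readers unfamiliar with the metric in the first finding, FID (Fréchet Inception Distance) is usually defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The notation below is the standard one and is not taken from the paper: $\mu_r, \Sigma_r$ are the feature mean and covariance for real images, and $\mu_g, \Sigma_g$ those for generated images. Lower values indicate that the generated feature distribution is closer to the real one.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```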
Abstract
Semantic segmentation models trained on synthetic data often perform poorly on real-world images due to domain gaps, particularly in adverse conditions where labeled data is scarce. Yet recent foundation models can generate realistic images without any training. This paper proposes leveraging such diffusion models to improve the performance of vision models trained on synthetic data. We introduce two novel techniques for semantically consistent style transfer using diffusion models: Class-wise Adaptive Instance Normalization and Cross-Attention (CACTI) and its extension with selective attention Filtering (CACTIF). CACTI applies statistical normalization selectively based on semantic classes, while CACTIF further filters cross-attention maps based on feature similarity, preventing artifacts in regions with weak cross-attention correspondences. Our methods transfer style…
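The idea of applying statistical normalization per semantic class can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `classwise_adain`, the `(C, H, W)` feature layout, the use of a label map for the style image, and the choice to leave a class untouched when it is absent from the style image are all assumptions made for illustration. The sketch only shows standard Adaptive Instance Normalization (matching per-channel mean and standard deviation) restricted to pixels that share a class label.

```python
import numpy as np


def classwise_adain(content, style, content_labels, style_labels, eps=1e-5):
    """Hypothetical class-wise AdaIN sketch.

    content, style: float arrays of shape (C, H, W) holding feature maps.
    content_labels, style_labels: int arrays of shape (H, W) with class ids.
    Returns a copy of `content` whose per-class, per-channel statistics
    are matched to those of the same class in `style`.
    """
    out = content.copy()
    for cls in np.unique(content_labels):
        c_mask = content_labels == cls
        s_mask = style_labels == cls
        if not s_mask.any():
            # Assumption: a class missing from the style image is left as-is.
            continue
        c_feat = content[:, c_mask]  # shape (C, N_content_pixels)
        s_feat = style[:, s_mask]    # shape (C, N_style_pixels)
        c_mean = c_feat.mean(axis=1, keepdims=True)
        c_std = c_feat.std(axis=1, keepdims=True)
        s_mean = s_feat.mean(axis=1, keepdims=True)
        s_std = s_feat.std(axis=1, keepdims=True)
        # Standard AdaIN: whiten with content stats, re-color with style stats.
        out[:, c_mask] = (c_feat - c_mean) / (c_std + eps) * s_std + s_mean
    return out
```

Restricting the statistics to one class at a time means, for example, that sky pixels are re-colored using only the style image's sky statistics rather than a global average over the whole image. In this sketch the label maps are assumed to be given at the same resolution as the features.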
Taxonomy
Methods: Softmax · Attention Is All You Need · Diffusion · Adaptive Instance Normalization · Instance Normalization
