Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

Byeongjun Park; Hyojun Go; Jin-Young Kim; Sangmin Woo; Seokil Ham; Changick Kim

arXiv:2403.09176·cs.CV·July 11, 2024·1 cites

TL;DR

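Switch Diffusion Transformer (Switch-DiT) places a sparse mixture-of-experts inside each transformer block of a diffusion model, so that denoising tasks at different noise levels can share parameters where they agree and isolate them where they conflict. The reported result is better image quality and faster convergence.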

Contribution

The paper proposes a transformer-based architecture with shared and task-specific paths, together with a diffusion prior loss, to better model inter-task relationships in diffusion models.

Findings

01. Improves image quality in diffusion tasks

02. Accelerates convergence rate

03. Constructs tailored denoising paths

Abstract

Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a specific noise level. While these efforts have focused on parameter isolation and task routing, they fall short of capturing detailed inter-task relationships and risk losing semantic information, respectively. In response, we introduce Switch Diffusion Transformer (Switch-DiT), which establishes inter-task relationships between conflicting tasks without compromising semantic information. To achieve this, we employ a sparse mixture-of-experts within each transformer block to utilize semantic information and facilitate handling conflicts in tasks through parameter isolation. Additionally, we propose a diffusion prior loss, encouraging similar…
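To make the idea of a sparse mixture-of-experts inside a transformer block concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the gate is computed from a timestep (noise-level) embedding and that the top-k experts are combined with softmax weights; the abstract shown here does not specify either detail. The names `topk_gate`, `moe_layer`, `w_gate`, and all sizes are hypothetical.

```python
import numpy as np

def topk_gate(timestep_emb, w_gate, k):
    """Sparse gate: softmax over the k largest expert logits, zero elsewhere.

    Assumption: the gate depends only on the timestep embedding, so every
    token at the same noise level is routed to the same experts.
    """
    logits = timestep_emb @ w_gate                 # shape: (num_experts,)
    top = np.argsort(logits)[-k:]                  # indices of the k largest logits
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()
    gate = np.zeros_like(logits)
    gate[top] = weights
    return gate

def moe_layer(tokens, gate, experts):
    """Weighted sum of the selected experts; unselected experts are skipped."""
    out = np.zeros_like(tokens)
    for i in np.flatnonzero(gate):
        w1, w2 = experts[i]
        hidden = np.maximum(tokens @ w1, 0.0)      # simple ReLU feed-forward expert
        out += gate[i] * (hidden @ w2)
    return out

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
dim, hidden_dim, num_experts, k = 8, 16, 4, 2
w_gate = rng.normal(size=(dim, num_experts))
experts = [
    (rng.normal(size=(dim, hidden_dim)), rng.normal(size=(hidden_dim, dim)))
    for _ in range(num_experts)
]
tokens = rng.normal(size=(5, dim))                 # 5 image tokens
timestep_emb = rng.normal(size=dim)                # embedding of one noise level

gate = topk_gate(timestep_emb, w_gate, k)
out = moe_layer(tokens, gate, experts)
```

In this sketch, two noise levels that select overlapping experts share those parameters, while levels that select disjoint experts are isolated from each other. That is one way to read the "shared and task-specific paths" described in the contribution summary.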

Taxonomy

Topics: Neural Networks and Applications · Image and Signal Denoising Methods

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Layer Normalization · Absolute Position Encodings · Residual Connection · Dropout · Softmax · Linear Layer · Multi-Head Attention