Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising

Gongfan Fang; Xinyin Ma; Xinchao Wang

arXiv:2412.05628 · cs.CV · December 10, 2024

TL;DR

Remix-DiT introduces a multi-expert diffusion model that uses learnable mixing of basis models to improve image generation quality efficiently, reducing training costs compared to independent expert models.

Contribution

The paper proposes mixing a small set of basis models with learnable coefficients to craft timestep-specific experts, improving diffusion model quality without training multiple independent experts.

Findings

1. Achieves improved image quality on ImageNet.
2. Maintains efficiency comparable to standard diffusion transformers.
3. Outperforms other multi-expert methods in experiments.

Abstract

Transformer-based diffusion models have achieved significant advancements across a variety of generative tasks. However, producing high-quality outputs typically necessitates large transformer models, which result in substantial training and inference overhead. In this work, we investigate an alternative approach involving multiple experts for denoising, and introduce Remix-DiT, a novel method designed to enhance output quality at a low cost. The goal of Remix-DiT is to craft N diffusion experts for different denoising timesteps, yet without the need for expensive training of N independent models. To achieve this, Remix-DiT employs K basis models (where K < N) and utilizes learnable mixing coefficients to adaptively craft expert models. This design offers two significant advantages: first, although the total model size is increased, the model produced by the mixing operation shares the…
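The mixing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the mixing coefficients are obtained from learnable logits through a softmax, that each basis model is reduced to a flat weight vector, and that the timestep range is split into N equal-width intervals with one expert per interval. The names `mix_expert`, `expert_index`, `basis_weights`, and `mixing_logits` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into non-negative coefficients that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_expert(basis_weights, logits):
    """Blend K basis weight vectors into a single expert weight vector.

    Assumption: coefficients come from a softmax over learnable logits.
    """
    coeffs = softmax(logits)
    dim = len(basis_weights[0])
    return [
        sum(c * w[i] for c, w in zip(coeffs, basis_weights))
        for i in range(dim)
    ]


def expert_index(t, num_timesteps, num_experts):
    """Map timestep t in [0, num_timesteps) to one of N intervals.

    Assumption: intervals are equal-width; the source does not say so.
    """
    return min(t * num_experts // num_timesteps, num_experts - 1)


# Toy sizes: K basis models, N experts (K < N), T timesteps, DIM weights each.
K, N, T, DIM = 2, 4, 1000, 3
rng = random.Random(0)
basis_weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(K)]
mixing_logits = [[rng.gauss(0, 1) for _ in range(K)] for _ in range(N)]

t = 730
n = expert_index(t, T, N)  # which expert handles this timestep
expert = mix_expert(basis_weights, mixing_logits[n])
```

In this sketch only K weight vectors and an N-by-K table of logits are stored, and the expert for a given timestep is built on demand with the same shape as a single basis vector.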

Code & Models

Repositories

vainf/remix-dit (PyTorch, official)

Taxonomy

Topics: Image and Signal Denoising Methods

Methods: Diffusion