xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Jiarui Fang; Jinzhe Pan; Xibo Sun; Aoyu Li; Jiannan Wang

arXiv:2411.01738·cs.DC·November 5, 2024

TL;DR

xDiT is a scalable, hybrid parallel inference engine for Diffusion Transformers, enabling real-time high-quality image and video generation across diverse hardware configurations.

Contribution

The paper introduces xDiT, a novel parallel inference engine combining Sequence Parallel, PipeFusion, and CFG parallel for scalable DiT deployment.

Findings

01

xDiT achieves high scalability on Ethernet-connected GPU clusters.

02

Demonstrates efficient inference on multiple state-of-the-art DiTs.

03

First to showcase DiT scalability in Ethernet GPU clusters.

Abstract

Diffusion models are pivotal for generating high-quality images and videos. Inspired by the success of OpenAI's Sora, the backbone of diffusion models is evolving from U-Net to Transformer, known as Diffusion Transformers (DiTs). However, generating high-quality content requires longer sequence lengths, which quadratically increases the computation required for the attention mechanism and raises DiT inference latency. Parallel inference is essential for real-time DiT deployments, but relying on a single parallel method is impractical because it scales poorly to large device counts. This paper introduces xDiT, a comprehensive parallel inference engine for DiTs. After thoroughly investigating existing DiT parallel approaches, xDiT chooses Sequence Parallel (SP) and PipeFusion, a novel Patch-level Pipeline Parallel method, as intra-image parallel strategies, alongside CFG parallel for…
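To make the attention-cost claim concrete, here is a rough sketch of how the matrix-multiply work in standard self-attention grows with sequence length. The function name and the example sizes (4096 and 8192 tokens, hidden dimension 1152) are illustrative assumptions and are not taken from the paper; the estimate counts only the two attention matmuls and ignores softmax, projections, and any kernel-level optimizations.

```python
def attention_matmul_flops(seq_len: int, hidden_dim: int) -> int:
    """Approximate FLOPs for the two attention matmuls in one layer.

    Hypothetical helper for illustration only (not part of xDiT).
    Q @ K^T produces seq_len x seq_len scores, each a dot product over
    hidden_dim (summed across heads): about 2 * L^2 * d FLOPs.
    scores @ V costs the same, giving about 4 * L^2 * d in total.
    """
    return 4 * seq_len * seq_len * hidden_dim


# Illustrative sizes, assumed for this sketch.
base = attention_matmul_flops(seq_len=4096, hidden_dim=1152)
doubled = attention_matmul_flops(seq_len=8192, hidden_dim=1152)

# Doubling the sequence length quadruples the attention matmul work.
print(doubled / base)  # 4.0
```

Under these assumptions, a 2x longer sequence costs 4x the attention compute, which is why splitting the sequence or the patches across devices becomes attractive as resolution grows.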

Code & Models

Repositories

xdit-project/xdit (PyTorch, official)

Taxonomy

Topics: Advanced Memory and Neural Computing · Neural Networks and Applications

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dropout · Absolute Position Encodings