FastFLUX: Pruning FLUX with Block-wise Replacement and Sandwich Training
Fuhan Cai, Yong Guo, Jie Li, Wenbo Li, Jian Chen, Xiangzhong Fang

TL;DR
FastFLUX introduces an architecture-level pruning method with block-wise replacement and sandwich training to significantly improve the inference speed of FLUX models while maintaining high image quality.
Contribution
It proposes a novel pruning framework combining block-wise replacement with linear layers and localized fine-tuning via sandwich training, reducing model size and inference time.
Findings
Maintains high image quality after pruning 20% of the model hierarchy.
Significantly improves inference speed with minimal performance loss.
Effective pruning method applicable to diffusion transformer models.
Abstract
Recent advancements in text-to-image (T2I) generation have led to the emergence of highly expressive models such as diffusion transformers (DiTs), exemplified by FLUX. However, their massive parameter sizes lead to slow inference, high memory usage, and poor deployability. Existing acceleration methods (e.g., single-step distillation and attention pruning) often suffer from significant performance degradation and incur substantial training costs. To address these limitations, we propose FastFLUX, an architecture-level pruning framework designed to enhance the inference efficiency of FLUX. At its core is the Block-wise Replacement with Linear Layers (BRLL) method, which replaces structurally complex residual branches in ResBlocks with lightweight linear layers while preserving the original shortcut connections for stability. Furthermore, we introduce Sandwich Training (ST), a localized…
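The block-wise replacement idea described in the abstract can be sketched in a few lines. This is a minimal, framework-free illustration and not the authors' implementation: `ResBlock`, `LinearBranch`, `heavy_branch`, and `replace_branches` are hypothetical names, the expensive branch is a toy stand-in for attention/MLP computation, and the zero initialisation of the linear layer is an assumption made so that a freshly replaced block starts out as the identity. Sandwich Training is not shown.

```python
from typing import Callable, List

Vector = List[float]


class ResBlock:
    """Computes x + branch(x). The shortcut lives in the block, not in the branch."""

    def __init__(self, branch: Callable[[Vector], Vector]) -> None:
        self.branch = branch

    def __call__(self, x: Vector) -> Vector:
        return [xi + bi for xi, bi in zip(x, self.branch(x))]


class LinearBranch:
    """Lightweight replacement branch: y = W x + b.

    Zero initialisation is an assumption of this sketch, not taken from the paper.
    """

    def __init__(self, dim: int) -> None:
        self.weight = [[0.0] * dim for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, x: Vector) -> Vector:
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weight, self.bias)
        ]


def heavy_branch(x: Vector) -> Vector:
    """Hypothetical stand-in for a structurally complex residual branch."""
    return [xi * xi for xi in x]


def replace_branches(blocks: List[ResBlock], indices: List[int], dim: int) -> None:
    """Swap the residual branch of the selected blocks for a linear layer.

    Only the branch is replaced; each block's shortcut connection is untouched.
    """
    for i in indices:
        blocks[i].branch = LinearBranch(dim)


blocks = [ResBlock(heavy_branch) for _ in range(5)]
replace_branches(blocks, indices=[2], dim=3)

x = [1.0, 2.0, 3.0]
untouched = blocks[0](x)  # shortcut + original heavy branch
replaced = blocks[2](x)   # shortcut + zero-initialised linear branch
```

Because the shortcut is kept, a replaced block passes its input through unchanged until its linear layer is trained, which is one plausible reason preserving the shortcut helps stability.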
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
Methods: Softmax · Attention Is All You Need · Diffusion · Pruning
