FastFLUX: Pruning FLUX with Block-wise Replacement and Sandwich Training

Fuhan Cai; Yong Guo; Jie Li; Wenbo Li; Jian Chen; Xiangzhong Fang

arXiv:2506.10035 · cs.GR · January 14, 2026

TL;DR

FastFLUX introduces an architecture-level pruning method with block-wise replacement and sandwich training to significantly improve the inference speed of FLUX models while maintaining high image quality.

Contribution

The paper proposes a pruning framework that combines two components: block-wise replacement, which substitutes linear layers for residual branches, and sandwich training, a localized fine-tuning scheme. Together they reduce model size and inference time.

Findings

01. Maintains high image quality even with 20% of the hierarchy pruned.

02. Significantly improves inference speed with minimal performance loss.

03. Provides an effective pruning method applicable to diffusion transformer models.

Abstract

Recent advancements in text-to-image (T2I) generation have led to the emergence of highly expressive models such as diffusion transformers (DiTs), exemplified by FLUX. However, their massive parameter sizes lead to slow inference, high memory usage, and poor deployability. Existing acceleration methods (e.g., single-step distillation and attention pruning) often suffer from significant performance degradation and incur substantial training costs. To address these limitations, we propose FastFLUX, an architecture-level pruning framework designed to enhance the inference efficiency of FLUX. At its core is the Block-wise Replacement with Linear Layers (BRLL) method, which replaces structurally complex residual branches in ResBlocks with lightweight linear layers while preserving the original shortcut connections for stability. Furthermore, we introduce Sandwich Training (ST), a localized…
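The block-wise replacement idea described in the abstract can be illustrated with a toy sketch: a residual block `y = x + branch(x)` keeps its shortcut while the branch is swapped for a single linear layer. Everything below is an assumption made for illustration and is not taken from the paper: the class names `ResBlock` and `LinearReplacedBlock`, the two-layer MLP standing in for a "structurally complex" branch, the dimensions, and the zero initialization of the replacement layer. It uses only the Python standard library and does not reproduce FLUX's actual blocks or the authors' training procedure.

```python
import random


def linear(W, b, x):
    """Apply the affine map W @ x + b using plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def rand_matrix(rows, cols, rng):
    """Small random weights; the range is an arbitrary illustrative choice."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ResBlock:
    """Toy residual block: y = x + branch(x).

    The branch is a two-layer MLP with ReLU, a hypothetical stand-in for a
    structurally complex residual branch (not FLUX's real architecture).
    """

    def __init__(self, dim, hidden, rng):
        self.W1, self.b1 = rand_matrix(hidden, dim, rng), [0.0] * hidden
        self.W2, self.b2 = rand_matrix(dim, hidden, rng), [0.0] * dim

    def num_params(self):
        weights = sum(len(r) for r in self.W1) + sum(len(r) for r in self.W2)
        return weights + len(self.b1) + len(self.b2)

    def __call__(self, x):
        h = [max(0.0, v) for v in linear(self.W1, self.b1, x)]  # ReLU
        branch = linear(self.W2, self.b2, h)
        return [xi + bi for xi, bi in zip(x, branch)]  # shortcut + branch


class LinearReplacedBlock:
    """Pruned block: the branch is one linear layer, the shortcut is kept.

    Zero initialization is an assumption of this sketch; it makes the block
    start out as the identity mapping.
    """

    def __init__(self, dim):
        self.dim = dim
        self.W = [[0.0] * dim for _ in range(dim)]
        self.b = [0.0] * dim

    def num_params(self):
        return self.dim * self.dim + self.dim

    def __call__(self, x):
        branch = linear(self.W, self.b, x)
        return [xi + bi for xi, bi in zip(x, branch)]  # same shortcut as before


rng = random.Random(0)
dim, hidden = 8, 32  # illustrative sizes only
original = ResBlock(dim, hidden, rng)
pruned = LinearReplacedBlock(dim)
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

print("original params:", original.num_params())
print("pruned params:  ", pruned.num_params())
print("pruned block is identity at init:", pruned(x) == x)
```

Because the shortcut is untouched, the replaced block's input and output shapes match the original, so it can slot into the same position in a network. In practice the linear layer would be trained to approximate the removed branch; that step is omitted here.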

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices

Methods: Softmax · Attention Is All You Need · Diffusion · Pruning