A one-step generation model with a Single-Layer Transformer: Layer number re-distillation of FreeFlow
Haonan Wei, Linyuan Wang, Nuolin Sun, Zhizhong Zheng, Lei Li, Bin Yan

TL;DR
This paper introduces SLT, a single-layer Transformer model distilled from a 28-layer FreeFlow model, significantly reducing parameters and enabling faster, more stable one-step image generation with improved quality.
Contribution
We propose a novel layer number re-distillation method to compress a 28-layer Transformer into a single-layer model, enhancing efficiency and stability in one-step diffusion model generation.
Findings
SLT reduces parameters from 675M to 4.3M, roughly a 157-fold reduction.
SLT enables over 100 noise screenings within the same time as two random samplings.
Generated images show improved quality and stability with SLT.
Abstract
Flow matching methods currently aim to compress the iterative generation process of diffusion models into a few steps or even a single step, with MeanFlow and FreeFlow being representative one-step generation methods based on Ordinary Differential Equations (ODEs). We observe that the 28-layer Transformer architecture of FreeFlow can be characterized as an Euler discretization scheme for an ODE along the depth axis, where the layer index serves as the discrete time step. Therefore, we distill the number of layers of the FreeFlow model, following the same derivation logic as FreeFlow, and propose SLT (Single-Layer Transformer), which uses a single shared DiT block to approximate the depth-wise feature evolution of the 28-layer teacher. During training, it matches the teacher's intermediate features at several depth patches, fuses those patch-level representations, and simultaneously…
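The Euler-discretization reading of a residual stack can be illustrated with a small sketch. This is a toy under stated assumptions, not the FreeFlow or SLT implementation: the `tanh` layers, the feature width of 8, and the names `make_layers`, `residual_forward`, and `euler_forward` are all hypothetical stand-ins. It only shows that a stack of residual updates h_{l+1} = h_l + f_l(h_l) is the same computation as explicit Euler steps of size 1, with the layer index acting as the discrete time step.

```python
import numpy as np

L = 28  # teacher depth stated in the abstract

def make_layers(dim, num_layers, seed=0):
    # Hypothetical per-layer weights; real DiT blocks are far more complex.
    rng = np.random.default_rng(seed)
    return [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(num_layers)]

def residual_forward(h, weights):
    """Residual stack: h_{l+1} = h_l + f_l(h_l). Returns every intermediate feature."""
    feats = [h]
    for W in weights:
        h = h + np.tanh(h @ W)
        feats.append(h)
    return feats

def euler_forward(h, velocity, num_steps, dt=1.0):
    """Explicit Euler: h(t + dt) = h(t) + dt * v(h(t), t), with t the step index."""
    feats = [h]
    for step in range(num_steps):
        h = h + dt * velocity(h, step)
        feats.append(h)
    return feats

weights = make_layers(dim=8, num_layers=L)
h0 = np.ones((1, 8))

# The velocity field at "time" l is simply the residual branch of layer l.
velocity = lambda h, l: np.tanh(h @ weights[l])

stack_feats = residual_forward(h0, weights)
euler_feats = euler_forward(h0, velocity, num_steps=L)

# Both trajectories coincide at every depth.
print(all(np.allclose(a, b) for a, b in zip(stack_feats, euler_feats)))
```

In this picture, the intermediate entries of `stack_feats` play the role of the teacher's depth-wise features. A student in the spirit of the abstract would replace the 28 separate `weights[l]` with one shared function of the features and the depth index; how SLT actually does this is not specified in the text above.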
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Lattice Boltzmann Simulation Studies
