Shortcutting Pre-trained Flow Matching Diffusion Models is Almost Free Lunch
Xu Cai, Yang Wu, Qianli Chen, Haoran Wu, Lichuan Xiang, Hongkai Wen

TL;DR
This paper introduces a highly efficient post-training method for transforming large pre-trained flow matching diffusion models into few-step samplers using velocity field self-distillation, significantly reducing computational costs.
Contribution
The authors propose a novel velocity field self-distillation technique that enables aggressive shortcutting in flow matching models without retraining from scratch, improving efficiency and enabling few-shot distillation.
Findings
Achieved 3-step Flux sampling with less than one A100 day of training.
Enabled few-shot distillation with as few as 10 text-image pairs.
Produced state-of-the-art performance at minimal cost.
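To make "3-step sampling" concrete, here is a minimal sketch of a few-step Euler sampler for a flow matching ODE. It is not the authors' code. It assumes a 1-D toy problem where t = 0 is noise and t = 1 is data, with the interpolation x_t = (1 - t) z + t x1 (real models such as Flux may use a different time convention). The data are two hypothetical point masses at -A and +A, for which the exact marginal velocity has a closed form. The names `toy_velocity`, `euler_sample`, and `A` are illustrative.

```python
import math

A = 2.0  # hypothetical toy data: two equally likely point masses at -A and +A


def toy_velocity(x: float, t: float) -> float:
    """Exact marginal velocity E[x1 - z | x_t = x] for x_t = (1 - t) z + t x1,
    with z ~ N(0, 1) and x1 uniform on {-A, +A}. Valid for 0 <= t < 1."""
    # Posterior mean of the data point given the current noisy sample.
    posterior_mean = A * math.tanh(A * t * x / (1.0 - t) ** 2)
    return (posterior_mean - x) / (1.0 - t)


def euler_sample(velocity, z: float, num_steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t = 0 (noise) to t = 1 (data)
    using num_steps uniform Euler steps. Velocity is only queried at t < 1."""
    x = z
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i / num_steps
        x = x + dt * velocity(x, t)
    return x


for steps in (1, 3, 1000):
    print(steps, euler_sample(toy_velocity, 1.0, steps))
```

With a single step, the sampler lands every input on the data mean 0 instead of on a mode, which is the failure that shortcut training is meant to avoid. With a few steps, this easy toy already reaches the mode. Real pre-trained velocity fields are far more curved, which is why naive step reduction degrades them.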
Abstract
We present an ultra-efficient post-training method for shortcutting large-scale pre-trained flow matching diffusion models into efficient few-step samplers, enabled by novel velocity field self-distillation. While shortcutting in flow matching, originally introduced by shortcut models, offers flexible trajectory-skipping capabilities, it requires a specialized step-size embedding that is incompatible with existing models unless they are retrained from scratch, a process nearly as costly as pretraining itself. Our key contribution is thus imparting a more aggressive shortcut mechanism to standard flow matching models (e.g., Flux), leveraging a unique distillation principle that obviates the need for step-size embedding. Working on the velocity field rather than in sample space and learning rapidly from self-guided distillation in an online manner, our approach trains efficiently, e.g.,…
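For background, the sketch below illustrates the self-consistency rule of the original shortcut models that the abstract refers to. A model conditioned on step size d should predict, for a step of size 2d, the average of two consecutive size-d velocities taken with itself. This is the step-size-embedded formulation the paper sets out to avoid. The paper's own embedding-free distillation principle is not described in this summary and is not shown here. The `model(x, t, d)` signature and the helper names are hypothetical. In real training, the target would be computed without gradients (stop-gradient) and the loss averaged over a batch.

```python
from typing import Callable

# Hypothetical signature: velocity prediction for state x at time t, step size d.
Model = Callable[[float, float, float], float]


def shortcut_target(model: Model, x: float, t: float, d: float) -> float:
    """Target velocity for a step of size 2d: the average of two consecutive
    size-d steps taken with the model itself (treated as a fixed teacher)."""
    v1 = model(x, t, d)          # velocity for the first half-step
    x_mid = x + d * v1           # Euler half-step along that velocity
    v2 = model(x_mid, t + d, d)  # velocity for the second half-step
    return 0.5 * (v1 + v2)


def shortcut_loss(model: Model, x: float, t: float, d: float) -> float:
    """Squared error between the model's 2d-step prediction and its own
    two-half-step target."""
    return (model(x, t, 2 * d) - shortcut_target(model, x, t, d)) ** 2


# A toy model that ignores d and predicts exponential decay is not
# self-consistent: its 2d prediction (-1.0) differs from the target (-0.875).
decay = lambda x, t, d: -x
print(shortcut_target(decay, 1.0, 0.0, 0.25), shortcut_loss(decay, 1.0, 0.0, 0.25))
```

A velocity field whose trajectories are straight lines (for example, one that always points toward a single fixed data point) satisfies this rule exactly. That is one way to see why self-consistency training pushes a model toward trajectories that tolerate large steps.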
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
