TL;DR
APEX introduces a discriminator-free, endogenously derived adversarial correction method for one-step text-to-image synthesis, achieving high quality with improved efficiency and stability.
Contribution
The paper presents APEX, a novel approach that extracts adversarial signals from flow models via condition shifting, enabling stable, efficient, and high-quality one-step image generation.
Findings
APEX outperforms larger models in one-step quality.
APEX achieves a 15.33× inference speedup with LoRA tuning.
APEX surpasses multi-step teacher models in GenEval score.
Abstract
The push for efficient text-to-image synthesis has moved the field toward one-step sampling, yet existing methods still face a three-way tradeoff among fidelity, inference speed, and training efficiency. Approaches that rely on external discriminators can sharpen one-step performance, but they often introduce training instability, high GPU memory overhead, and slow convergence, which complicates scaling and parameter-efficient tuning. In contrast, regression-based distillation and consistency objectives are easier to optimize, but they typically lose fine details when constrained to a single step. We present APEX, built on a key theoretical insight: adversarial correction signals can be extracted endogenously from a flow model through condition shifting. Applying a transformation creates a shifted condition branch whose velocity field serves as an independent estimator of the model's…
