GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver
Aleksandr Oganov, Ilya Bykov, Eva Neudachina, Mishan Aliev, Alexander Tolmachev, Alexander Sidorov, Aleksandr Zuev, Andrey Okhotin, Denis Rakitin, Aibek Alanov

TL;DR
This paper introduces the Generalized Adversarial Solver, a simple and effective method to improve the quality of diffusion model sampling by combining a new ODE parameterization with adversarial training, reducing artifacts and preserving details.
Contribution
The paper proposes a novel, training-trick-free ODE solver parameterization combined with adversarial training, enhancing diffusion sampling quality without complex training procedures.
Findings
Outperforms existing solvers in quality under similar resource constraints
Reduces artifacts and improves detail fidelity in generated samples
Does not require additional training tricks
Abstract
While diffusion models achieve state-of-the-art generation quality, they still suffer from computationally expensive sampling. Recent works address this issue with gradient-based optimization methods that distill a few-step ODE diffusion solver from the full sampling process, reducing the number of function evaluations from dozens to just a few. However, these approaches often rely on intricate training techniques and do not explicitly focus on preserving fine-grained details. In this paper, we introduce the Generalized Solver: a simple parameterization of the ODE sampler that does not require additional training tricks and improves quality over existing approaches. We further combine the original distillation loss with adversarial training, which mitigates artifacts and enhances detail fidelity. We call the resulting method the Generalized Adversarial Solver and demonstrate its…
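The distillation idea in the abstract can be illustrated with a toy sketch: a few-step solver with learnable coefficients is fitted to the output of a many-step "teacher" integration. Everything below is an assumption made for illustration, not the paper's parameterization: the `velocity` field is a stand-in for a diffusion ODE, the lower-triangular `coeffs` matrix (each step mixes all earlier velocity evaluations) is a hypothetical design, and finite-difference gradient descent replaces real backpropagation. The adversarial term is omitted entirely.

```python
import numpy as np

def velocity(x, t):
    # Toy stand-in for the diffusion ODE right-hand side (assumption).
    return -np.tanh(x) * (1.0 + t)

def teacher_sample(x0, n_steps=256):
    # Many-step Euler integration from t=0 to t=1: the "full sampling process".
    h = 1.0 / n_steps
    x = x0.copy()
    for i in range(n_steps):
        x = x + h * velocity(x, i * h)
    return x

def student_sample(x0, coeffs):
    # Few-step solver: step n combines all velocity evaluations made so far,
    # weighted by the learnable row coeffs[n, :n+1] (hypothetical parameterization).
    n_steps = coeffs.shape[0]
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    x, history = x0.copy(), []
    for n in range(n_steps):
        history.append(velocity(x, ts[n]))
        x = x + sum(coeffs[n, j] * history[j] for j in range(n + 1))
    return x

def distill_loss(coeffs, x0, target):
    # Mean squared error between student and teacher endpoints.
    return float(np.mean((student_sample(x0, coeffs) - target) ** 2))

def numerical_grad(coeffs, x0, target, eps=1e-5):
    # Central finite differences over the lower-triangular entries only.
    grad = np.zeros_like(coeffs)
    for n in range(coeffs.shape[0]):
        for j in range(n + 1):
            bump = np.zeros_like(coeffs)
            bump[n, j] = eps
            grad[n, j] = (distill_loss(coeffs + bump, x0, target)
                          - distill_loss(coeffs - bump, x0, target)) / (2 * eps)
    return grad

def train_solver(n_steps=3, iters=200, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x0 = rng.normal(size=512)            # batch of initial "noise" samples
    target = teacher_sample(x0)          # teacher outputs to match
    coeffs = np.eye(n_steps) / n_steps   # initialise as plain Euler steps
    init_loss = loss = distill_loss(coeffs, x0, target)
    for _ in range(iters):
        candidate = coeffs - lr * numerical_grad(coeffs, x0, target)
        cand_loss = distill_loss(candidate, x0, target)
        if cand_loss < loss:             # accept only improving steps
            coeffs, loss = candidate, cand_loss
        else:
            lr *= 0.5                    # otherwise shrink the step size
    return coeffs, init_loss, loss

if __name__ == "__main__":
    coeffs, init_loss, final_loss = train_solver()
    print("Euler-init loss:", init_loss, "learned-solver loss:", final_loss)
```

Calling `train_solver()` returns the learned coefficient matrix together with the loss before and after fitting. The sketch covers only the regression-style distillation term; per the abstract, the full method adds an adversarial loss on top of it.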
Peer Reviews
Decision: ICLR 2026 Poster
I think this project has the ingredients to ultimately build a strong paper.
- The research direction seems logical to me - training the solver might be more efficient than training the model itself.
- Its two components, GS and the GAN loss, each provide an FID boost over past methods, with the GAN loss's boost being particularly impressive in the 4-6 NFE range.
- The GAN loss in particular is a very natural import from other subfields of diffusion modelling.
However, to me the paper reads more like a technical report than a top conference paper. A couple of ideas are proposed and results are shown, but not much beyond that in terms of motivation or generalizable insight and analysis.
- Very little motivation is given for the way GS is designed. The motivation given (I believe L225 -> 229) is honestly not very clear. Are you just looking for ways to add more parameters to the solver or is there something more to it?
- As someone familiar with
1. **Effective Use of Adversarial Loss** The incorporation of an adversarial loss is both intuitive and impactful, substantially improving distribution alignment between the generated samples and the teacher model.
2. **Clarity and Organization** The paper is clearly written and well-structured, making it easy to follow the methodology and results.
3. **Flexible and Innovative Parameterization** The proposed parameterization, distinct from that of S4S [1], introduces additio
1. **High Training Cost from Adversarial Loss** While the adversarial loss improves overall quality, it significantly increases the training cost. As shown in Table 8, the training time is more than double that of GS (without adversarial loss), raising questions about computational efficiency.
2. **Questionable Initialization Strategy** The solver learns coefficients for a linear multistep method but initializes some parameters from the DPM solver [2], which belongs to the exponenti
- The work improves the performance of the ODE solver distillation paradigm.
- Thorough ablations demonstrate the contribution of each component, with the adversarial loss boosting performance at low NFE.
- Cross-dataset generalization results in the appendix are strong and main-text worthy.
- The appendix argument for faster training relative to consistency models or progressive distillation strengthens its practical appeal within the solver distillation family.
- The paper should explicitly state whether training is data-free. By equations 7 and 22, no dependence on training data is apparent, yet in section 4.3 "Dataset size" is used and it is unclear. It seems to refer to the number of samples generated by the teacher, as parts of the appendix suggest.
- Line 475 claims the generalized solver parameterization "significantly accelerates training" relative to existing parameterizations. Section 4.2 compares final performance to S4S but does not show co
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
