Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction
Soochul Park, Yeon Ju Lee

TL;DR
Dual-Solver introduces a learnable, flexible ODE sampling method for diffusion models that reduces inference costs and improves image generation quality with fewer function evaluations.
Contribution
It generalizes multistep samplers with learnable parameters to interpolate prediction types, select integration domains, and adjust residuals, maintaining second-order accuracy.
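To make "interpolating prediction types" concrete, here is a minimal sketch. It assumes the common forward process x_t = alpha_t * x_0 + sigma_t * eps, under which a noise prediction and a data prediction determine each other. The function names and the linear blend weight `gamma` are hypothetical illustrations; the paper's actual parameterization may differ.

```python
import numpy as np

def data_from_noise(x_t, eps_pred, alpha_t, sigma_t):
    """Recover the data prediction implied by a noise prediction."""
    return (x_t - sigma_t * eps_pred) / alpha_t

def noise_from_data(x_t, x0_pred, alpha_t, sigma_t):
    """Recover the noise prediction implied by a data prediction."""
    return (x_t - alpha_t * x0_pred) / sigma_t

def blended_prediction(x_t, eps_pred, alpha_t, sigma_t, gamma):
    """Hypothetical linear blend (an assumption, not the paper's formula):
    gamma = 0 returns the noise prediction, gamma = 1 the data prediction."""
    x0_pred = data_from_noise(x_t, eps_pred, alpha_t, sigma_t)
    return (1.0 - gamma) * eps_pred + gamma * x0_pred

# Toy check with a known clean sample and known noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(2, 4))
eps = rng.normal(size=(2, 4))
alpha_t, sigma_t = 0.8, 0.6
x_t = alpha_t * x0 + sigma_t * eps
```

With an exact noise prediction, `data_from_noise` returns `x0`, so the blend moves continuously between the two targets as `gamma` goes from 0 to 1.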
Findings
Improves FID scores in low-NFE regimes for ImageNet generation.
Enhances CLIP scores for text-to-image synthesis.
Reduces sampling cost while maintaining high image quality.
Abstract
Diffusion models achieve state-of-the-art image quality. However, sampling is costly at inference time because it requires a large number of function evaluations (NFEs). To reduce NFEs, classical ODE numerical methods have been adopted. Yet, the choice of prediction type and integration domain leads to different sampling behaviors. To address these issues, we introduce Dual-Solver, which generalizes multistep samplers through learnable parameters that continuously (i) interpolate among prediction types, (ii) select the integration domain, and (iii) adjust the residual terms. It retains the standard predictor-corrector structure while preserving second-order local accuracy. These parameters are learned via a classification-based objective using a frozen pretrained classifier (e.g., MobileNet or CLIP). For ImageNet class-conditional generation (DiT, GM-DiT) and text-to-image generation…
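The classification-based objective mentioned in the abstract can be illustrated with a small sketch: samples produced by the solver are scored by how well a fixed classifier recovers the conditioning labels. The helper names (`cross_entropy`, `solver_loss`) and the toy linear classifier are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def solver_loss(samples, labels, frozen_classifier):
    """Score generated samples with a classifier whose weights stay fixed."""
    return cross_entropy(frozen_classifier(samples), labels)

# Toy stand-in for a frozen pretrained classifier (hypothetical).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))

def toy_classifier(x):
    return x @ W

samples = rng.normal(size=(4, 8))   # placeholder for solver outputs
labels = np.array([0, 1, 2, 1])     # conditioning classes
loss = solver_loss(samples, labels, toy_classifier)
```

This NumPy version only evaluates the loss. Actually learning solver parameters would require an automatic-differentiation framework so that gradients can flow from the loss back through the sampling steps.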
Peer Reviews
Decision: ICLR 2026 Poster
1. Unified, principled parameterization. γ cleanly bridges noise/data/velocity predictions; τ smoothly interpolates λ↔ρ domains; κ adjusts second-order residuals.
2. Strong low-NFE results across backbones. Consistent FID/CLIP gains at ≤6 steps on DiT, SANA, PixArt-α, and GM-DiT.
3. Classification-based training removes the need for high-NFE teacher samples, reducing preparation cost and directly optimizing a perceptual proxy.
1. Training introduces ten learnable parameters per step (in addition to a learned schedule), which could increase brittleness or overfitting to specific backbones and guidance settings. Also, does the method require separate training runs for different NFE targets?
2. Because CE/CLIP is computed with a frozen classifier, the optimization might bias samples toward ‘classifier-friendly’ patterns. How do the authors mitigate proxy-objective leakage and ensure alignment with the intended generativ
1. It constructs a generalized ODE solver based on a "predictor-corrector" structure. Through explicit interpolation via $\gamma$ and domain transformation via $\tau$, it unifies noise/data/velocity prediction methods within a single framework.
2. It proposes updating the solver parameters with a classification loss or CLIP loss, replacing the learning strategy that relies on teacher trajectories. This reduces computational overhead and improves performance at low NFE.
3.
1. The method is derived for second-order accuracy, but the paper does not discuss potential issues in extending it to higher-order schemes, which limits further performance improvement of Dual-Solver.
2. It lacks comparisons with current advanced learning-based solvers, such as EPD-Solver[1] and AMED-Solver[2].
3. The parameter update of Dual-Solver relies on the cross-entropy loss of a classifier or CLIP loss, which means the parameter update of Dual-Solver will fail for unguided cond
1. The proposed method is novel and theoretically solid. This work analyzes currently popular semi-linear samplers and different prediction targets and proposes a more general PC-based sampler. Contributions include:
- unifies different prediction types through a dual-prediction parameterization.
- uses a learned change of variables in the semi-linear family for integration.
- second-order residual adjustment for p1c2.
2. Experiments are sufficient. This work provides strong, multi-backbone r
1. The writing is somewhat difficult to follow, and the overall structure is confusing.
2. The description of parameter training is insufficient, especially the training process.
3. The discussion of certain algorithmic details lacks intuition.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks
