BudgetFusion: Perceptually-Guided Adaptive Diffusion Models
Qinchan Li, Kenneth Chen, Changyue Su, Qi Sun

TL;DR
BudgetFusion predicts, for each text prompt, how many diffusion steps a text-to-image model needs. This reduces computation while maintaining perceptual quality, addressing concerns about computational demands and energy consumption.
Contribution
The paper introduces a perceptually guided adaptive diffusion model that predicts the number of diffusion steps each prompt needs, based on human perceptual metrics, improving efficiency over fixed-step approaches.
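To make the idea of a per-prompt step budget concrete, here is a minimal sketch, not the authors' method. It assumes we already have a perceptual quality score for a prompt at several candidate step counts, and it picks the smallest count whose score is within a tolerance of the best one. The function name `pick_step_budget`, the `tolerance` parameter, and all scores are hypothetical; BudgetFusion itself predicts the step count before denoising starts rather than measuring quality afterwards.

```python
from typing import Mapping


def pick_step_budget(quality_by_steps: Mapping[int, float],
                     tolerance: float = 0.02) -> int:
    """Return the smallest step count whose quality is within
    `tolerance` of the best quality observed for this prompt.

    `quality_by_steps` maps a candidate number of diffusion steps to a
    perceptual quality score (higher is better). Both the scores and
    the tolerance are illustrative assumptions.
    """
    if not quality_by_steps:
        raise ValueError("need at least one (steps, quality) pair")
    best = max(quality_by_steps.values())
    # Scan candidates from cheapest to most expensive.
    for steps in sorted(quality_by_steps):
        if best - quality_by_steps[steps] <= tolerance:
            return steps
    # Unreachable: the best-scoring candidate always satisfies the test.
    raise AssertionError("no candidate matched")


# Hypothetical scores: a simple prompt saturates early,
# a complex prompt keeps improving with more steps.
simple_prompt = {10: 0.90, 20: 0.91, 30: 0.912, 50: 0.915}
complex_prompt = {10: 0.60, 20: 0.78, 30: 0.88, 50: 0.91}

simple_budget = pick_step_budget(simple_prompt)
complex_budget = pick_step_budget(complex_prompt)
```

With these made-up scores, the simple prompt gets a much smaller budget than the complex one, which is the kind of per-prompt variation a fixed-step pipeline cannot exploit.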
Findings
Saves up to five seconds per prompt without quality loss
Reduces energy consumption in diffusion-based image generation
Validated through numerical analysis and user studies
Abstract
Diffusion models have shown unprecedented success in the task of text-to-image generation. While these models are capable of generating high-quality and realistic images, the complexity of sequential denoising has raised societal concerns regarding high computational demands and energy consumption. In response, various efforts have been made to improve inference efficiency. However, most of the existing efforts have taken a fixed approach with neural network simplification or text prompt optimization. Are the quality improvements from all denoising computations equally perceivable to humans? We observed that images from different text prompts may require different computational efforts given the desired content. The observation motivates us to present BudgetFusion, a novel model that suggests the most perceptually efficient number of diffusion steps before a diffusion model starts to…
Taxonomy
Topics: Management, Economics, and Public Policy · Stock Market Forecasting Methods
Methods: Diffusion
