DiffNAS: Bootstrapping Diffusion Models by Prompting for Better Architectures
Wenhao Li, Xiu Su, Shan You, Fei Wang, Chen Qian, Chang Xu

TL;DR
DiffNAS searches for base-model architectures for diffusion models. It uses GPT-4 as a supernet to propose candidates quickly and RFID as a proxy to rank them, improving both search efficiency and FID on CIFAR10.
Contribution
DiffNAS is the first to leverage GPT-4 as a supernet for diffusion model architecture search, significantly enhancing search speed and model performance.
Findings
Search efficiency doubled with GPT-4 based search.
Achieved 2.82 FID on CIFAR10, outperforming IDDPM.
A rapid-convergence training strategy further improved search efficiency.
Abstract
Diffusion models have recently exhibited remarkable performance on synthetic data. After a diffusion path is selected, a base model, such as UNet, operates as a denoising autoencoder, primarily predicting noises that need to be eliminated step by step. Consequently, it is crucial to employ a model that aligns with the expected budgets to facilitate superior synthetic performance. In this paper, we meticulously analyze the diffusion model and engineer a base model search approach, denoted "DiffNAS". Specifically, we leverage GPT-4 as a supernet to expedite the search, supplemented with a search memory to enhance the results. Moreover, we employ RFID as a proxy to promptly rank the experimental outcomes produced by GPT-4. We also adopt a rapid-convergence training strategy to boost search efficiency. Rigorous experimentation corroborates that our algorithm can augment the search…
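The search loop described in the abstract (a proposer, a search memory, and a cheap proxy for ranking) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Candidate`, `propose`, `proxy_score`, and `search` are hypothetical names, `propose` is a random-mutation stand-in for the GPT-4 call, and `proxy_score` is a toy function standing in for the RFID proxy, whose definition is not given here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical architecture description (not the paper's search space)."""
    channels: int
    depth: int
    attention: bool


def propose(memory, rng):
    """Stand-in for the GPT-4 call: mutate the best candidate seen so far.

    In the described method the proposer would be prompted with the search
    memory; here the memory is only used to pick a starting point.
    """
    if not memory:
        return Candidate(channels=64, depth=2, attention=False)
    best, _ = min(memory, key=lambda item: item[1])
    return Candidate(
        channels=max(32, best.channels + rng.choice([-32, 0, 32])),
        depth=max(1, best.depth + rng.choice([-1, 0, 1])),
        attention=rng.choice([True, False]),
    )


def proxy_score(cand):
    """Toy stand-in for the ranking proxy; lower is better (like FID)."""
    penalty = 0.0 if cand.attention else 1.0
    return abs(cand.channels - 128) / 32 + abs(cand.depth - 3) + penalty


def search(rounds=20, seed=0):
    """Run the propose/score loop and return the memory ranked by score."""
    rng = random.Random(seed)
    memory = []  # search memory: list of (candidate, score) pairs
    for _ in range(rounds):
        cand = propose(memory, rng)
        memory.append((cand, proxy_score(cand)))
    memory.sort(key=lambda item: item[1])
    return memory


if __name__ == "__main__":
    ranked = search()
    best, score = ranked[0]
    print("best candidate:", best, "proxy score:", score)
```

The point of the sketch is the division of labor: proposals are cheap, every result is kept in memory so later proposals can build on it, and only a fast proxy is evaluated inside the loop, so full training would be reserved for the top-ranked candidates.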
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Model Reduction and Neural Networks
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Balanced Selection · Dense Connections · Label Smoothing · Adam · Diffusion
