Dynamic VLM-Guided Negative Prompting for Diffusion Models
Hoyeon Chang, Seungjin Kim, Yoonseok Choi

TL;DR
This paper introduces a dynamic negative prompting technique for diffusion models: a vision-language model (VLM) inspects intermediate image predictions during denoising and generates negative prompts adaptively, with the goal of improving image quality and text-image alignment.
Contribution
The paper presents a method that generates negative prompts dynamically by querying a VLM at specific denoising steps, in contrast to approaches that fix the negative prompt before sampling begins.
Findings
Reported gains in image quality and text-image alignment on benchmark tests
A trade-off between negative guidance strength and text-image alignment
Evaluation spans multiple benchmark datasets
Abstract
We propose a novel approach for dynamic negative prompting in diffusion models that leverages Vision-Language Models (VLMs) to adaptively generate negative prompts during the denoising process. Unlike traditional Negative Prompting methods that use fixed negative prompts, our method generates intermediate image predictions at specific denoising steps and queries a VLM to produce contextually appropriate negative prompts. We evaluate our approach on various benchmark datasets and demonstrate the trade-offs between negative guidance strength and text-image alignment.
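The loop described in the abstract can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: `toy_denoiser`, `stub_vlm`, `stub_text_encoder`, the query steps, and the noise schedule are all hypothetical stand-ins. It also assumes the common classifier-free-guidance form of negative prompting, where the negative embedding replaces the unconditional branch; the paper may use a different formulation.

```python
import numpy as np

def toy_denoiser(x_t, cond):
    """Hypothetical stand-in for a noise predictor eps_theta(x_t, cond)."""
    return 0.1 * x_t + 0.05 * cond

def predict_x0(x_t, eps, alpha_bar):
    """Standard estimate of the clean image from x_t and the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)

def stub_vlm(x0_pred):
    """Hypothetical VLM call: maps an intermediate image to a negative prompt."""
    return "blurry" if float(np.std(x0_pred)) < 1.0 else "oversaturated"

def stub_text_encoder(prompt, dim):
    """Hypothetical text encoder: deterministic pseudo-embedding of a string."""
    seed = sum(ord(c) for c in prompt)
    return np.random.default_rng(seed).standard_normal(dim)

def sample(prompt, num_steps=10, query_steps=(3, 6), scale=7.5, dim=16, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)                 # start from pure noise
    cond = stub_text_encoder(prompt, dim)
    neg = np.zeros(dim)                          # assumed "empty" negative embedding
    # Assumed schedule: cumulative alphas rising from noisy (0.05) to clean (1.0).
    alpha_bars = np.linspace(0.05, 1.0, num_steps + 1)
    negatives = []                               # (step, negative prompt) log

    for i in range(num_steps):
        a_t, a_next = alpha_bars[i], alpha_bars[i + 1]
        eps_pos = toy_denoiser(x, cond)

        if i in query_steps:
            # Build an intermediate image prediction and ask the VLM
            # for a context-dependent negative prompt.
            x0_preview = predict_x0(x, eps_pos, a_t)
            neg_prompt = stub_vlm(x0_preview)
            neg = stub_text_encoder(neg_prompt, dim)
            negatives.append((i, neg_prompt))

        eps_neg = toy_denoiser(x, neg)
        # Guidance: push away from the negative branch, toward the prompt.
        eps = eps_neg + scale * (eps_pos - eps_neg)

        # Deterministic DDIM-style update to the next noise level.
        x0 = predict_x0(x, eps, a_t)
        x = np.sqrt(a_next) * x0 + np.sqrt(1.0 - a_next) * eps

    return x, negatives

x, negatives = sample("a photo of a cat")
print(negatives)
```

In this sketch the `scale` parameter is the knob behind the trade-off the abstract mentions: a larger value pushes the sample further from the negative embedding, which in a real model can also pull it away from the positive prompt. The number and placement of `query_steps` is a separate design choice, since each query costs one VLM call.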
