PromptFix: You Prompt and We Fix the Photo
Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo

TL;DR
PromptFix is a comprehensive framework that enhances diffusion models' ability to follow human instructions for diverse image-processing tasks, addressing data scarcity and detail preservation issues.
Contribution
It introduces a large-scale instruction-following dataset, a high-frequency guidance sampling method, and an auxiliary prompting adapter using VLMs to improve task generalization.
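As a rough illustration of what "high-frequency" information means in this context, the sketch below separates an image into a blurred low-frequency part and a high-frequency residual, then measures how much fine detail differs between two images. This is a generic high-pass example, not the authors' sampling method. The function names (`box_blur`, `high_frequency`, `high_frequency_gap`), the box filter, and the `radius` parameter are all assumptions introduced for illustration.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int = 1) -> np.ndarray:
    """Mean filter over a (2*radius+1)^2 window, using edge padding.

    Hypothetical helper: expects a 2D grayscale array.
    """
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    # Sum every shifted copy of the image that falls inside the window.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (size * size)


def high_frequency(image: np.ndarray, radius: int = 1) -> np.ndarray:
    """High-pass residual: the image minus its blurred (low-frequency) copy."""
    image = image.astype(np.float64)
    return image - box_blur(image, radius)


def high_frequency_gap(a: np.ndarray, b: np.ndarray, radius: int = 1) -> float:
    """Mean absolute difference between the high-frequency parts of two images.

    Assumption: a quantity like this could serve as a detail-preservation
    signal; the paper's actual guidance term may be defined differently.
    """
    return float(np.mean(np.abs(high_frequency(a, radius) - high_frequency(b, radius))))
```

Under these assumptions, a flat image has a zero residual, and a sharp edge compared against its blurred copy yields a positive gap, which is the kind of lost detail a fidelity term would penalize.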
Findings
Outperforms previous methods in various image-processing tasks.
Achieves inference efficiency comparable to that of baseline models.
Exhibits superior zero-shot capabilities in blind restoration and combination tasks.
Abstract
Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the development of models that effectively recognize and execute user-customized instructions, particularly in low-level tasks. Moreover, the stochastic nature of the diffusion process leads to deficiencies in image generation or editing tasks that require the detailed preservation of the generated images. To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks. First, we construct a large-scale instruction-following dataset that covers comprehensive image-processing tasks, including low-level tasks, image editing, and object…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
