Dual-Process Image Generation
Grace Luo, Jonathan Granskog, Aleksander Holynski, Trevor Darrell

TL;DR
This paper introduces a dual-process distillation framework that enables feed-forward image generators to learn new tasks from vision-language models, allowing flexible multimodal control over generated images.
Contribution
The proposed method allows image generators to learn new tasks from VLMs using a distillation scheme, expanding control capabilities with a unified text-and-image interface.
Findings
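Enables control over image properties such as color palette, line weight, horizon position, and relative depth.
Allows rapid implementation of multimodal controls through a single text-and-image interface.
Demonstrates applications across different types of control signals, such as commonsense inferences and visual prompts.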
Abstract
Prior methods for controlling image generation are limited in their ability to be taught new tasks. In contrast, vision-language models, or VLMs, can learn tasks in-context and produce the correct outputs for a given input. We propose a dual-process distillation scheme that allows feed-forward image generators to learn new tasks from deliberative VLMs. Our scheme uses a VLM to rate the generated images and backpropagates this gradient to update the weights of the image generator. Our general framework enables a wide variety of new control tasks through the same text-and-image based interface. We showcase a handful of applications of this technique for different types of control signals, such as commonsense inferences and visual prompts. With our method, users can implement multimodal controls for properties such as color palette, line weight, horizon position, and relative depth within…
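The rate-then-backpropagate loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "generator" below is just a vector of per-pixel weights, and the "critic" is a hand-written differentiable function standing in for a VLM that answers the hypothetical question "is the image bright?". All names (generate, critic_yes_prob, distill_step) and hyperparameters are assumptions made for illustration. What the sketch does show is the general shape of the idea: the critic scores the generated image, the loss measures how far that score is from the desired answer, and the gradient of that loss updates the generator's weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def generate(weights):
    # Toy "generator": each weight maps to one pixel intensity in (0, 1).
    return [sigmoid(w) for w in weights]


def critic_yes_prob(image, sharpness=10.0):
    # Toy stand-in for a VLM: probability of answering "yes" to the
    # hypothetical question "is the image bright?". A real VLM would be a
    # large network; this is only a differentiable placeholder.
    mean = sum(image) / len(image)
    return sigmoid(sharpness * (mean - 0.5))


def distill_step(weights, lr=5.0, sharpness=10.0):
    # Forward pass: generate an image and let the critic rate it.
    image = generate(weights)
    p_yes = critic_yes_prob(image, sharpness)
    # Cross-entropy loss against the desired answer "yes".
    loss = -math.log(p_yes)

    # Backward pass, written out by hand with the chain rule:
    # dloss/dmean = -sharpness * (1 - p_yes)
    # dmean/dpixel = 1 / n
    # dpixel/dweight = pixel * (1 - pixel)
    n = len(image)
    dloss_dmean = -sharpness * (1.0 - p_yes)
    new_weights = [
        w - lr * dloss_dmean * (1.0 / n) * px * (1.0 - px)
        for w, px in zip(weights, image)
    ]
    return new_weights, loss


# Start from a dark image and update the generator using the critic's rating.
weights = [-1.0] * 16
losses = []
for _ in range(50):
    weights, loss = distill_step(weights)
    losses.append(loss)

final_prob = critic_yes_prob(generate(weights))
```

In this toy setting the critic's parameters never change; only the generator's weights move, which mirrors the abstract's description of updating the image generator from the VLM's rating. In practice one would rely on an automatic differentiation library rather than hand-derived gradients.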
Taxonomy
Topics: Medical Image Segmentation Techniques
