Manipulating Embeddings of Stable Diffusion Prompts
Niklas Deckers, Julia Peters, Martin Potthast

TL;DR
This paper introduces a method for directly manipulating the embedding of a prompt in Stable Diffusion rather than the prompt text. By passing gradients between the image space and the prompt embedding space, it gives users more precise and less tedious control over generated images through three practical interaction tools.
Contribution
The paper presents a new approach that manipulates prompt embeddings directly, and derives three practical interaction tools from it that offer users of generative text-to-image models an alternative to prompt engineering for controlling images.
Findings
Users found the methods less tedious than prompt engineering.
Generated images were often preferred by users.
The approach allows fine-grained, targeted control over image generation.
Abstract
Prompt engineering is still the primary way for users of generative text-to-image models to manipulate generated images in a targeted way. Based on treating the model as a continuous function and by passing gradients between the image space and the prompt embedding space, we propose and analyze a new method to directly manipulate the embedding of a prompt instead of the prompt text. We then derive three practical interaction tools to support users with image generation: (1) Optimization of a metric defined in the image space that measures, for example, the image style. (2) Supporting a user in creative tasks by allowing them to navigate in the image space along a selection of directions of "near" prompt embeddings. (3) Changing the embedding of the prompt to include information that a user has seen in a particular seed but has difficulty describing in the prompt. Compared to prompt…
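As an illustration of tool (1), the following minimal sketch shows the general idea of treating the generator as a continuous function and following the gradient of an image-space metric back into the embedding space. It is a toy under stated assumptions, not the authors' implementation: the "generator" is a small fixed tanh layer instead of Stable Diffusion, the metric is mean pixel value instead of a style score, and every name (`generate`, `metric`, `metric_grad`, `optimize_embedding`) is hypothetical.

```python
import math
import random


def generate(W, e):
    """Toy stand-in for the image generator (assumption): linear map + tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, e))) for row in W]


def metric(image):
    """Toy image-space metric (assumption): mean pixel value."""
    return sum(image) / len(image)


def metric_grad(W, e):
    """Gradient of metric(generate(W, e)) with respect to e, via the chain rule."""
    image = generate(W, e)
    n = len(image)
    grad = [0.0] * len(e)
    for row, pixel in zip(W, image):
        scale = (1.0 - pixel * pixel) / n  # d tanh(z)/dz = 1 - tanh(z)^2
        for j, w in enumerate(row):
            grad[j] += scale * w
    return grad


def optimize_embedding(W, e0, steps=50, lr=0.1):
    """Plain gradient ascent on the embedding; the generator weights stay fixed."""
    e = list(e0)
    for _ in range(steps):
        g = metric_grad(W, e)
        e = [x + lr * gx for x, gx in zip(e, g)]
    return e


random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(16)]  # 16 "pixels", 4-dim embedding
e0 = [random.uniform(-1, 1) for _ in range(4)]                      # initial "prompt embedding"

e_opt = optimize_embedding(W, e0)
score_before = metric(generate(W, e0))
score_after = metric(generate(W, e_opt))
print(f"metric before: {score_before:.4f}, after: {score_after:.4f}")
```

In a real setting the hand-written `metric_grad` would presumably be replaced by automatic differentiation through the diffusion model, and only the embedding, not the model weights, would be updated.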
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Diffusion
