TurboEdit: Instant text-based image editing
Zongze Wu, Nicholas Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman

TL;DR
TurboEdit is a real-time, text-guided image editing method built on few-step diffusion models. It combines precise image inversion with disentangled attribute control at low computational cost.
Contribution
The paper presents an encoder-based iterative inversion technique and a simple, effective way to achieve disentangled, text-guided image editing in few-step diffusion models using only a handful of network evaluations.
Findings
Real-time editing with only 8 function evaluations (NFEs) for inversion and 4 NFEs per edit.
Outperforms state-of-the-art multi-step diffusion editing methods.
Enables precise, attribute-specific image manipulation using text prompts.
Abstract
We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder-based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing for correction of the next reconstruction towards the input image. We demonstrate that disentangled controls can be easily achieved in the few-step diffusion model by conditioning on an (automatically generated) detailed text prompt. To manipulate the inverted image, we freeze the noise maps and modify one attribute in the text prompt (either manually or via instruction-based editing driven by an LLM), resulting in the generation of a new image similar to the input image with only one attribute changed. The method can further control the editing strength and accept instructive text prompts. Our…
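The inversion-then-edit loop described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `toy_generator`, `toy_inversion_encoder`, `invert`, the linear maps, the step count, and the list-of-floats "images" and "prompts" are hypothetical stand-ins, not the paper's networks or code. The sketch only shows the control flow: an encoder that sees the input image and the previous reconstruction proposes a noise correction, and editing reuses the frozen noise with a changed prompt.

```python
# Toy sketch only: linear stand-ins for the generator and the inversion
# network. None of these names or constants come from the paper.

def toy_generator(noise, prompt):
    """Hypothetical few-step generator: image = 2 * noise + prompt offset."""
    return [2.0 * n + p for n, p in zip(noise, prompt)]


def toy_inversion_encoder(target, previous_recon):
    """Hypothetical inversion network.

    It is conditioned on the input image and the previous reconstruction
    and returns a correction to the noise. The factor 0.4 is deliberately
    imperfect (an exact inverse of the toy generator would use 0.5), so a
    single pass is not enough and iteration is needed.
    """
    return [0.4 * (t - r) for t, r in zip(target, previous_recon)]


def invert(target, prompt, num_steps=4):
    """Iteratively refine the noise so the reconstruction approaches target."""
    noise = [0.0] * len(target)
    recon = toy_generator(noise, prompt)
    for _ in range(num_steps):  # num_steps is an arbitrary choice here
        correction = toy_inversion_encoder(target, recon)
        noise = [n + c for n, c in zip(noise, correction)]
        recon = toy_generator(noise, prompt)
    return noise, recon


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    target = [1.0, -0.5, 3.0]        # stand-in for the input image
    prompt = [0.2, 0.0, -0.1]        # stand-in for the detailed source prompt
    noise, recon = invert(target, prompt)
    print("reconstruction error:", max_abs_diff(target, recon))

    # Editing: keep the noise frozen and change one "attribute" of the prompt.
    edited_prompt = [0.2, 0.0, 0.9]
    edited = toy_generator(noise, edited_prompt)
    print("change per element:", [e - r for e, r in zip(edited, recon)])
```

In this toy setting each iteration shrinks the reconstruction error by a constant factor, and the edit changes only the element whose prompt entry was modified. Real diffusion models are nonlinear, so neither property is guaranteed there; the sketch is meant only to make the data flow concrete.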
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Diffusion
