SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing
Qi Qian, Haiyang Xu, Ming Yan, Juhua Hu

TL;DR
SimInversion improves text-guided image editing by disentangling the guidance scales of the source and target branches in DDIM inversion, which reduces approximation error and improves editing accuracy without increasing computational cost.
Contribution
The paper proposes disentangling the guidance scales of the source and target branches in DDIM inversion, improving editing performance while keeping the original framework and its efficiency.
Findings
Significant improvement in image editing quality on PIE-Bench
Theoretically derived optimal guidance scale of 0.5
Enhanced DDIM inversion accuracy without added computational cost
Abstract
Diffusion models demonstrate impressive image generation performance with text guidance. Inspired by the learning process of diffusion, existing images can be edited according to text by DDIM inversion. However, vanilla DDIM inversion is not optimized for classifier-free guidance, and the accumulated error results in undesired performance. While many algorithms have been developed to improve the framework of DDIM inversion for editing, in this work we investigate the approximation error in DDIM inversion and propose to disentangle the guidance scale for the source and target branches to reduce the error while keeping the original framework. Moreover, a better guidance scale (i.e., 0.5) than the default setting can be derived theoretically. Experiments on PIE-Bench show that our proposal can improve the performance of DDIM inversion dramatically without sacrificing efficiency.
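The idea of separate guidance scales for the two branches can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_eps` is a hypothetical stand-in for a noise-prediction network, the guidance convention (unconditional prediction plus scale times the conditional difference) is one common parameterization, and applying 0.5 to the source branch with 7.5 on the target branch is an assumption made here for illustration, since the abstract only states that 0.5 is the derived value.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: start at the unconditional prediction
    and move toward the conditional one by `scale`."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two cumulative-alpha levels.
    alpha_to < alpha_from adds noise (inversion); the reverse denoises."""
    x0_pred = (x - np.sqrt(1.0 - alpha_from) * eps) / np.sqrt(alpha_from)
    return np.sqrt(alpha_to) * x0_pred + np.sqrt(1.0 - alpha_to) * eps

def toy_eps(x, prompt_embedding):
    """Hypothetical noise predictor; a real model would be a trained network."""
    return 0.1 * x + prompt_embedding

# Assumed values for illustration only (see the lead-in).
SOURCE_SCALE = 0.5
TARGET_SCALE = 7.5

x = np.array([1.0, -2.0, 0.5])          # toy "latent"
null_emb = np.zeros(3)                  # empty-prompt embedding
src_emb = np.array([0.2, 0.0, -0.1])    # source-prompt embedding
tgt_emb = np.array([-0.3, 0.4, 0.1])    # target-prompt embedding

# Source branch: one inversion step using its own guidance scale.
eps_src = cfg_noise(toy_eps(x, null_emb), toy_eps(x, src_emb), SOURCE_SCALE)
x_noisy = ddim_step(x, eps_src, alpha_from=0.9, alpha_to=0.5)

# Target branch: one denoising step using a different guidance scale.
eps_tgt = cfg_noise(toy_eps(x_noisy, null_emb), toy_eps(x_noisy, tgt_emb), TARGET_SCALE)
x_edit = ddim_step(x_noisy, eps_tgt, alpha_from=0.5, alpha_to=0.9)
```

In this sketch the two branches share the same update rule and differ only in the scale passed to `cfg_noise`, which is consistent with the abstract's claim that the original framework is kept and no extra computation is added.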
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Mathematics, Computing, and Information Processing
