SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Trong-Tung Nguyen; Quang Nguyen; Khoi Nguyen; Anh Tran; Cuong Pham

arXiv:2412.04301·cs.CV·June 3, 2025

TL;DR

SwiftEdit is a text-guided image editing method that performs localized edits in 0.23 seconds, far faster than previous multi-step diffusion-based approaches, while producing competitive results.

Contribution

The paper proposes a one-step inversion framework and a mask-guided editing technique with attention rescaling, which together enable real-time image editing with diffusion models.

Findings

01 Edits images about 50x faster than previous multi-step diffusion-based methods.

02 Maintains competitive editing quality.

03 Demonstrates effectiveness and efficiency through extensive experiments.

Abstract

Recent advances in text-guided image editing enable users to perform image edits through simple text inputs, leveraging the extensive priors of multi-step diffusion-based text-to-image models. However, these methods often fall short of the speed demands required for real-world and on-device applications due to the costly multi-step inversion and sampling process involved. In response to this, we introduce SwiftEdit, a simple yet highly efficient editing tool that achieves instant text-guided image editing (in 0.23s). The advancement of SwiftEdit lies in its two novel contributions: a one-step inversion framework that enables one-step image reconstruction via inversion and a mask-guided editing technique with our proposed attention rescaling mechanism to perform localized image editing. Extensive experiments are provided to demonstrate the effectiveness and efficiency of SwiftEdit. In…
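The abstract names mask-guided editing and attention rescaling but does not give their formulation, so the following is only a minimal NumPy sketch of the general idea of localizing an edit with a mask. It is not the authors' implementation. The function names `blend_with_mask` and `rescale_by_region`, the array shapes, and the scale values are all hypothetical choices made for illustration.

```python
import numpy as np


def blend_with_mask(source, edited, mask):
    """Keep `source` outside the mask and `edited` inside it.

    source, edited: float arrays of shape (H, W, C).
    mask: float array of shape (H, W), 1 in the edit region, 0 elsewhere.
    Illustrative only; the paper's actual blending rule is not in the abstract.
    """
    m = np.clip(mask, 0.0, 1.0)[..., None]  # (H, W, 1), broadcasts over channels
    return m * edited + (1.0 - m) * source


def rescale_by_region(features, mask, scale_inside, scale_outside):
    """Multiply features by one scale inside the mask and another outside.

    A hypothetical stand-in for "attention rescaling": the scales and where
    they are applied in a real model are assumptions, not taken from the paper.
    """
    m = np.clip(mask, 0.0, 1.0)[..., None]
    weights = m * scale_inside + (1.0 - m) * scale_outside
    return weights * features
```

With a mask that covers only the top-left quarter of an image, `blend_with_mask` would return edited pixels in that quarter and untouched source pixels everywhere else, which is the sense in which the edit stays localized.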

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Computer Graphics and Visualization Techniques

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings