The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP
Hidir Yesiltepe, Yusuf Dalva, Pinar Yanardag

TL;DR
This paper demonstrates that CLIP can perform zero-shot disentangled image editing within diffusion models, achieving competitive results and offering a lightweight alternative to traditional editing methods.
Contribution
The study shows that CLIP, the text encoder already used in text-to-image diffusion models such as Stable Diffusion, can perform disentangled image editing in a zero-shot manner, that is, without additional training.
Findings
CLIP enables zero-shot disentangled editing in diffusion models.
In qualitative and quantitative comparisons, the proposed method is competitive with state-of-the-art editing methods.
Potential applications include image and video editing.
Abstract
Diffusion models have become prominent in creating high-quality images. However, unlike GAN models, which are celebrated for their ability to edit images in a disentangled manner, diffusion-based text-to-image models struggle to achieve the same level of precise attribute manipulation without compromising image coherence. In this paper, we show that CLIP, which is often used in popular text-to-image diffusion models such as Stable Diffusion, is capable of performing disentangled editing in a zero-shot manner. Through both qualitative and quantitative comparisons with state-of-the-art editing methods, we show that our approach yields competitive results. This insight may open opportunities for applying this method to various tasks, including image and video editing, providing a lightweight and efficient approach for disentangled editing.
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
Methods: Contrastive Language-Image Pre-training · Diffusion
