Text2LIVE: Text-Driven Layered Image and Video Editing

Omer Bar-Tal; Dolev Ofri-Amar; Rafail Fridman; Yoni Kasten; Tali Dekel

arXiv:2204.02491 · cs.CV · May 26, 2022 · 27 cites

TL;DR

Text2LIVE performs zero-shot, text-driven editing of images and videos by generating an edit layer (color and opacity) that is composited over the input. The layer changes the appearance of existing objects or adds visual effects while preserving fidelity to the original, and the method does not rely on a pre-trained generator.

Contribution

The paper introduces a novel text-driven layered editing approach that does not rely on pre-trained generators or user masks, enabling high-resolution, semantic edits in images and videos.

Findings

01 Effective zero-shot editing with high fidelity.

02 Capable of localized, semantic modifications across diverse scenes.

03 Operates without pre-trained generators or manual masks.

Abstract

We present a method for zero-shot, text-driven appearance manipulation in natural images and videos. Given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., object's texture) or augment the scene with visual effects (e.g., smoke, fire) in a semantically meaningful manner. We train a generator using an internal dataset of training examples, extracted from a single input (image or video and target text prompt), while leveraging an external pre-trained CLIP model to establish our losses. Rather than directly generating the edited output, our key idea is to generate an edit layer (color+opacity) that is composited over the original input. This allows us to constrain the generation process and maintain high fidelity to the original input via novel text-driven losses that are applied directly to the edit layer. Our method neither…
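
The edit-layer idea described in the abstract can be illustrated with standard alpha compositing. The following is a minimal sketch, not the authors' implementation: the function name `composite`, the nested-list image representation, and the blend rule `alpha * edit + (1 - alpha) * input` are assumptions chosen for illustration. In the method itself the color and opacity come from a trained generator; here they are hard-coded.

```python
def composite(image, edit_rgb, alpha):
    """Blend an edit layer over an image, pixel by pixel.

    image, edit_rgb: rows of (r, g, b) tuples with values in [0, 1].
    alpha: rows of opacity values in [0, 1], same height and width.
    Assumed blend rule: out = alpha * edit + (1 - alpha) * image.
    """
    out = []
    for img_row, edit_row, a_row in zip(image, edit_rgb, alpha):
        out_row = []
        for px, ex, a in zip(img_row, edit_row, a_row):
            # Blend each color channel with the per-pixel opacity.
            out_row.append(tuple(a * e + (1.0 - a) * p for p, e in zip(px, ex)))
        out.append(out_row)
    return out


# Hypothetical 1x2 image: one black pixel, one white pixel.
image = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
# Hypothetical edit layer: orange everywhere, opaque only on the second pixel.
edit_rgb = [[(1.0, 0.5, 0.0), (1.0, 0.5, 0.0)]]
alpha = [[0.0, 1.0]]

result = composite(image, edit_rgb, alpha)
```

Where the opacity is 0 the original pixel passes through unchanged, and where it is 1 the edit color replaces it. This is why a loss applied to the edit layer alone can constrain what gets changed while leaving the rest of the input untouched.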

Code & Models

Repositories

omerbt/Text2LIVE (PyTorch, official)

Models

Antiraedus/test-text2Live (Hugging Face model, ♡ 1)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization

Methods: Contrastive Language-Image Pre-training