Text-Driven Stylization of Video Objects

Sebastian Loeschcke; Serge Belongie; Sagie Benaim

arXiv:2206.12396·cs.CV·June 28, 2022

TL;DR

This paper introduces a method for semantically stylizing video objects based on user text prompts, ensuring temporal consistency and preservation of details by leveraging CLIP and an atlas decomposition network.

Contribution

The novel approach combines global and local text prompts with CLIP similarity and an atlas network to achieve temporally consistent, detailed, and user-guided video object stylization.

Findings

01 Produces consistent style changes over time

02 Adheres to user-specified text prompts

03 Allows varying levels of stylization detail

Abstract

We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task as the resulting video must satisfy multiple properties: (1) it has to be temporally consistent and avoid jittering or similar artifacts, (2) the resulting stylization must preserve both the global semantics of the object and its fine-grained details, and (3) it must adhere to the user-specified text prompt. To this end, our method stylizes an object in a video according to two target texts. The first target text prompt describes the global semantics and the second target text prompt describes the local semantics. To modify the style of an object, we harness the representational power of CLIP to get a similarity score between (1) the local target text and a set of local stylized views, and (2) a global target text and a set of stylized…
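The text-to-view scoring described in the abstract can be illustrated with a minimal sketch. It assumes the text prompts and the stylized views have already been mapped to embedding vectors (in the paper these come from CLIP; here they are plain lists of floats so the example runs on its own). The function names, the averaging over views, the `1 - similarity` loss form, and the two weights are all hypothetical choices made for illustration, not the authors' implementation. The abstract is also truncated where it describes the global set, so treating it as a set of "global views" is an assumption.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def text_view_loss(text_emb: Vector, view_embs: Sequence[Vector]) -> float:
    """One minus the mean similarity between a text embedding and a set of views.

    Assumption: views are scored independently and averaged.
    """
    if not view_embs:
        raise ValueError("at least one view embedding is required")
    sims = [cosine_similarity(text_emb, v) for v in view_embs]
    return 1.0 - sum(sims) / len(sims)


def stylization_loss(
    local_text: Vector,
    local_views: Sequence[Vector],
    global_text: Vector,
    global_views: Sequence[Vector],
    local_weight: float = 1.0,   # hypothetical weighting, not from the paper
    global_weight: float = 1.0,  # hypothetical weighting, not from the paper
) -> float:
    """Weighted sum of the local and global text-to-view losses."""
    return (
        local_weight * text_view_loss(local_text, local_views)
        + global_weight * text_view_loss(global_text, global_views)
    )
```

A lower value means the views agree more closely with their prompts. In a real pipeline the embeddings would be produced by a CLIP image and text encoder, and the loss would be minimized by gradient descent on the stylization parameters rather than evaluated on fixed vectors.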

Taxonomy

Topics: Natural Language Processing Techniques · Digital Humanities and Scholarship · Handwritten Text Recognition Techniques

Methods: Contrastive Language-Image Pre-training