MOSAIC: Multi-Object Segmented Arbitrary Stylization Using CLIP

Prajwal Ganugula; Y S S S Santosh Kumar; N K Sagar Reddy; Prabhath; Chellingi; Avinash Thakur; Neeraj Kasera; C Shyam Anand

arXiv:2309.13716 · cs.CV · September 26, 2023 · 1 citation

Open Access

TL;DR

MOSAIC introduces a novel text-guided method for object-wise image stylization. It gives fine control over the style of each individual object, based on context extracted from the input prompt, and according to the authors surpasses previous methods in quality and flexibility.

Contribution

The authors present MOSAIC as the first approach to achieve text-guided, arbitrary object-wise stylization, using Vision Transformer-based segmentation and stylization modules.
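To make "object-wise stylization" concrete, here is a minimal sketch of one simple way to combine the outputs of a segmentation module and a stylization module: paste each object's stylized pixels into the content image through that object's mask. This is an illustrative assumption, not MOSAIC's published procedure; the function name `composite_objectwise` and the "earliest object wins on overlapping masks" rule are hypothetical choices.

```python
import numpy as np


def composite_objectwise(content, stylized, masks):
    """Blend per-object stylized images into the content image (hypothetical helper).

    content:  (H, W, 3) float array, the original content image.
    stylized: list of (H, W, 3) float arrays, one stylized image per object.
    masks:    list of (H, W) boolean arrays, one segmentation mask per object.

    Pixels covered by no mask keep their content value. Where masks overlap,
    the object that appears earliest in the list wins (an assumed tie-break).
    """
    if len(stylized) != len(masks):
        raise ValueError("need exactly one mask per stylized image")
    out = content.copy()  # never modify the caller's content image
    claimed = np.zeros(content.shape[:2], dtype=bool)  # pixels already assigned
    for image, mask in zip(stylized, masks):
        region = mask & ~claimed  # this object's pixels not taken by an earlier object
        out[region] = image[region]  # copy all 3 channels for those pixels
        claimed |= region
    return out
```

Hard mask pasting keeps the sketch short; a real pipeline would more likely feather mask edges or restrict the stylization loss to each region so that object boundaries blend smoothly.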

Findings

01. Produces high-quality, visually appealing stylized images.
02. Enhances control over the stylization of individual objects.
03. Generalizes well to unseen object classes.

Abstract

Style transfer driven by text prompts has opened a new path for creatively stylizing images without collecting an actual style image. Despite its promising results, text-driven stylization gives the user no control over the stylization. A user who wants to create an artistic image needs fine control over the stylization of each entity in the content image individually, which current state-of-the-art approaches do not address. Diffusion-based style transfer methods suffer from the same issue, because their regional control over the stylized output is ineffective. To address this problem, we propose a new method, Multi-Object Segmented Arbitrary Stylization Using CLIP (MOSAIC), which applies styles to different objects in the image based on the context extracted from the input prompt. Text-based segmentation and stylization…
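The abstract says styles are assigned to objects "based on the context extracted from the input prompt", but the text shown here does not describe how that extraction works. Purely as an illustration, the sketch below assumes a fixed prompt template of clauses like "<object> in the style of <style>", separated by ";" or the word "and", and turns such a prompt into (object, style) pairs. The template, the function name `parse_object_styles`, and the regular expressions are all hypothetical; MOSAIC's actual context extraction may work quite differently.

```python
import re

# Hypothetical template: "<object> in the style of <style>", with an optional
# leading article ("the", "a", "an") that is dropped from the object name.
_CLAUSE = re.compile(
    r"^\s*(?:the\s+|a\s+|an\s+)?(?P<obj>.+?)\s+in\s+the\s+style\s+of\s+(?P<style>.+?)\s*$",
    re.IGNORECASE,
)
# Clauses are separated by ";" or by the standalone word "and".
_SEP = re.compile(r"\s*(?:;|\band\b)\s*", re.IGNORECASE)


def parse_object_styles(prompt):
    """Split a prompt into (object, style) pairs under the assumed template."""
    pairs = []
    for clause in _SEP.split(prompt):
        if not clause.strip():
            continue  # tolerate empty clauses, e.g. from a trailing ";"
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"clause does not match the template: {clause!r}")
        pairs.append((match.group("obj").lower(), match.group("style")))
    return pairs
```

For example, "the cat in the style of Van Gogh and the sofa in the style of ukiyo-e" yields one pair per object, which could then drive a text-guided segmenter (one object name per mask) and a text-guided stylizer (one style per object). An object or style name that itself contains the word "and" would be split incorrectly under this naive template.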

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Image Processing and 3D Reconstruction

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Layer Normalization · Dense Connections · Vision Transformer · Contrastive Language-Image Pre-training · Diffusion