WikiStyle+: A Multimodal Approach to Content-Style Representation Disentanglement for Artistic Image Stylization
Ma Zhuoqi, Zhang Yixuan, You Zejun, Tian Long, Liu Xiyang

TL;DR
This paper introduces WikiStyle+, a multimodal dataset and a diffusion model that effectively disentangles content and style for artistic image stylization, supporting multiple modalities and reducing content leakage.
Contribution
It presents a novel multimodal dataset and a diffusion-based approach for improved content-style disentanglement in artistic stylization tasks.
Findings
Achieves thorough content-style disentanglement under multimodal supervision
Enables more refined and style-accurate artistic stylization
Supports multiple input modalities for style and content
Abstract
Artistic image stylization aims to render content provided by text or image in a target style, where decoupling content and style is key to achieving satisfactory results. However, current methods for content-style disentanglement rely primarily on image supervision, which leads to two problems: 1) models can support only one modality for style or content input; 2) disentanglement is incomplete, resulting in content leakage from the reference image. To address these issues, this paper proposes a multimodal approach to content-style disentanglement for artistic image stylization. We construct the WikiStyle+ dataset, which consists of artworks with corresponding textual descriptions of style and content. Based on this multimodal dataset, we propose a diffusion model guided by disentangled representations. The disentangled representations are first learned by Q-Formers and then…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Video Analysis and Summarization
Methods: Diffusion
