TL;DR
CoCoDiff is a training-free, diffusion-based style transfer method that achieves fine-grained, semantically consistent stylization by leveraging pixel-wise correspondence and cycle consistency.
Contribution
It introduces a novel, training-free framework that uses pretrained diffusion models for detailed, semantically aligned style transfer without extra supervision.
Findings
Outperforms existing methods in visual quality and quantitative metrics.
Achieves object- and region-level stylization while preserving geometry and details.
Operates without additional training or annotations.
Abstract
Transferring visual style between images while preserving semantic correspondence between similar objects remains a central challenge in computer vision. While existing methods have made great strides, most of them operate at a global level and overlook region-wise and even pixel-wise semantic correspondence. To address this, we propose CoCoDiff, a novel training-free and low-cost style transfer framework that leverages pretrained latent diffusion models to achieve fine-grained, semantically consistent stylization. We identify that correspondence cues within generative diffusion models are under-explored and that content consistency across semantically matched regions is often neglected. CoCoDiff introduces a pixel-wise semantic correspondence module that mines intermediate diffusion features to construct a dense alignment map between content and style images. Furthermore, a…
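To make the idea of a dense alignment map concrete, here is a minimal sketch. It is not CoCoDiff's actual module, which this excerpt does not specify. It assumes each image position already has a feature vector (in practice taken from an intermediate diffusion layer), matches every content position to its most similar style position by cosine similarity, and flags whether the match survives a forward-backward (cycle) check. The function names, the toy feature values, and the choice of cosine similarity are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def nearest(query, candidates):
    """Index of the candidate feature most similar to the query feature."""
    return max(range(len(candidates)), key=lambda j: cosine(query, candidates[j]))


def dense_correspondence(content_feats, style_feats):
    """Map each content position to its best-matching style position.

    Returns one (style_index, cycle_consistent) pair per content position.
    cycle_consistent is True when the matched style position maps back to
    the same content position (an assumed forward-backward check).
    """
    forward = [nearest(f, style_feats) for f in content_feats]
    backward = [nearest(f, content_feats) for f in style_feats]
    return [(j, backward[j] == i) for i, j in enumerate(forward)]


# Toy per-position features: hypothetical values, not real diffusion features.
content = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
style = [[0.0, 2.0], [2.0, 0.1]]

matches = dense_correspondence(content, style)
for i, (j, ok) in enumerate(matches):
    print(f"content {i} -> style {j} (cycle consistent: {ok})")
```

In this toy case, content positions 0 and 2 both map to style position 1, but only position 0 passes the cycle check. A real pipeline could use such a flag to decide which matches to trust.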