Towards Multi-View Consistent Style Transfer with One-Step Diffusion via Vision Conditioning
Yushen Zuo, Jun Xiao, Kin-Chung Chan, Rongkang Dong, Cuixin Yang, Zongqi He, Hao Xie, Kin-Man Lam

TL;DR
This paper introduces OSDiffST, a novel style transfer method for 3D scenes that maintains multi-view consistency and structural integrity using a one-step diffusion model with vision conditioning.
Contribution
The paper proposes a new diffusion-based style transfer approach with a vision conditioning module and LoRA adaptation, specifically designed for multi-view 3D scene stylization.
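To make "LoRA adaptation" concrete, here is a minimal sketch of the generic low-rank update, in which a frozen weight W0 is augmented by a trainable product B·A scaled by alpha / r. It is an illustration under assumptions, not the authors' implementation: the summary does not say which layers of SD-Turbo are adapted or what rank and scaling are used, so `lora_weight`, `matmul`, the shapes, `rank`, and `alpha` below are all hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, a, b, alpha):
    """Return the effective weight W0 + (alpha / r) * B @ A.

    w0: frozen pre-trained weight, shape (d_out, d_in)
    a:  trainable down-projection, shape (r, d_in)
    b:  trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


random.seed(0)
d_out, d_in, rank = 4, 6, 2  # toy sizes, chosen only for illustration

w0 = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
b = [[0.0] * rank for _ in range(d_out)]  # zero init: the adapter starts as a no-op

w_eff = lora_weight(w0, a, b, alpha=4.0)

# Only A and B are trained; W0 stays frozen.
trainable = rank * (d_in + d_out)
full = d_in * d_out
```

At these toy sizes the saving is small (20 trainable values against 24), but the adapter grows linearly with the layer dimensions while the full weight grows with their product, which is why low-rank updates suit fine-tuning a large pre-trained model on a small dataset.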
Findings
Outperforms existing style transfer methods in multi-view consistency.
Produces stylized images with better structural preservation and less distortion.
Effectively adapts pre-trained diffusion models for small datasets.
Abstract
The stylization of 3D scenes is an increasingly attractive topic in 3D vision. Although image style transfer has been extensively researched with promising results, directly applying 2D style transfer methods to 3D scenes often fails to preserve the structural and multi-view properties of 3D environments, resulting in unpleasant distortions in images from different viewpoints. To address these issues, we leverage the remarkable generative prior of diffusion-based models and propose a novel style transfer method, OSDiffST, based on a pre-trained one-step diffusion model (i.e., SD-Turbo) for rendering diverse styles in multi-view images of 3D scenes. To efficiently adapt the pre-trained model for multi-view style transfer on small datasets, we introduce a vision condition module to extract style information from the reference style image to serve as conditional input for the diffusion…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques
Methods: Diffusion
