TL;DR
CLIPGaussian is a unified style transfer framework that supports text- and image-guided stylization of 2D images, videos, 3D objects, and 4D scenes. It operates directly on Gaussian primitives and plugs into existing Gaussian Splatting pipelines without retraining from scratch.
Contribution
The paper introduces what the authors describe as the first unified style transfer method for Gaussian Splatting. It covers 2D images, videos, 3D objects, and 4D scenes, and integrates into existing GS pipelines as a plug-in module.
Findings
Achieves high style fidelity and consistency across diverse modalities.
Enables joint optimization of color and geometry in 3D and 4D.
Maintains model size while providing temporal coherence in videos.
Abstract
Gaussian Splatting (GS) has recently emerged as an efficient representation for rendering 3D scenes from 2D images and has been extended to images, videos, and dynamic 4D content. However, applying style transfer to GS-based representations, especially beyond simple color changes, remains challenging. In this work, we introduce CLIPGaussian, the first unified style transfer framework that supports text- and image-guided stylization across multiple modalities: 2D images, videos, 3D objects, and 4D scenes. Our method operates directly on Gaussian primitives and integrates into existing GS pipelines as a plug-in module, without requiring large generative models or retraining from scratch. The CLIPGaussian approach enables joint optimization of color and geometry in 3D and 4D settings, and achieves temporal coherence in videos, while preserving the model size. We demonstrate superior style…
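As a rough illustration of what "operating directly on Gaussian primitives" with joint color and geometry optimization can look like, here is a toy sketch in plain Python. It is not the authors' implementation. Everything in it is an assumption made for illustration: the `Gaussian2D` class, the isotropic 2D splats, the `embed` function (a crude stand-in for a CLIP image encoder), the cosine-distance loss, and the finite-difference gradients (a real pipeline would use automatic differentiation). The one property it mirrors from the abstract is that only existing parameters are updated, so the number of Gaussians stays the same.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    """Hypothetical isotropic 2D Gaussian primitive: position, scale, RGB color."""
    x: float
    y: float
    sigma: float
    r: float
    g: float
    b: float


def render(gaussians, size=8):
    """Splat every Gaussian onto a size x size RGB grid by additive blending."""
    img = [[[0.0, 0.0, 0.0] for _ in range(size)] for _ in range(size)]
    for gs in gaussians:
        for j in range(size):
            for i in range(size):
                d2 = (i - gs.x) ** 2 + (j - gs.y) ** 2
                w = math.exp(-d2 / (2.0 * gs.sigma ** 2))
                img[j][i][0] += w * gs.r
                img[j][i][1] += w * gs.g
                img[j][i][2] += w * gs.b
    return img


def embed(img):
    """Stand-in for an image encoder (NOT CLIP): summed RGB of left and right halves."""
    half = len(img[0]) // 2
    feats = [0.0] * 6
    for row in img:
        for i, px in enumerate(row):
            offset = 0 if i < half else 3
            for c in range(3):
                feats[offset + c] += px[c]
    return feats


def style_loss(gaussians, target):
    """Cosine distance between the rendered image's embedding and a target embedding."""
    e = embed(render(gaussians))
    dot = sum(a * b for a, b in zip(e, target))
    norm = math.sqrt(sum(a * a for a in e)) * math.sqrt(sum(b * b for b in target))
    return 1.0 - dot / (norm + 1e-12)


GEOMETRY_FIELDS = ("x", "y")
COLOR_FIELDS = ("r", "g", "b")


def stylize(gaussians, target, steps=30, lr=0.1, eps=1e-4):
    """Jointly update position and color of each Gaussian; the list length never changes."""
    for _ in range(steps):
        grads = []
        for gs in gaussians:
            for name in GEOMETRY_FIELDS + COLOR_FIELDS:
                old = getattr(gs, name)
                setattr(gs, name, old + eps)
                up = style_loss(gaussians, target)
                setattr(gs, name, old - eps)
                down = style_loss(gaussians, target)
                setattr(gs, name, old)  # restore before probing the next parameter
                grads.append((gs, name, (up - down) / (2.0 * eps)))
        for gs, name, grad in grads:
            new = getattr(gs, name) - lr * grad
            if name in COLOR_FIELDS:
                new = min(1.0, max(0.0, new))  # keep colors in [0, 1]
            setattr(gs, name, new)
    return gaussians


if __name__ == "__main__":
    # Two gray splats; the made-up target embedding asks for red on the left, blue on the right.
    scene = [Gaussian2D(2.0, 4.0, 1.5, 0.5, 0.5, 0.5),
             Gaussian2D(5.0, 4.0, 1.5, 0.5, 0.5, 0.5)]
    target = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    print("loss before:", style_loss(scene, target))
    stylize(scene, target)
    print("loss after: ", style_loss(scene, target))
    print("gaussian count:", len(scene))
```

In an actual Gaussian Splatting pipeline the renderer would be the differentiable rasterizer, the embedding would come from a pretrained vision-language encoder, and the target would be the embedding of a style text prompt or style image. The sketch only shows the shape of the loop: render, embed, compare, and push gradients back into the primitives' color and geometry parameters.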
