SIC3D: Style Image Conditioned Text-to-3D Gaussian Splatting Generation

Ming He; Zhixiang Chen; Steve Maddock

arXiv:2604.08760 · cs.CV · April 13, 2026

TL;DR

SIC3D is a controllable text-to-3D generation pipeline that first generates a 3D Gaussian Splatting (3DGS) object from text and then transfers style to it from a reference image, improving geometric fidelity and style adherence in 3D object synthesis.

Contribution

The paper introduces a two-stage pipeline with a new Variational Stylized Score Distillation (VSSD) loss for style transfer in 3D Gaussian Splatting, addressing the limited controllability and texture ambiguity of text-only conditioning.

Findings

01. Outperforms prior methods in qualitative evaluations.

02. Enhances geometric fidelity and style adherence.

03. Effectively captures global and local texture patterns.

Abstract

Recent progress in text-to-3D object generation enables the synthesis of detailed geometry from text input by leveraging 2D diffusion models and differentiable 3D representations. However, these approaches often suffer from limited controllability and texture ambiguity due to the limitations of the text modality. To address this, we present SIC3D, a controllable image-conditioned text-to-3D generation pipeline with 3D Gaussian Splatting (3DGS). There are two stages in SIC3D. The first stage generates the 3D object content from text with a text-to-3DGS generation model. The second stage transfers style from a reference image to the 3DGS. Within this stylization stage, we introduce a novel Variational Stylized Score Distillation (VSSD) loss to effectively capture both global and local texture patterns while mitigating conflicts between geometry and appearance. A scaling regularization is…
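
For background, the score distillation objective that text-to-3D pipelines of this kind typically build on can be written as follows. This is the standard Score Distillation Sampling (SDS) gradient from DreamFusion, shown only as context; it is not the VSSD loss, whose exact form is not given in this summary. All symbols are notation assumed here for illustration: \theta denotes the 3DGS parameters, g a differentiable renderer, c a sampled camera, y the text prompt, \hat{\epsilon}_\phi the noise prediction of a pretrained 2D diffusion model, and w(t) a timestep weighting.

```latex
% Standard SDS gradient (background only, not the paper's VSSD loss).
% x is a view rendered from the 3D representation; x_t is its noised version.
\begin{aligned}
x &= g(\theta, c), \qquad
x_t = \alpha_t\, x + \sigma_t\, \epsilon, \qquad
\epsilon \sim \mathcal{N}(0, I), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  &= \mathbb{E}_{t,\,\epsilon,\,c}\!\left[
       w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
       \frac{\partial x}{\partial \theta}
     \right].
\end{aligned}
```

Read this way, the stylization stage described in the abstract would swap or augment the text-only guidance term with one that is also conditioned on the reference style image; how VSSD does this is left to the full paper.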
