Structural Energy-Guided Sampling for View-Consistent Text-to-3D
Qing Zhang, Jinguang Tong, Jie Hong, Jing Zhang, Xuesong Li

TL;DR
This paper introduces SEGS, a training-free method that improves view consistency in text-to-3D generation by steering sampling with the gradients of a structural energy, reducing Janus artifacts.
Contribution
SEGS is a plug-and-play framework that enforces multi-view consistency during sampling by using a structural energy defined in a PCA subspace of intermediate U-Net features, addressing viewpoint bias in text-to-3D models.
Findings
Reduces Janus artifacts in text-to-3D generation.
Improves geometric alignment and viewpoint consistency.
Does not require retraining or weight modifications.
Abstract
Text-to-3D generation often suffers from the Janus problem, where objects look correct from the front but collapse into duplicated or distorted geometry from other angles. We attribute this failure to viewpoint bias in 2D diffusion priors, which propagates into 3D optimization. To address this, we propose Structural Energy-Guided Sampling (SEGS), a training-free, plug-and-play framework that enforces multi-view consistency entirely at sampling time. SEGS defines a structural energy in a PCA subspace of intermediate U-Net features and injects its gradients into the denoising trajectory, steering geometry toward the intended viewpoint while preserving appearance fidelity. Integrated seamlessly into SDS/VSD pipelines, SEGS significantly reduces Janus artifacts, achieving improved geometric alignment and viewpoint consistency without retraining or weight modification.
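The idea of an energy defined in a PCA subspace of features, with its gradient used to nudge the sample, can be illustrated with a toy sketch. The abstract does not give the exact energy, so everything here is an assumption for illustration: the quadratic energy, the reference features, and the names `pca_basis`, `structural_energy`, `energy_gradient`, and `guided_step` are hypothetical. Random arrays stand in for intermediate U-Net features, and the gradient is taken with respect to the features directly rather than backpropagated to the noisy latent as a real sampler would do.

```python
import numpy as np

def pca_basis(features, k):
    """Return the top-k principal directions (C x k) of an (N x C) feature matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered features.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def structural_energy(features, ref_features, basis):
    """Assumed energy: half the squared distance to the reference in the PCA subspace."""
    diff = (features - ref_features) @ basis
    return 0.5 * float(np.sum(diff ** 2))

def energy_gradient(features, ref_features, basis):
    """Analytic gradient of structural_energy with respect to `features`."""
    return (features - ref_features) @ basis @ basis.T

def guided_step(features, ref_features, basis, step_size=0.5):
    """One guidance step: move the features down the energy gradient."""
    return features - step_size * energy_gradient(features, ref_features, basis)

# Stand-ins for U-Net features: 64 spatial tokens with 16 channels each.
rng = np.random.default_rng(0)
ref = rng.normal(size=(64, 16))               # hypothetical target-view features
cur = ref + 0.3 * rng.normal(size=(64, 16))   # hypothetical current features

basis = pca_basis(ref, k=4)
e_before = structural_energy(cur, ref, basis)
stepped = guided_step(cur, ref, basis)
e_after = structural_energy(stepped, ref, basis)
```

Because the basis columns are orthonormal, a step of size `s` scales this toy energy by `(1 - s)^2`, and the update only changes the component of the features that lies inside the PCA subspace. The component outside the subspace is left untouched, which is one simple way a subspace-restricted guidance term can adjust structure without rewriting everything else.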
