Structural Energy Guidance for View-Consistent Text-to-3D Generation

Qing Zhang; Jinguang Tong; Jing Zhang; Jie Hong; Xuesong Li

arXiv:2605.19876 · cs.CV · May 20, 2026


TL;DR

This paper introduces SEGS, a training-free framework that enhances multi-view consistency in text-to-3D diffusion models by guiding the denoising process with structural energy, reducing viewpoint artifacts.

Contribution

The paper proposes Structural Energy-Guided Sampling (SEGS), a plug-and-play sampling method that improves multi-view consistency without retraining.

Findings

01. Reduces Janus Rate by about 10% on average.

02. Improves View-CS scores across multiple baselines.

03. Effectively alleviates viewpoint artifacts while maintaining appearance fidelity.

Abstract

Text-to-3D generation based on diffusion models often suffers from the Janus problem, leading to inconsistent geometry across viewpoints. This work identifies viewpoint bias in 2D diffusion priors as the main cause and proposes Structural Energy-Guided Sampling (SEGS), a training-free and plug-and-play framework to improve multi-view consistency. SEGS constructs a structural energy in the PCA subspace of U-Net features and injects its gradient into the denoising process. It can be easily integrated into SDS/VSD pipelines without retraining. Experiments show that SEGS reduces the Janus Rate by about 10% on average and improves View-CS scores across multiple baselines, including DreamFusion, Magic3D, and LucidDreamer. This method effectively alleviates viewpoint artifacts while preserving appearance fidelity, providing a flexible solution for high-quality text-to-3D content generation.
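The abstract does not state the exact form of the structural energy, so the following is only an illustrative sketch of the general idea: project features onto a PCA subspace, measure a distance there, and use its gradient as a guidance signal. It assumes feature maps flattened to shape (N, C), a hypothetical reference feature map, and a squared-distance energy; the function names `pca_basis`, `structural_energy`, and `energy_gradient` are made up for this example and are not taken from the paper. In a real SDS/VSD pipeline the gradient would be propagated back to the noisy latent through the U-Net, which this toy version omits.

```python
import numpy as np


def pca_basis(feats, k):
    """Return the top-k principal directions of feats (N, C) as a (C, k) matrix."""
    centered = feats - feats.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T


def structural_energy(feats, ref_feats, basis):
    """Assumed energy: 0.5 * mean squared distance in the PCA subspace."""
    proj_diff = (feats - ref_feats) @ basis  # (N, k)
    return 0.5 * np.sum(proj_diff ** 2) / feats.shape[0]


def energy_gradient(feats, ref_feats, basis):
    """Analytic gradient of structural_energy with respect to feats, shape (N, C)."""
    proj_diff = (feats - ref_feats) @ basis
    return (proj_diff @ basis.T) / feats.shape[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(64, 16))    # hypothetical reference-view features
    cur = rng.normal(size=(64, 16))    # hypothetical current-view features
    basis = pca_basis(ref, k=4)

    before = structural_energy(cur, ref, basis)
    # One guidance step: move the features down the energy gradient.
    guided = cur - 1.0 * energy_gradient(cur, ref, basis)
    after = structural_energy(guided, ref, basis)
    print(f"energy before: {before:.4f}, after one step: {after:.4f}")
```

Restricting the energy to a low-dimensional subspace means the guidance only acts on the dominant feature directions and leaves the remaining components untouched, which is one plausible reading of how such guidance could adjust structure while preserving appearance.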
