SeMv-3D: Towards Concurrency of Semantic and Multi-view Consistency in General Text-to-3D Generation
Xiao Cai, Pengpeng Zeng, Lianli Gao, Sitong Su, Heng Tao Shen, Jingkuan Song

TL;DR
SeMv-3D introduces a novel framework that jointly improves semantic alignment and multi-view consistency in text-to-3D generation, leveraging triplane priors and attention mechanisms for enhanced 3D content creation.
Contribution
The paper proposes SeMv-3D, a method that combines Triplane Prior Learning (TPL) with Semantic Aligning to improve semantic fidelity and multi-view consistency at the same time in General Text-to-3D (GT23D) generation.
Findings
Sets a new state of the art in multi-view consistency.
Maintains competitive semantic alignment performance.
Effectively balances semantic and geometric coherence.
Abstract
General Text-to-3D (GT23D) generation is crucial for creating diverse 3D content across objects and scenes, yet it faces two key challenges: 1) ensuring semantic consistency between input text and generated 3D models, and 2) maintaining multi-view consistency across different perspectives within 3D. Existing approaches typically address only one of these challenges, often leading to suboptimal results in semantic fidelity and structural coherence. To overcome these limitations, we propose SeMv-3D, a novel framework that jointly enhances semantic alignment and multi-view consistency in GT23D generation. At its core, we introduce Triplane Prior Learning (TPL), which effectively learns triplane priors by capturing spatial correspondences across three orthogonal planes using a dedicated Orthogonal Attention mechanism, thereby ensuring geometric consistency across viewpoints. Additionally,…
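For readers unfamiliar with triplanes, the following is a minimal sketch of how a generic triplane representation is usually queried: a 3D point is projected onto three orthogonal feature planes (XY, XZ, YZ), each plane is sampled bilinearly, and the three feature vectors are aggregated. This is not the paper's TPL or Orthogonal Attention module, whose details are not given here. The function names, the plane resolution, the [-1, 1] coordinate range, and the sum aggregation are all assumptions chosen for illustration.

```python
import numpy as np

def sample_plane(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at u, v in [-1, 1].

    Hypothetical helper: u indexes columns, v indexes rows.
    """
    h, w = plane.shape[:2]
    # Map normalized coordinates to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    # Top-left corner of the enclosing cell, clamped to stay in bounds.
    x0 = int(np.clip(np.floor(x), 0, w - 2))
    y0 = int(np.clip(np.floor(y), 0, h - 2))
    tx, ty = x - x0, y - y0
    top = (1 - tx) * plane[y0, x0] + tx * plane[y0, x0 + 1]
    bottom = (1 - tx) * plane[y0 + 1, x0] + tx * plane[y0 + 1, x0 + 1]
    return (1 - ty) * top + ty * bottom

def query_triplane(planes, point):
    """Project a 3D point onto the XY, XZ, and YZ planes and sum the features.

    Summation is an assumed aggregation; concatenation is another common choice.
    """
    x, y, z = point
    f_xy = sample_plane(planes["xy"], x, y)
    f_xz = sample_plane(planes["xz"], x, z)
    f_yz = sample_plane(planes["yz"], y, z)
    return f_xy + f_xz + f_yz

# Toy triplane: three 8x8 planes with 4 feature channels each (assumed sizes).
rng = np.random.default_rng(0)
planes = {name: rng.standard_normal((8, 8, 4)) for name in ("xy", "xz", "yz")}
feature = query_triplane(planes, (0.1, -0.3, 0.7))
```

Because every 3D point reads from the same three planes regardless of camera pose, features stay tied to fixed spatial locations, which is the usual intuition for why triplane priors help multi-view consistency.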
Taxonomy
Topics: Human Motion and Animation · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Focus · Synthesizer · Diffusion
