SeMv-3D: Towards Concurrency of Semantic and Multi-view Consistency in General Text-to-3D Generation

Xiao Cai; Pengpeng Zeng; Lianli Gao; Sitong Su; Heng Tao Shen; Jingkuan Song

arXiv:2410.07658·cs.CV·May 22, 2025

Open Access

TL;DR

SeMv-3D is a framework that jointly improves semantic alignment and multi-view consistency in text-to-3D generation, using triplane priors learned with an Orthogonal Attention mechanism.

Contribution

The paper proposes SeMv-3D, a method that combines Triplane Prior Learning (TPL) with Semantic Aligning techniques to improve semantic fidelity and multi-view consistency at the same time in General Text-to-3D (GT23D) generation.

Findings

01 Sets a new state of the art in multi-view consistency.

02 Maintains competitive semantic alignment performance.

03 Effectively balances semantic and geometric coherence.

Abstract

General Text-to-3D (GT23D) generation is crucial for creating diverse 3D content across objects and scenes, yet it faces two key challenges: 1) ensuring semantic consistency between input text and generated 3D models, and 2) maintaining multi-view consistency across different perspectives within 3D. Existing approaches typically address only one of these challenges, often leading to suboptimal results in semantic fidelity and structural coherence. To overcome these limitations, we propose SeMv-3D, a novel framework that jointly enhances semantic alignment and multi-view consistency in GT23D generation. At its core, we introduce Triplane Prior Learning (TPL), which effectively learns triplane priors by capturing spatial correspondences across three orthogonal planes using a dedicated Orthogonal Attention mechanism, thereby ensuring geometric consistency across viewpoints. Additionally,…

Taxonomy

Topics: Human Motion and Animation · Image Processing and 3D Reconstruction · Computer Graphics and Visualization Techniques

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Focus · Synthesizer · Diffusion