GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark
Xiao Cai, Sitong Su, Jingkuan Song, Pengpeng Zeng, Ji Zhang, Qinhong Du, Mengqi Li, Heng Tao Shen, Lianli Gao

TL;DR
This paper introduces GT23D-Bench, a comprehensive benchmark for general text-to-3D (GT23D) generation. It combines a large-scale dataset with evaluation metrics designed to assess 3D quality and text-3D semantic alignment more thoroughly than 2D image-text similarity scores.
Contribution
The paper presents the first dedicated benchmark for GT23D, with a large dataset and multi-metric evaluation suite to improve model training and assessment.
Findings
Proposed a dataset with 400K 3D assets and 70M visual samples.
Developed 10 evaluation metrics that correlate more closely with human judgment.
Analyzed eight GT23D models to reveal current capabilities and failure modes.
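One common way to check whether an automatic metric agrees with human judgment is a rank correlation between metric scores and human ratings over the same generated assets. The sketch below uses Spearman's rank correlation as an illustration only: the summary does not state which correlation statistic the paper uses, and the scores, ratings, and names (`human`, `metric_a`, `metric_b`) are hypothetical placeholders, not data from the benchmark.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical human ratings and two hypothetical metric score lists
# for the same five generated assets (illustrative values only).
human = [4.5, 2.0, 3.5, 1.0, 5.0]
metric_a = [0.81, 0.40, 0.66, 0.35, 0.90]
metric_b = [0.70, 0.72, 0.65, 0.71, 0.69]

print("metric_a vs human:", spearman(human, metric_a))
print("metric_b vs human:", spearman(human, metric_b))
```

Under this reading, a metric whose ranking of assets matches the human ranking more closely gets a correlation nearer to 1, which is one plausible interpretation of "higher correlation" in the finding above.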
Abstract
Text-to-3D (T23D) generation has emerged as a crucial visual generation task, aiming at synthesizing 3D content from textual descriptions. Studies of this task are currently shifting from per-scene T23D, which requires optimization of the model for every content generated, to General T23D (GT23D), which requires only one pre-trained model to generate different content without re-optimization, for more generalized and efficient 3D generation. Despite notable advancements, GT23D is severely bottlenecked by two interconnected challenges: the lack of high-quality, large-scale training data and the prevalence of evaluation metrics that overlook intrinsic 3D properties. Existing datasets often suffer from incomplete annotations, noisy organization, and inconsistent quality, while current evaluations rely heavily on 2D image-text similarity or scoring, failing to thoroughly assess 3D geometric…
Taxonomy
Topics: Natural Language Processing Techniques
