GT23D-Bench: A Comprehensive General Text-to-3D Generation Benchmark

Xiao Cai; Sitong Su; Jingkuan Song; Pengpeng Zeng; Ji Zhang; Qinhong Du; Mengqi Li; Heng Tao Shen; Lianli Gao

arXiv:2412.09997 · cs.CV · December 4, 2025

Open Access

TL;DR

This paper introduces GT23D-Bench, a comprehensive benchmark for general text-to-3D generation, including a large-scale dataset and evaluation metrics that better assess 3D quality and semantic alignment.

Contribution

The paper presents the first dedicated benchmark for general text-to-3D (GT23D) generation, pairing a large dataset with a multi-metric evaluation suite to improve model training and assessment.

Findings

01. Proposed a dataset with 400K 3D assets and 70M visual samples.

02. Developed 10 evaluation metrics with higher correlation to human judgment.

03. Analyzed eight GT23D models to reveal current capabilities and failure modes.

Abstract

Text-to-3D (T23D) generation has emerged as a crucial visual generation task, aiming at synthesizing 3D content from textual descriptions. Studies of this task are currently shifting from per-scene T23D, which requires optimization of the model for every content generated, to General T23D (GT23D), which requires only one pre-trained model to generate different content without re-optimization, for more generalized and efficient 3D generation. Despite notable advancements, GT23D is severely bottlenecked by two interconnected challenges: the lack of high-quality, large-scale training data and the prevalence of evaluation metrics that overlook intrinsic 3D properties. Existing datasets often suffer from incomplete annotations, noisy organization, and inconsistent quality, while current evaluations rely heavily on 2D image-text similarity or scoring, failing to thoroughly assess 3D geometric…

Taxonomy

Topics: Natural Language Processing Techniques