Looking at words and points with attention: a benchmark for text-to-shape coherence
Andrea Amaduzzi, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano

TL;DR
This paper introduces a new benchmark for evaluating text-to-shape coherence in 3D object generation, addressing previous dataset quality issues and proposing a novel metric based on cross-attention, validated through user studies.
Contribution
It presents a dataset of text-shape pairs refined with large language models, a new quantitative coherence metric, and a comprehensive benchmark for evaluating text-to-shape 3D generation.
Findings
The refined dataset improves description quality.
The proposed metric correlates well with user judgments.
The benchmark facilitates future research in text-to-shape coherence.
Abstract
While text-conditional 3D object generation and manipulation have seen rapid progress, the evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark. The reason is twofold: a) the low quality of the textual descriptions in the only publicly available dataset of text-shape pairs; b) the limited effectiveness of the metrics used to quantitatively assess such coherence. In this paper, we propose a comprehensive solution that addresses both weaknesses. Firstly, we employ large language models to automatically refine textual descriptions associated with shapes. Secondly, we propose a quantitative metric to assess text-to-shape coherence, through cross-attention mechanisms. To validate our approach, we conduct a user study and compare quantitatively our metric with existing ones. The refined dataset, the new metric and a set of text-shape pairs…
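The abstract only states that the metric relies on cross-attention between text and shape, so the following is a minimal illustrative sketch of that general idea, not the authors' method. It assumes words and points have already been embedded into a shared feature space, omits the learned query/key/value projections a trained model would have, and uses a hypothetical scoring rule (`coherence_score`, the mean cosine similarity between each word and the shape features it attends to). All function names here are illustrative.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(word_feats, point_feats):
    """Each word (query) attends over all points (keys and values).

    Assumption: no learned projections; features are used directly.
    Returns the attended shape feature per word and the attention weights.
    """
    d = len(word_feats[0])
    scale = 1.0 / math.sqrt(d)  # standard scaled dot-product attention
    attended, weights = [], []
    for w in word_feats:
        scores = [dot(w, p) * scale for p in point_feats]
        attn = softmax(scores)
        weights.append(attn)
        attended.append(
            [sum(a * p[k] for a, p in zip(attn, point_feats)) for k in range(d)]
        )
    return attended, weights


def coherence_score(word_feats, point_feats):
    """Hypothetical score: mean cosine similarity between each word
    feature and the shape summary that word attends to."""
    attended, _ = cross_attention(word_feats, point_feats)
    sims = []
    for w, a in zip(word_feats, attended):
        denom = math.sqrt(dot(w, w)) * math.sqrt(dot(a, a))
        sims.append(dot(w, a) / denom if denom > 0 else 0.0)
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy features: two "words" and two candidate "shapes" of two points each.
    words = [[1.0, 0.0], [0.0, 1.0]]
    matching_shape = [[1.0, 0.0], [0.0, 1.0]]
    mismatched_shape = [[-1.0, 0.0], [0.0, -1.0]]
    print(coherence_score(words, matching_shape))
    print(coherence_score(words, mismatched_shape))
```

In this toy setting the shape whose point features align with the word features receives the higher score. A real metric of this kind would use learned encoders and attention layers trained on text-shape pairs.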
Taxonomy
Topics: Handwritten Text Recognition Techniques · 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction
