Looking at words and points with attention: a benchmark for text-to-shape coherence

Andrea Amaduzzi; Giuseppe Lisanti; Samuele Salti; Luigi Di Stefano

arXiv:2309.07917 · cs.CV · September 15, 2023

Open Access

TL;DR

This paper introduces a new benchmark for evaluating text-to-shape coherence in 3D object generation, addressing previous dataset quality issues and proposing a novel metric based on cross-attention, validated through user studies.

Contribution

It presents a dataset refined with large language models, a new quantitative coherence metric, and a comprehensive benchmark for evaluating text-to-shape 3D generation.

Findings

1. The refined dataset improves description quality.
2. The proposed metric correlates well with user judgments.
3. The benchmark facilitates future research in text-to-shape coherence.

Abstract

While text-conditional 3D object generation and manipulation have seen rapid progress, the evaluation of coherence between generated 3D shapes and input textual descriptions lacks a clear benchmark. The reason is twofold: a) the low quality of the textual descriptions in the only publicly available dataset of text-shape pairs; b) the limited effectiveness of the metrics used to quantitatively assess such coherence. In this paper, we propose a comprehensive solution that addresses both weaknesses. Firstly, we employ large language models to automatically refine textual descriptions associated with shapes. Secondly, we propose a quantitative metric to assess text-to-shape coherence, through cross-attention mechanisms. To validate our approach, we conduct a user study and compare quantitatively our metric with existing ones. The refined dataset, the new metric and a set of text-shape pairs…

Taxonomy

Topics: Handwritten Text Recognition Techniques · 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction