Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding

Kohei Torimi; Ryosuke Yamada; Daichi Otsuka; Kensho Hara; Yuki M. Asano; Hirokatsu Kataoka; Yoshimitsu Aoki

arXiv:2501.09278 · cs.CV · January 20, 2025


TL;DR

This paper introduces TeGA, a text-guided geometric augmentation method that generates synthetic 3D data to improve zero-shot 3D classification, achieving state-of-the-art results with limited real data.

Contribution

TeGA is a novel approach that uses generative text-to-3D models and a filtering strategy to expand limited 3D datasets for zero-shot recognition.
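The contribution pairs text-to-3D generation with a filtering step before the synthetic data is added to the training set. The abstract on this page is truncated before the filtering criterion is described, so the following is only a minimal sketch under stated assumptions, not TeGA's actual filter: it assumes each generated sample already carries a caption embedding and a shape embedding in a shared space, and keeps a sample only if their cosine similarity clears a threshold. `SyntheticSample`, `filter_synthetic`, `expand_dataset`, and the default threshold of 0.3 are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class SyntheticSample:
    """Hypothetical record: a generated 3D sample and the caption that prompted it."""
    caption: str
    text_embedding: list[float]   # caption embedding (assumed precomputed)
    shape_embedding: list[float]  # embedding of the generated shape (assumed precomputed, same space)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_synthetic(samples, threshold=0.3):
    """Hypothetical filter: keep samples whose shape embedding agrees with their caption."""
    return [
        s for s in samples
        if cosine_similarity(s.text_embedding, s.shape_embedding) >= threshold
    ]


def expand_dataset(real, synthetic, threshold=0.3):
    """Append the filtered synthetic samples to the (unchanged) real training set."""
    return list(real) + filter_synthetic(synthetic, threshold)


if __name__ == "__main__":
    synthetic = [
        SyntheticSample("a wooden chair", [1.0, 0.0], [0.9, 0.1]),  # similarity ~0.99, kept
        SyntheticSample("a red car", [1.0, 0.0], [0.0, 1.0]),       # similarity 0.0, dropped
    ]
    kept = filter_synthetic(synthetic, threshold=0.5)
    print([s.caption for s in kept])  # ['a wooden chair']
```

A similarity threshold of this kind trades dataset size for text-shape agreement: raising it discards more generated samples but leaves fewer mismatched caption-3D pairs in the expanded set.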

Findings

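01 Achieves a 3.0% improvement in zero-shot 3D classification on Objaverse-LVIS
02 Achieves a 4.6% improvement in zero-shot 3D classification on ScanObjectNN
03 Achieves an 8.7% improvement in zero-shot 3D classification on ModelNet40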

Abstract

Zero-shot recognition models require extensive training data to generalize. However, in zero-shot 3D classification, collecting 3D data and captions is costly and labor-intensive, which poses a significant barrier compared to 2D vision. Recent advances in generative models have achieved unprecedented realism in synthetic data production, and recent research shows the potential of using generated data as training data. This naturally raises the question: can synthetic 3D data produced by generative models be used to expand limited 3D datasets? In response, we present a synthetic 3D dataset expansion method, Text-guided Geometric Augmentation (TeGA). TeGA is tailored for language-image-3D pretraining, which achieves SoTA in zero-shot 3D classification, and uses a generative text-to-3D model to enhance and extend limited 3D datasets. Specifically, we automatically generate text-guided…


Taxonomy

Topics: 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques