Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding
Kohei Torimi, Ryosuke Yamada, Daichi Otsuka, Kensho Hara, Yuki M. Asano, Hirokatsu Kataoka, Yoshimitsu Aoki

TL;DR
This paper introduces TeGA, a text-guided geometric augmentation method that generates synthetic 3D data to improve zero-shot 3D classification, achieving state-of-the-art results with limited real data.
Contribution
TeGA is a novel approach that uses generative text-to-3D models and a filtering strategy to expand limited 3D datasets for zero-shot recognition.
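As a rough illustration of what a filtering step over generated data could look like, the sketch below keeps only generated shapes whose embedding agrees with the embedding of the prompt that produced them. This is an assumption made for illustration, not the filtering strategy described in the paper: the function names, the cosine-similarity criterion, the threshold value, and the toy 2-D embeddings are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_generated_shapes(
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
    threshold: float = 0.5,
) -> List[int]:
    """Return indices of (text_embedding, shape_embedding) pairs to keep.

    Hypothetical criterion: keep a generated shape only if its embedding
    is at least `threshold` similar to the embedding of its text prompt.
    """
    kept = []
    for index, (text_emb, shape_emb) in enumerate(pairs):
        if cosine_similarity(text_emb, shape_emb) >= threshold:
            kept.append(index)
    return kept


# Toy embeddings standing in for the outputs of real text and 3D encoders.
pairs = [
    ([1.0, 0.0], [0.9, 0.1]),  # shape closely matches its prompt
    ([1.0, 0.0], [0.0, 1.0]),  # shape unrelated to its prompt
    ([0.0, 1.0], [0.2, 0.8]),  # shape closely matches its prompt
]
kept_indices = filter_generated_shapes(pairs, threshold=0.5)
print(kept_indices)
```

In practice the embeddings would come from pretrained encoders, and the threshold would need tuning against held-out data. The sketch only shows the shape of a keep-or-discard decision.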
Findings
Improves zero-shot 3D classification performance by 3.0% on Objaverse-LVIS
Improves zero-shot 3D classification performance by 4.6% on ScanObjectNN
Improves zero-shot 3D classification performance by 8.7% on ModelNet40
Abstract
Zero-shot recognition models require extensive training data for generalization. However, in zero-shot 3D classification, collecting 3D data and captions is costly and labor-intensive, posing a significant barrier compared to 2D vision. Recent advances in generative models have achieved unprecedented realism in synthetic data production, and recent research shows the potential of using generated data for training. This naturally raises the question: can synthetic 3D data generated by generative models be used to expand limited 3D datasets? In response, we present a synthetic 3D dataset expansion method, Text-guided Geometric Augmentation (TeGA). TeGA is tailored for language-image-3D pretraining, which achieves SoTA in zero-shot 3D classification, and uses a generative text-to-3D model to enhance and extend limited 3D datasets. Specifically, we automatically generate text-guided…
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques
