SCA3D: Enhancing Cross-modal 3D Retrieval via 3D Shape and Caption Paired Data Augmentation
Junlong Ren, Hao Wu, Hui Xiong, Hao Wang

TL;DR
SCA3D is a data augmentation method that combines part-level captioning with component-based shape generation to improve cross-modal 3D retrieval, raising RR@1 on the Text2Shape dataset in both retrieval directions.
Contribution
The paper presents SCA3D, a new data augmentation approach that leverages captioning and component-based shape synthesis to address data scarcity in cross-modal 3D retrieval.
Findings
Outperforms previous methods on the Text2Shape dataset
Raises RR@1 for Shape-to-Text from 20.03 to 27.22
Raises RR@1 for Text-to-Shape from 13.12 to 16.67
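For readers unfamiliar with the metric, RR@k is commonly read as the recall rate at rank k: the percentage of queries for which at least one correct item appears among the top k retrieved candidates. The sketch below is an illustration only, not the authors' evaluation code. The function name `recall_at_k`, the toy similarity scores, and the ground-truth sets are all hypothetical.

```python
def recall_at_k(similarity, ground_truth, k=1):
    """Percentage of queries whose top-k candidates include a correct match.

    similarity[i][j]: score between query i and candidate j (higher is better).
    ground_truth[i]: set of candidate indices that are correct for query i.
    """
    hits = 0
    for i, scores in enumerate(similarity):
        # Rank candidate indices by descending score and keep the top k.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in ground_truth[i] for j in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example with made-up scores: 3 text queries against 3 shapes.
sim = [
    [0.9, 0.1, 0.3],  # query 0: top-ranked shape is 0 (correct)
    [0.2, 0.4, 0.8],  # query 1: top-ranked shape is 2 (correct one is 1)
    [0.1, 0.2, 0.7],  # query 2: top-ranked shape is 2 (correct)
]
gt = [{0}, {1}, {2}]

rr1 = recall_at_k(sim, gt, k=1)  # 2 of 3 queries hit at rank 1
rr2 = recall_at_k(sim, gt, k=2)  # all 3 queries hit within the top 2
```

The same function covers both directions: for Shape-to-Text, rows are shapes and columns are captions, and each ground-truth set may hold several caption indices.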
Abstract
The cross-modal 3D retrieval task aims to achieve mutual matching between text descriptions and 3D shapes. This has the potential to enhance the interaction between natural language and the 3D environment, especially within the realms of robotics and embodied artificial intelligence (AI) applications. However, the scarcity and expensiveness of 3D data constrain the performance of existing cross-modal 3D retrieval methods. These methods heavily rely on features derived from the limited number of 3D shapes, resulting in poor generalization ability across diverse scenarios. To address this challenge, we introduce SCA3D, a novel 3D shape and caption online data augmentation method for cross-modal 3D retrieval. Our approach uses the LLaVA model to create a component library, captioning each segmented part of every 3D shape within the dataset. Notably, it facilitates the generation of…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications · Human Motion and Animation
Methods: ALIGN
