SCA3D: Enhancing Cross-modal 3D Retrieval via 3D Shape and Caption Paired Data Augmentation

Junlong Ren; Hao Wu; Hui Xiong; Hao Wang

arXiv:2502.19128 · cs.CV · February 27, 2025

TL;DR

SCA3D is a data augmentation method for cross-modal 3D retrieval that captions segmented shape components and uses them to generate new shape and caption pairs, raising RR@1 on the Text2Shape dataset in both retrieval directions.

Contribution

The paper presents SCA3D, a new data augmentation approach that leverages captioning and component-based shape synthesis to address data scarcity in cross-modal 3D retrieval.

Findings

01. Outperforms previous methods on the Text2Shape dataset

02. Raises RR@1 for Shape-to-Text from 20.03 to 27.22

03. Raises RR@1 for Text-to-Shape from 13.12 to 16.67
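
For readers unfamiliar with the metric, the sketch below shows one common way to compute a recall rate at k from a query-by-candidate similarity matrix. It assumes RR@k means the percentage of queries with at least one ground-truth match among the top k retrieved candidates; the function name `recall_rate_at_k` and the toy scores are illustrative assumptions, not taken from the paper or its repository.

```python
def recall_rate_at_k(similarity, relevant, k=1):
    """Percentage of queries with a relevant candidate in the top k.

    similarity: list of rows, one per query, holding a score per candidate.
    relevant:   list of sets, one per query, holding ground-truth candidate indices.
    Both the layout and the metric definition are assumptions for illustration.
    """
    hits = 0
    for scores, rel in zip(similarity, relevant):
        # Rank candidate indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if rel.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy data (made up): 3 queries scored against 3 candidates.
toy_similarity = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.7, 0.4],
]
toy_relevant = [{0}, {1}, {1}]

rr1 = recall_rate_at_k(toy_similarity, toy_relevant, k=1)
rr3 = recall_rate_at_k(toy_similarity, toy_relevant, k=3)
```

Running the same function with text queries against shape candidates, and then with the roles swapped, gives the two retrieval directions reported above.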

Abstract

The cross-modal 3D retrieval task aims to achieve mutual matching between text descriptions and 3D shapes. This has the potential to enhance the interaction between natural language and the 3D environment, especially within the realms of robotics and embodied artificial intelligence (AI) applications. However, the scarcity and expensiveness of 3D data constrain the performance of existing cross-modal 3D retrieval methods. These methods heavily rely on features derived from the limited number of 3D shapes, resulting in poor generalization ability across diverse scenarios. To address this challenge, we introduce SCA3D, a novel 3D shape and caption online data augmentation method for cross-modal 3D retrieval. Our approach uses the LLaVA model to create a component library, captioning each segmented part of every 3D shape within the dataset. Notably, it facilitates the generation of…
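
The component-library idea in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions: the part categories, the hand-written captions, and the helpers `build_component_library` and `sample_augmented_pair` are hypothetical, and the way SCA3D actually assembles parts and captions is not described in the visible portion of the abstract.

```python
import random
from collections import defaultdict


def build_component_library(shapes):
    """Group captioned parts by category (hypothetical data layout)."""
    library = defaultdict(list)
    for shape in shapes:
        for part in shape["parts"]:
            library[part["category"]].append((part["part_id"], part["caption"]))
    return library


def sample_augmented_pair(library, rng):
    """Pick one part per category and join the part captions.

    Stand-in for paired augmentation: the returned part ids represent a
    new shape, and the joined captions represent its paired text.
    """
    part_ids, captions = [], []
    for category in sorted(library):
        part_id, caption = rng.choice(library[category])
        part_ids.append(part_id)
        captions.append(caption)
    return part_ids, ", ".join(captions)


# Placeholder captions; in the paper these come from a captioning model.
toy_shapes = [
    {"parts": [
        {"category": "seat", "part_id": "a_seat", "caption": "a round wooden seat"},
        {"category": "legs", "part_id": "a_legs", "caption": "four thin metal legs"},
    ]},
    {"parts": [
        {"category": "seat", "part_id": "b_seat", "caption": "a padded square seat"},
        {"category": "legs", "part_id": "b_legs", "caption": "a single swivel base"},
    ]},
]

library = build_component_library(toy_shapes)
new_parts, new_caption = sample_augmented_pair(library, random.Random(0))
```

Because parts from different source shapes can be mixed, each sampled pair is a shape and caption combination that need not exist in the original dataset, which is the sense in which the augmentation targets data scarcity.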

Code & Models

Repositories

3dagentworld/sca3d (PyTorch, official)

Taxonomy

Topics: 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications · Human Motion and Animation

Methods: ALIGN