BlendCLIP: Bridging Synthetic and Real Domains for Zero-Shot 3D Object Classification with Multimodal Pretraining
Ajinkya Khoche, Gergő László Nagy, Maciej Wozniak, Thomas Gustafsson, Patric Jensfelt

TL;DR
BlendCLIP introduces a multimodal pretraining framework that effectively bridges the synthetic-to-real domain gap in zero-shot 3D object classification, significantly improving outdoor scene recognition with minimal real-world data.
Contribution
It proposes a curriculum-based data mixing strategy and a large-scale dataset of multimodal triplets to enhance domain adaptation in zero-shot 3D classification.
Findings
Boosts zero-shot accuracy on nuScenes by 27% with minimal real data
Achieves 19.3% improvement over prior methods on nuScenes
Maintains strong generalization across synthetic and real datasets
Abstract
Zero-shot 3D object classification is crucial for real-world applications like autonomous driving, but it is often hindered by a significant domain gap between the synthetic data used for training and the sparse, noisy LiDAR scans encountered in the real world. Current methods trained solely on synthetic data fail to generalize to outdoor scenes, while those trained only on real data lack the semantic diversity to recognize rare or unseen objects. We introduce BlendCLIP, a multimodal pretraining framework that bridges this synthetic-to-real gap by strategically combining the strengths of both domains. We first propose a pipeline to generate a large-scale dataset of object-level triplets, each consisting of a point cloud, an image, and a text description, mined directly from real-world driving data and human-annotated 3D boxes. Our core contribution is a curriculum-based data mixing…
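For readers unfamiliar with the zero-shot setting, the following is a minimal sketch of how CLIP-style zero-shot classification is commonly done: an object embedding is compared against text embeddings of candidate class names, and the most similar class wins. It is not taken from the paper. The function names, class names, and the hand-written 3-dimensional vectors are all hypothetical stand-ins for the outputs of trained point cloud and text encoders, and BlendCLIP's actual inference procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(point_embedding, text_embeddings):
    """Pick the label whose text embedding is closest to the object embedding.

    point_embedding: vector for one object (assumed to come from a 3D encoder).
    text_embeddings: dict mapping label -> vector (assumed to come from a text encoder).
    """
    scores = {
        label: cosine(point_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy vectors, invented for illustration only; real embeddings would be
# high-dimensional encoder outputs.
text_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "pedestrian": [0.1, 0.9, 0.1],
    "traffic cone": [0.0, 0.2, 0.9],
}
lidar_object_embedding = [0.8, 0.2, 0.1]

label, scores = zero_shot_classify(lidar_object_embedding, text_embeddings)
print(label)
```

Because the class list is only a dictionary of text embeddings, new categories can be added at inference time without retraining, which is what makes the setting "zero-shot".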
Taxonomy
Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Domain Adaptation and Few-Shot Learning
