MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding
Jiaze Wang, Yi Wang, Ziyu Guo, Renrui Zhang, Donghao Zhou, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng

TL;DR
MM-Mixing introduces a multi-modal mixing alignment framework that enhances 3D understanding by combining feature and input-level mixing, significantly improving zero-shot classification and cross-modal retrieval performance.
Contribution
This paper presents a novel multi-modal mixing alignment method with a two-stage training pipeline for improved 3D understanding and cross-modal alignment.
Findings
Zero-shot classification accuracy on ScanObjectNN improved from 51.3% to 61.9%, a gain of 10.6 percentage points.
Cross-modal retrieval performance increased notably.
The method is straightforward to implement and improves generalization.
Abstract
We introduce MM-Mixing, a multi-modal mixing alignment framework for 3D understanding. MM-Mixing applies mixing-based methods to multi-modal data, preserving and optimizing cross-modal connections while enhancing diversity and improving alignment across modalities. Our proposed two-stage training pipeline combines feature-level and input-level mixing to optimize the 3D encoder. The first stage employs feature-level mixing with contrastive learning to align 3D features with their corresponding modalities. The second stage incorporates both feature-level and input-level mixing, introducing mixed point cloud inputs to further refine 3D feature representations. MM-Mixing enhances intermodality relationships, promotes generalization, and ensures feature consistency while providing diverse and realistic training samples. We demonstrate that MM-Mixing significantly improves baseline…
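The abstract names two kinds of mixing but does not give their formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes feature-level mixing is a mixup-style convex combination with a coefficient `lam`, that input-level mixing draws a fraction `lam` of points from one cloud and the rest from another, and that alignment uses an InfoNCE-style contrastive loss against a paired modality (for example image or text embeddings, which is a guess). All function names, the `temperature` default, and the partner-pairing scheme are hypothetical.

```python
import math
import random


def mix_features(feat_a, feat_b, lam):
    """Feature-level mixing (assumed mixup-style): lam * a + (1 - lam) * b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


def mix_point_clouds(cloud_a, cloud_b, lam, rng):
    """Input-level mixing (assumed): keep round(lam * N) points of cloud_a
    and fill the remaining slots with points sampled from cloud_b."""
    assert len(cloud_a) == len(cloud_b), "sketch assumes equal-sized clouds"
    n = len(cloud_a)
    n_a = round(lam * n)
    return rng.sample(cloud_a, n_a) + rng.sample(cloud_b, n - n_a)


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(queries, keys, temperature=0.07):
    """InfoNCE loss: queries[i] should score highest against keys[i]."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]
    return total / len(q)


def mixed_alignment_loss(feats_3d, feats_other, lam):
    """Mix each sample with its neighbour in both modalities using the same
    lam, then contrast the mixed 3D features with the mixed paired features."""
    n = len(feats_3d)
    partner = [(i + 1) % n for i in range(n)]  # hypothetical pairing scheme
    mixed_3d = [mix_features(feats_3d[i], feats_3d[j], lam)
                for i, j in enumerate(partner)]
    mixed_other = [mix_features(feats_other[i], feats_other[j], lam)
                   for i, j in enumerate(partner)]
    return info_nce(mixed_3d, mixed_other)
```

Under these assumptions, the first stage would correspond to minimising `mixed_alignment_loss` on encoder outputs. The second stage would additionally pass the output of `mix_point_clouds` through the 3D encoder, so that features of mixed inputs can be compared with mixed features. How the paper actually combines the two is not stated in the abstract.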
Taxonomy
Topics: Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage · 3D Shape Modeling and Analysis
Methods: ALIGN · Contrastive Learning
