Self-Supervised Generative-Contrastive Learning of Multi-Modal Euclidean Input for 3D Shape Latent Representations: A Dynamic Switching Approach

Chengzhi Wu; Julius Pfrommer; Mingyuan Zhou; Jürgen Beyerer

arXiv:2301.04612 · cs.CV · June 9, 2025 · 1 citation

TL;DR

This paper introduces a self-supervised learning framework combining generative and contrastive methods with a dynamic switching approach to learn effective 3D shape representations from multi-modal data, improving reconstruction and classification.

Contribution

It presents a novel dynamic switching training strategy for multi-modal 3D shape encoding that prevents collapse and enhances feature integration from different data modalities.

Findings

01 Improved 3D shape reconstruction accuracy.

02 Enhanced classification performance using learned latent representations.

03 Effective multi-modal data integration in self-supervised learning.

Abstract

We propose a combined generative and contrastive neural architecture for learning latent representations of 3D volumetric shapes. The architecture uses two encoder branches, one for voxel grids and one for multi-view images of the same underlying shape. The main idea is to combine a contrastive loss between the resulting latent representations with an additional reconstruction loss. This helps prevent the latent representations from collapsing, which is a trivial solution for minimizing the contrastive loss. A novel dynamic switching approach is used to cross-train the two encoders with a shared decoder. The switching approach also enables the stop-gradient operation on a random branch. Further classification experiments show that the latent representations learned with our self-supervised method integrate more useful information from the additional input data implicitly, thus leading to better reconstruction…
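The role of the reconstruction term can be illustrated with a small toy calculation. The sketch below is not the authors' implementation: the cosine-distance form of the contrastive term, the `toy_decoder`, the `recon_weight` parameter, and the rule that the randomly chosen "active" branch feeds the shared decoder while the other latent is held fixed are all assumptions made for illustration. It only computes loss values in plain Python; a real training setup would apply a stop-gradient to the fixed branch in an autodiff framework.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity (0.0 when the two latents are aligned)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def mse(pred, target):
    """Mean squared error between two flattened voxel grids."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def toy_decoder(z):
    """Hypothetical shared decoder: tiles a 2-D latent into 4 'voxels'."""
    return [z[0], z[1], z[0], z[1]]


def switched_loss(voxels, z_voxel, z_image, rng, recon_weight=1.0):
    """Combined loss for one shape with a randomly chosen active branch."""
    # Assumption: the active branch feeds the shared decoder; the other
    # latent is only a fixed target (where a stop-gradient would be applied).
    active = rng.choice(("voxel", "image"))
    if active == "voxel":
        z_active, z_fixed = z_voxel, z_image
    else:
        z_active, z_fixed = z_image, z_voxel
    contrastive = cosine_distance(z_active, z_fixed)
    reconstruction = mse(toy_decoder(z_active), voxels)
    return {
        "active": active,
        "contrastive": contrastive,
        "reconstruction": reconstruction,
        "total": contrastive + recon_weight * reconstruction,
    }


def compare_collapse(seed=0):
    """Compare informative latents with collapsed (constant) latents."""
    rng = random.Random(seed)
    shapes = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    informative = [[1.0, 0.0], [0.0, 1.0]]  # one distinct latent per shape
    collapsed = [[1.0, 0.0], [1.0, 0.0]]    # every shape maps to one latent
    result = {}
    for name, latents in (("informative", informative), ("collapsed", collapsed)):
        # Both encoder branches are assumed to agree on each shape's latent.
        losses = [switched_loss(v, z, z, rng) for v, z in zip(shapes, latents)]
        result[name + "_contrastive"] = sum(l["contrastive"] for l in losses)
        result[name + "_total"] = sum(l["total"] for l in losses)
    return result
```

In this toy setting, `compare_collapse()` gives a contrastive term of zero for both the informative and the collapsed latents, so the contrastive loss alone cannot tell them apart. Only the reconstruction term penalizes the collapsed case, because a single shared latent cannot decode to two different shapes.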

Taxonomy

Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Advanced Vision and Imaging