Hierarchical Contrastive Learning for Multimodal Data
Huichao Li, Junhan Yu, Doudou Zhou

TL;DR
This paper introduces Hierarchical Contrastive Learning (HCL), a novel framework for multimodal data that captures shared, partially shared, and modality-specific information, improving representation quality and predictive performance.
Contribution
HCL unifies hierarchical latent-variable modeling with contrastive learning, enabling accurate recovery of complex shared structures in multimodal data.
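The hierarchical latent-variable idea can be illustrated with a small generative sketch. It assumes a linear model x_m = A_m z + noise with uncorrelated Gaussian latents. Each latent block is tagged with the set of modalities that load on it, so the loading matrices are block-sparse. The three-modality layout, the block dimensions, and the names `BLOCKS`, `make_loadings`, and `sample` are hypothetical illustrations, not the paper's specification.

```python
import numpy as np

# Hypothetical sharing structure for M = 3 modalities (indices 0, 1, 2).
# Each latent block: (name, latent dimension, set of modalities that load on it).
BLOCKS = [
    ("global", 2, {0, 1, 2}),   # globally shared
    ("partial_01", 2, {0, 1}),  # partially shared by modalities 0 and 1 only
    ("spec_0", 1, {0}),         # modality-specific blocks
    ("spec_1", 1, {1}),
    ("spec_2", 1, {2}),
]

def make_loadings(blocks, obs_dims, rng):
    """Build one loading matrix A_m (p_m x K) per modality with block sparsity:
    a block's columns are nonzero only for modalities that share that block."""
    total = sum(dim for _, dim, _ in blocks)
    loadings = []
    for m, p in enumerate(obs_dims):
        A = np.zeros((p, total))
        col = 0
        for _, dim, mods in blocks:
            if m in mods:
                A[:, col:col + dim] = rng.normal(size=(p, dim))
            col += dim
        loadings.append(A)
    return loadings

def sample(n, loadings, noise_sd, rng):
    """Draw uncorrelated latents z ~ N(0, I_K) and x_m = A_m z + Gaussian noise."""
    total = loadings[0].shape[1]
    z = rng.normal(size=(n, total))
    xs = [z @ A.T + noise_sd * rng.normal(size=(n, A.shape[0])) for A in loadings]
    return z, xs
```

Because Cov(z) = I in this sketch, the cross-covariance between modalities m and l is A_m A_l^T, which only involves blocks shared by both. In the layout above, modalities 0 and 2 are linked only through the global block (rank 2), while modalities 0 and 1 are linked through the global and partial blocks (rank 4).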
Findings
HCL accurately recovers hierarchical structures in simulations.
HCL improves predictive performance on multimodal health records.
HCL comes with theoretical guarantees for identifiability and estimation.
Abstract
Multimodal representation learning is commonly built on a shared-private decomposition, treating latent information as either common to all modalities or specific to one. This binary view is often inadequate: many factors are shared by only subsets of modalities, and ignoring such partial sharing can over-align unrelated signals and obscure complementary information. We propose Hierarchical Contrastive Learning (HCL), a framework that learns globally shared, partially shared, and modality-specific representations within a unified model. HCL combines a hierarchical latent-variable formulation with structural sparsity and a structure-aware contrastive objective that aligns only modalities that genuinely share a latent factor. Under uncorrelated latent variables, we prove identifiability of the hierarchical decomposition, establish recovery guarantees for the loading matrices, and derive…
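This summary does not give the exact form of the structure-aware contrastive objective. The following is a minimal sketch of the idea that "only modalities that genuinely share a latent factor" are aligned. It makes three assumptions: each modality's encoder already outputs one embedding per latent block, the sharing structure is known, and alignment uses a symmetric InfoNCE loss with temperature `tau`. The swap-in of InfoNCE and the names `info_nce` and `structure_aware_loss` are illustrative choices, not the authors' objective.

```python
import numpy as np

def info_nce(a, b, tau=0.1):
    """Symmetric InfoNCE between row-paired embeddings a, b of shape (n, d).
    Row i of a and row i of b form the positive pair; other rows are negatives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau  # cosine similarities scaled by the temperature

    def diag_cross_entropy(l):
        # Numerically stable row-wise log-softmax; the target is the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))

def structure_aware_loss(embeddings, blocks, tau=0.1):
    """embeddings[m][name]: (n, d) embedding of modality m for latent block `name`.
    blocks: dict mapping block name -> set of modalities that share it.
    A block is aligned only across pairs of modalities that both share it, so
    modality-specific blocks (one modality) add no alignment term."""
    total, count = 0.0, 0
    for name, mods in blocks.items():
        mods = sorted(mods)
        for i in range(len(mods)):
            for j in range(i + 1, len(mods)):
                total += info_nce(embeddings[mods[i]][name],
                                  embeddings[mods[j]][name], tau)
                count += 1
    return total / max(count, 1)
```

A naive shared-private objective would align every modality pair on the same shared embedding. In this sketch, a partially shared block such as one common to modalities 0 and 1 is never pulled toward modality 2. In practice the loss would be computed on encoder outputs in an autodiff framework and combined with the structural sparsity penalty, which is not shown here.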
