Multimodal Representation Learning using Deep Multiset Canonical Correlation
Krishna Somandepalli, Naveen Kumar, Ruchir Travadi, Shrikanth Narayanan

TL;DR
This paper introduces Deep Multiset Canonical Correlation Analysis (dMCCA), a deep learning method for extracting shared representations from multiple modalities without requiring class labels, effective even with noisy data.
Contribution
The paper presents a novel deep learning extension of multiset CCA that learns non-linear shared representations across multiple modalities without class supervision.
Findings
dMCCA effectively recovers common signals in synthetic noisy data
Outperforms other CCA-based methods on noisy handwritten datasets
Achieves comparable performance to end-to-end deep neural networks
Abstract
We propose Deep Multiset Canonical Correlation Analysis (dMCCA) as an extension to representation learning using CCA when the underlying signal is observed across multiple (more than two) modalities. We use a deep learning framework to learn non-linear transformations from different modalities to a shared subspace such that the representations maximize the ratio of between- and within-modality covariance of the observations. Unlike linear discriminant analysis, we do not need class information to learn these representations, and we show that this model can be trained for complex data using mini-batches. Using synthetic data experiments, we show that dMCCA can effectively recover the common signal across the different modalities corrupted by multiplicative and additive noise. We also analyze the sensitivity of our model to recover the correlated components with respect to mini-batch size…
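The ratio of between- to within-modality covariance mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes each modality has already been projected to a single 1-D signal (in dMCCA that projection would come from a learned network), and the function name `inter_set_correlation` and the normalization by (N - 1) are assumptions chosen so that identical modalities score 1 and independent ones score near 0.

```python
import random


def inter_set_correlation(views):
    """Ratio of between- to within-modality scatter for 1-D signals.

    views: list of N equal-length lists, one projected signal per modality.
    The (N - 1) normalization is an assumption made for this sketch.
    """
    n_views = len(views)
    n_samples = len(views[0])

    # Center each modality separately.
    centered = []
    for v in views:
        mean = sum(v) / n_samples
        centered.append([x - mean for x in v])

    # Within-modality scatter: each modality's own sum of squares, added up.
    within = sum(sum(x * x for x in v) for v in centered)

    # Total scatter: sum of squares of the per-sample sum across modalities.
    total = sum(sum(v[i] for v in centered) ** 2 for i in range(n_samples))

    # Between-modality scatter is what remains after removing the within part.
    between = total - within
    return between / ((n_views - 1) * within)


# Synthetic check: three modalities observing one common signal plus
# additive noise, versus three unrelated modalities.
rng = random.Random(0)
common = [rng.gauss(0, 1) for _ in range(2000)]
noisy_views = [[s + rng.gauss(0, 0.5) for s in common] for _ in range(3)]
independent_views = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(3)]

rho_shared = inter_set_correlation(noisy_views)
rho_indep = inter_set_correlation(independent_views)
```

Under these assumptions, `rho_shared` should be clearly higher than `rho_indep`, since only the first set of modalities shares a common signal. The sketch only scores a fixed projection; learning the projections that maximize this ratio, as the abstract describes, is outside its scope.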
Taxonomy
Topics: Face and Expression Recognition · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning
