Generalized Product-of-Experts for Learning Multimodal Representations in Noisy Environments
Abhinav Joshi, Naman Gupta, Jinang Shah, Binod Bhattarai, Ashutosh Modi, and Danail Stoyanov

TL;DR
This paper introduces a generalized product-of-experts approach for robust multimodal representation learning in noisy environments, dynamically weighting modalities to improve performance on challenging benchmarks.
Contribution
It proposes a novel method that trains a separate network for each modality to assess that modality's credibility and then dynamically combines the networks, making learning on noisy multimodal data more robust.
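The excerpt does not say what each modality-specific network outputs or how the credibility scores are produced. As a minimal sketch, assume that each network predicts a diagonal Gaussian (a mean and a variance per output dimension) and that a non-negative reliability weight is available for each modality. Under those assumptions, a weighted (generalized) product of Gaussian experts has a closed form. The names `GaussianExpert` and `gpoe_fuse` are hypothetical and illustrative only. They are not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class GaussianExpert:
    """Hypothetical diagonal-Gaussian prediction from one modality-specific network."""
    mean: List[float]      # predicted mean per output dimension
    variance: List[float]  # predicted variance per output dimension (must be > 0)


def gpoe_fuse(
    experts: Sequence[GaussianExpert], weights: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse Gaussian experts with a weighted product of experts (illustrative sketch).

    Raising each expert density to its weight alpha_m and multiplying gives a
    Gaussian whose precision is sum_m alpha_m / var_m. The fused mean is the
    precision-weighted average of the expert means.
    """
    if not experts or len(experts) != len(weights):
        raise ValueError("need exactly one weight per expert")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")

    dim = len(experts[0].mean)
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for d in range(dim):
        # Weighted precision contributed by every modality for this dimension.
        precision = sum(w / e.variance[d] for e, w in zip(experts, weights))
        # Weighted, precision-scaled sum of the expert means.
        weighted = sum(w * e.mean[d] / e.variance[d] for e, w in zip(experts, weights))
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted / precision)
    return fused_mean, fused_var


if __name__ == "__main__":
    clean = GaussianExpert(mean=[1.0], variance=[1.0])  # e.g. a reliable modality
    noisy = GaussianExpert(mean=[5.0], variance=[1.0])  # e.g. a corrupted modality
    # Down-weighting the noisy modality pulls the fused mean toward the clean one.
    print(gpoe_fuse([clean, noisy], [0.75, 0.25]))  # -> ([2.0], [1.0])
    # With equal weights the fused mean is halfway between the two experts.
    print(gpoe_fuse([clean, noisy], [0.5, 0.5]))    # -> ([3.0], [1.0])
```

In this sketch the weights sum to one, so the fused variance is not made overconfident by simply multiplying many experts together. A modality whose weight drops to zero is ignored entirely, and that is the behaviour one would want when its input is badly corrupted.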
Findings
Achieved state-of-the-art results on a 3D hand-pose estimation benchmark.
Outperformed previous methods in multimodal surgical video segmentation.
Demonstrated robustness to noisy data through extensive evaluations.
Abstract
A real-world application or setting involves interaction between different modalities (e.g., video, speech, text). To process multimodal information automatically and use it for an end application, Multimodal Representation Learning (MRL) has emerged as an active area of research in recent times. MRL involves learning reliable and robust representations of information from heterogeneous sources and fusing them. However, in practice, the data acquired from different sources are typically noisy. In some extreme cases, noise of large magnitude can completely alter the semantics of the data, leading to inconsistencies in the parallel multimodal data. In this paper, we propose a novel method for multimodal representation learning in a noisy environment via the generalized product of experts technique. In the proposed method, we train a separate network for each modality to…
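For reference, the generalized product-of-experts rule is usually written as a product of expert densities, with each density raised to a non-negative weight that may depend on the input. The notation below is an assumption and does not come from the paper: M modalities, target y, modality input x_m, and weight α_m. The excerpt does not state how the paper parameterizes the weights. For Gaussian experts the fused distribution is again Gaussian:

```latex
p(y \mid x_1,\dots,x_M) \;\propto\; \prod_{m=1}^{M} p_m(y \mid x_m)^{\alpha_m}, \qquad \alpha_m \ge 0

p_m(y \mid x_m) = \mathcal{N}\!\left(y;\, \mu_m, \sigma_m^{2}\right)
\;\Longrightarrow\;
\sigma^{-2} = \sum_{m=1}^{M} \alpha_m \sigma_m^{-2},
\qquad
\mu = \sigma^{2} \sum_{m=1}^{M} \alpha_m \sigma_m^{-2} \mu_m
```

Setting α_m = 1 for every m recovers the ordinary product of experts. Setting α_m = 0 removes modality m from the fused estimate.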
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
