Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process
Tsai Hor Chan, Feng Wu, Yihang Chen, Guosheng Yin, Lequan Yu

TL;DR
This paper introduces a Bayesian non-parametric framework using the Dirichlet process to enhance multimodal learning by dynamically emphasizing prominent features within each modality, improving fusion and representation quality.
Contribution
The proposed DP-driven framework uniquely balances intra-modal feature prominence and cross-modal alignment, advancing multimodal fusion techniques with adaptive feature weighting.
Findings
Outperforms existing multimodal models on several datasets.
Demonstrates robustness to hyperparameter variations.
Validates effectiveness through ablation studies.
Abstract
Developing effective multimodal fusion approaches has become increasingly essential in many real-world scenarios, such as health care and finance. The key challenge is how to preserve the feature expressiveness of each modality while learning cross-modal interactions. Previous approaches primarily focus on cross-modal alignment, but over-emphasizing the alignment of the modalities' marginal distributions may impose excess regularization and obstruct meaningful representations within each modality. The Dirichlet process (DP) mixture model is a powerful Bayesian non-parametric method that can amplify the most prominent features through its richer-gets-richer property, which allocates increasing weights to them. Inspired by this unique characteristic of DP, we propose a new DP-driven multimodal learning framework that automatically achieves an optimal balance between prominent intra-modal…
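The richer-gets-richer property mentioned in the abstract can be illustrated with the Chinese restaurant process, a standard sequential view of the Dirichlet process. The sketch below is not the paper's method or code. It only simulates cluster assignments under a DP prior, and the function name `crp_table_sizes` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def crp_table_sizes(num_customers, alpha, seed=0):
    """Simulate a Chinese restaurant process (hypothetical helper, not from the paper).

    Each new customer joins existing table k with probability n_k / (n + alpha)
    and opens a new table with probability alpha / (n + alpha), where n is the
    number of customers already seated and n_k is the size of table k.
    """
    rng = np.random.default_rng(seed)
    sizes = []  # sizes[k] = number of customers at table k
    for n in range(num_customers):
        # Existing tables are weighted by their size; the last entry is a new table.
        probs = np.array(sizes + [alpha], dtype=float) / (n + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(sizes):
            sizes.append(1)   # open a new table
        else:
            sizes[k] += 1     # larger tables are more likely to grow
    return sizes


if __name__ == "__main__":
    sizes = crp_table_sizes(num_customers=1000, alpha=1.0)
    # Print table sizes from largest to smallest.
    print(sorted(sizes, reverse=True))
```

Because the chance of joining a table is proportional to its current size, large clusters tend to keep growing. This is the mechanism the abstract refers to when it says the DP allocates increasing weights to the most prominent features. The concentration parameter `alpha` controls how readily new clusters appear.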
Taxonomy
Topics: Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis · Bayesian Methods and Mixture Models
