Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

Tsai Hor Chan; Feng Wu; Yihang Chen; Guosheng Yin; Lequan Yu

arXiv:2510.20736·cs.LG·October 24, 2025

TL;DR

This paper introduces a Bayesian non-parametric framework using the Dirichlet process to enhance multimodal learning by dynamically emphasizing prominent features within each modality, improving fusion and representation quality.

Contribution

The proposed DP-driven framework uniquely balances intra-modal feature prominence and cross-modal alignment, advancing multimodal fusion techniques with adaptive feature weighting.

Findings

01. Outperforms existing multimodal models on several datasets.

02. Demonstrates robustness to hyperparameter variations.

03. Validates effectiveness through ablation studies.

Abstract

Developing effective multimodal fusion approaches has become increasingly essential in many real-world scenarios, such as health care and finance. The key challenge is to preserve the feature expressiveness of each modality while learning cross-modal interactions. Previous approaches focus primarily on cross-modal alignment, but over-emphasizing the alignment of the modalities' marginal distributions may impose excess regularization and obstruct meaningful representations within each modality. The Dirichlet process (DP) mixture model is a powerful Bayesian non-parametric method that can amplify the most prominent features through its richer-gets-richer property, which allocates increasing weights to them. Inspired by this characteristic of the DP, we propose a new DP-driven multimodal learning framework that automatically achieves an optimal balance between prominent intra-modal…
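
The richer-gets-richer property mentioned in the abstract can be illustrated with the Chinese restaurant process, a standard sequential view of the Dirichlet process. The sketch below is only an illustration of that property, not the paper's variational method; the function name `crp_table_sizes`, the concentration value `alpha=1.0`, and the sample size are assumptions chosen for the example.

```python
import random


def crp_table_sizes(n_customers, alpha, seed=0):
    """Simulate a Chinese restaurant process and return the table sizes.

    Hypothetical helper for illustration; alpha must be positive.
    """
    rng = random.Random(seed)
    sizes = []  # sizes[k] = number of customers seated at table k
    for _ in range(n_customers):
        # An existing table k is chosen with probability proportional to
        # sizes[k]; a new table is opened with probability proportional
        # to alpha. Larger tables therefore attract more customers.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new table (new mixture component)
        else:
            sizes[k] += 1     # join an existing table
    return sizes


sizes = crp_table_sizes(n_customers=1000, alpha=1.0)
# Normalised table sizes play the role of mixture weights, largest first.
weights = sorted((s / sum(sizes) for s in sizes), reverse=True)
print(len(sizes), [round(w, 3) for w in weights[:5]])
```

With a small `alpha`, a few components typically end up holding most of the mass, which is the behaviour the abstract describes as allocating increasing weights to the most prominent features.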

Videos

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process · SlidesLive

Taxonomy

Topics: Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis · Bayesian Methods and Mixture Models