TL;DR
This paper introduces Beyond Two-modality Weighting (BTW), a non-parametric framework that dynamically weights modalities while training multimodal models. It combines instance-level KL divergence with modality-level mutual information and improves performance without adding parameters.
Contribution
The paper presents BTW, a scalable, parameter-free weighting method for multimodal learning that adaptively balances modalities during training to improve model performance.
Findings
Improves regression performance on sentiment analysis tasks.
Enhances classification accuracy in clinical multimodal datasets.
Scales effectively to multiple modalities without additional parameters.
Abstract
Mixture-of-Experts (MoE) models have become increasingly powerful in multimodal learning by enabling modular specialization across modalities. However, their effectiveness remains unclear when additional modalities introduce more noise than complementary information. Existing approaches, such as the Partial Information Decomposition, struggle to scale beyond two modalities and lack the resolution needed for instance-level control. We propose Beyond Two-modality Weighting (BTW), a bi-level, non-parametric weighting framework that combines instance-level Kullback-Leibler (KL) divergence and modality-level mutual information (MI) to dynamically adjust modality importance during training. Our method does not require additional parameters and can be applied to an arbitrary number of modalities. Specifically, BTW computes per-example KL weights by measuring the divergence between each…
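To make the bi-level idea concrete, here is a minimal Python sketch of how an instance-level KL term and a modality-level MI term might be combined into per-example modality weights. It is an illustration under stated assumptions, not the authors' implementation: it assumes the KL term compares each unimodal prediction with a fused multimodal prediction, that MI is estimated with a simple plug-in estimator over predicted labels, and that the two terms are multiplied and renormalized. The names `kl_divergence`, `mutual_information`, `normalize`, and `bilevel_weights` are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(xs, ys):
    """Plug-in MI estimate (nats) between two equal-length label sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def normalize(values):
    """Scale non-negative values to sum to 1; fall back to uniform if all are zero."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def argmax(dist):
    """Index of the largest probability (first index on ties)."""
    return max(range(len(dist)), key=dist.__getitem__)


def bilevel_weights(unimodal, fused):
    """
    unimodal: dict mapping modality name -> list of per-example distributions.
    fused:    list of per-example distributions from the multimodal model.
    Returns one dict of modality weights per example.

    ASSUMPTION: the product-then-renormalize rule below is a placeholder
    chosen for illustration; the source does not specify the combination.
    """
    modalities = list(unimodal)
    fused_labels = [argmax(q) for q in fused]

    # Modality-level term: one MI score per modality over the whole batch.
    mi = normalize([
        mutual_information([argmax(p) for p in unimodal[m]], fused_labels)
        for m in modalities
    ])

    weights = []
    for i, q in enumerate(fused):
        # Instance-level term: one KL score per modality for this example.
        kl = normalize([kl_divergence(unimodal[m][i], q) for m in modalities])
        combined = normalize([k * g for k, g in zip(kl, mi)])
        weights.append(dict(zip(modalities, combined)))
    return weights


if __name__ == "__main__":
    # Toy predictions for two examples and two hypothetical modalities.
    unimodal = {
        "text": [[0.9, 0.1], [0.2, 0.8]],
        "audio": [[0.6, 0.4], [0.5, 0.5]],
    }
    fused = [[0.8, 0.2], [0.3, 0.7]]
    for example_weights in bilevel_weights(unimodal, fused):
        print(example_weights)
```

Because neither term is learned, the sketch adds no trainable parameters and extends to any number of modalities by adding keys to `unimodal`. Whether a larger divergence should raise or lower a modality's weight is a design decision that this sketch does not settle.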