Dynamic Multimodal Fusion
Zihui Xue, Radu Marculescu

TL;DR
This paper introduces DynMM, a dynamic multimodal fusion method that adaptively processes data during inference, significantly reducing computation costs while maintaining or improving performance across various multimodal tasks.
Contribution
The paper presents a novel dynamic fusion approach with a gating mechanism and resource-aware loss, enabling adaptive computation based on multimodal data characteristics.
Findings
Reduces computation costs by 46.5% on CMU-MOSEI sentiment analysis with only a negligible accuracy loss.
Improves segmentation performance while saving 21% of computation.
Demonstrates wide applicability across multimodal tasks.
Abstract
Deep multimodal learning has achieved great progress in recent years. However, current fusion approaches are static in nature, i.e., they process and fuse multimodal inputs with identical computation, without accounting for the diverse computational demands of different multimodal data. In this work, we propose dynamic multimodal fusion (DynMM), a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference. To this end, we propose a gating function to provide modality-level or fusion-level decisions on-the-fly based on multimodal features and a resource-aware loss function that encourages computational efficiency. Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach. For instance, DynMM can reduce the computation costs by 46.5% with only a negligible accuracy loss (CMU-MOSEI sentiment…
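To make the two ingredients in the abstract concrete, here is a minimal sketch of a gate choosing among candidate forward paths plus a resource-aware loss. It is not the authors' implementation. Assumptions, all hypothetical: the gate's logits are given directly (in practice a small network would compute them from multimodal features), each "branch" stands for one candidate forward path (e.g. a subset of modalities for a modality-level decision; a fusion-level choice would follow the same pattern), the per-branch costs are made-up numbers, and the efficiency term is taken to be the gate-weighted expected cost scaled by a weight `lam`. The names `softmax`, `resource_aware_loss`, and `select_branch` are illustrative only.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the gate's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def resource_aware_loss(task_loss: float,
                        gate_logits: Sequence[float],
                        branch_costs: Sequence[float],
                        lam: float) -> float:
    """Task loss plus lam times the expected compute cost under the gate.

    Assumption: the efficiency penalty is the probability-weighted sum of
    hypothetical per-branch costs, so the gate is pushed toward cheap
    branches unless the task loss justifies an expensive one.
    """
    probs = softmax(gate_logits)
    expected_cost = sum(p * c for p, c in zip(probs, branch_costs))
    return task_loss + lam * expected_cost


def select_branch(gate_logits: Sequence[float]) -> int:
    """Hard, data-dependent decision at inference: run only the top branch."""
    return max(range(len(gate_logits)), key=lambda i: gate_logits[i])


if __name__ == "__main__":
    # Hypothetical costs: branch 0 = one modality, 1 = two, 2 = all three.
    costs = [1.0, 3.0, 6.0]
    # Hypothetical gate output for one "easy" sample.
    logits = [2.0, 0.5, 0.1]
    print(select_branch(logits))  # → 0 (cheapest path is chosen)
    print(resource_aware_loss(0.8, logits, costs, lam=0.01))
```

Raising `lam` makes expensive paths costlier during training, which is one way a single model could trade accuracy for computation on a per-sample basis.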
Taxonomy
Topics: Text and Document Classification Technologies · Rough Sets and Fuzzy Logic · Domain Adaptation and Few-Shot Learning
