Dynamic Multimodal Fusion

Zihui Xue; Radu Marculescu

arXiv:2204.00102 · cs.CV · April 10, 2023

TL;DR

This paper introduces DynMM, a dynamic multimodal fusion method that adaptively processes data during inference, significantly reducing computation costs while maintaining or improving performance across various multimodal tasks.

Contribution

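The paper presents a dynamic fusion approach that combines a gating function, which makes modality-level or fusion-level decisions from multimodal features, with a resource-aware loss that encourages computational efficiency. Together they let the amount of computation adapt to each multimodal input.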
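To make the interplay between the gate and the resource-aware loss concrete, here is a minimal sketch in plain Python. It is not taken from the paper or the official repository: the function names, the two-branch setup, the relative branch costs, the weight `lam`, and the use of an ordinary softmax as the gate are all assumptions chosen for illustration. The idea it shows is only that a penalty proportional to the expected compute cost of the selected branches can be added to the task loss.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of gate logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def resource_aware_loss(task_loss, gate_probs, branch_costs, lam=0.1):
    """Task loss plus a penalty on the expected compute cost.

    Assumed form: task_loss + lam * sum_i(gate_probs[i] * branch_costs[i]).
    """
    expected_cost = sum(p * c for p, c in zip(gate_probs, branch_costs))
    return task_loss + lam * expected_cost

# Hypothetical setup: branch 0 is a cheap unimodal path,
# branch 1 is a full multimodal fusion path.
branch_costs = [1.0, 4.0]          # illustrative relative costs, not measured values
gate_probs = softmax([2.0, 0.5])   # the gate currently favors the cheap branch
loss = resource_aware_loss(task_loss=0.8, gate_probs=gate_probs,
                           branch_costs=branch_costs, lam=0.1)
```

Raising `lam` in this sketch pushes the gate toward cheaper branches, while lowering it lets the task loss dominate; that trade-off is the design choice the resource-aware term is meant to expose.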

Findings

01 Reduces computation costs by 46.5% on CMU-MOSEI sentiment analysis with only a negligible accuracy loss.

02 Improves segmentation performance while saving 21% of computation.

03 Demonstrates wide applicability across multimodal tasks.

Abstract

Deep multimodal learning has achieved great progress in recent years. However, current fusion approaches are static in nature, i.e., they process and fuse multimodal inputs with identical computation, without accounting for diverse computational demands of different multimodal data. In this work, we propose dynamic multimodal fusion (DynMM), a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference. To this end, we propose a gating function to provide modality-level or fusion-level decisions on-the-fly based on multimodal features and a resource-aware loss function that encourages computational efficiency. Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach. For instance, DynMM can reduce the computation costs by 46.5% with only a negligible accuracy loss (CMU-MOSEI sentiment…
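The notion of a data-dependent forward path at inference can be illustrated with a small routing sketch. Everything here is hypothetical: the `route` helper, the two toy branches, their costs, and the `toy_gate` heuristic (which prefers fusion when the two modalities disagree) are invented for the example and do not reproduce the paper's gating network. The point is only that a hard gate decision lets the model execute one branch and skip the rest.

```python
def route(gate_logits, branches, x):
    """Pick the highest-scoring branch and run only that branch."""
    k = max(range(len(gate_logits)), key=lambda i: gate_logits[i])
    fn, cost = branches[k]
    return fn(x), k, cost

def text_only(x):
    # Cheap path: use a single modality's score.
    return x["text"]

def full_fusion(x):
    # Expensive path: average the scores of both modalities.
    return 0.5 * (x["text"] + x["audio"])

# Each branch is paired with an illustrative relative cost (assumed values).
branches = [(text_only, 1.0), (full_fusion, 4.0)]

def toy_gate(x):
    # Hypothetical heuristic gate: favor fusion when modalities disagree.
    gap = abs(x["text"] - x["audio"])
    return [1.0 - gap, gap]

easy = {"text": 0.9, "audio": 0.8}    # modalities agree
hard = {"text": 0.9, "audio": -0.7}   # modalities disagree

easy_out, easy_branch, easy_cost = route(toy_gate(easy), branches, easy)
hard_out, hard_branch, hard_cost = route(toy_gate(hard), branches, hard)
```

In this toy run the agreeing sample takes the cheap unimodal branch and the disagreeing sample takes the fusion branch, so the average cost over the two inputs is lower than always running fusion.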

Code & Models

Repositories

zihuixue/dynmm
PyTorch · Official

Taxonomy

Topics: Text and Document Classification Technologies · Rough Sets and Fuzzy Logic · Domain Adaptation and Few-Shot Learning