Exploring Multi-Modality Dynamics: Insights and Challenges in Multimodal Fusion for Biomedical Tasks
Laura Wenderoth

TL;DR
This paper critically examines the MM dynamics approach for multimodal fusion in biomedical classification, highlighting limitations, extending it to image data, and providing insights into feature and modality informativeness.
Contribution
It identifies challenges in MM dynamics, extends feature informativeness to images, and offers a nuanced understanding of multimodal fusion dynamics in biomedical tasks.
Findings
Feature informativeness improves performance and explainability.
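Modality informativeness does not significantly enhance results and can degrade performance.
Extending feature informativeness to images (Image MM dynamics) showed promising qualitative results but did not outperform baseline methods quantitatively.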
Abstract
This paper investigates the MM dynamics approach proposed by Han et al. (2022) for multimodal fusion in biomedical classification tasks. The MM dynamics algorithm integrates feature-level and modality-level informativeness to dynamically fuse modalities for improved classification performance. However, our analysis reveals several limitations and challenges in replicating and extending the results of MM dynamics. We found that feature informativeness improves performance and explainability, while modality informativeness provides no significant advantage and can degrade performance. Based on these results, we extended feature informativeness to image data, resulting in Image MM dynamics. Although this approach showed promising qualitative results, it did not outperform baseline methods quantitatively.
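The abstract describes fusion driven by two signals: feature-level and modality-level informativeness. The sketch below illustrates one plausible reading of that idea and is not the authors' implementation. It assumes feature informativeness acts as a per-feature sigmoid gate and modality informativeness as a scalar confidence per modality. All function names, gate logits, and confidence values are hypothetical, and in a real model the gates and confidences would be learned rather than supplied by hand.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_features(features, gate_logits):
    """Feature-level informativeness (assumed form): scale each feature
    by a gate in (0, 1) computed from its logit."""
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


def fuse_modalities(modality_features, modality_confidences):
    """Modality-level informativeness (assumed form): weight every feature
    of a modality by that modality's scalar confidence, then concatenate."""
    fused = []
    for feats, conf in zip(modality_features, modality_confidences):
        fused.extend(conf * f for f in feats)
    return fused


# Toy example with two hypothetical modalities.
# A gate logit of 0.0 gives a gate of exactly 0.5.
modality_a = gate_features([1.0, 2.0], [0.0, 0.0])
modality_b = gate_features([4.0], [0.0])

# Hypothetical confidences: modality_a fully trusted, modality_b half trusted.
fused = fuse_modalities([modality_a, modality_b], [1.0, 0.5])
print(fused)
```

In this framing, the paper's finding corresponds to the feature gates helping performance and explainability, while the modality confidences add little and can hurt.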
Taxonomy
Topics: Speech and dialogue systems
