MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei; Di Hu

arXiv:2405.17730 · cs.CV · May 29, 2024 · 5 cites

TL;DR

This paper introduces MMPareto, an algorithm that mitigates gradient conflicts between multimodal and unimodal learning objectives. It produces a final gradient whose direction is common to all objectives and whose magnitude is enhanced to improve generalization.

Contribution

We propose MMPareto, an algorithm that addresses gradient conflicts in multimodal learning by analyzing the discrepancy between the multimodal and unimodal losses and applying Pareto integration to align the resulting gradients.

Findings

01. MMPareto improves performance across multiple modalities.

02. The method enhances generalization and scalability.

03. Experiments confirm superior results over existing approaches.

Abstract

Multimodal learning methods with targeted unimodal learning objectives have shown superior efficacy in alleviating the imbalanced multimodal learning problem. However, in this paper, we identify a previously ignored gradient conflict between multimodal and unimodal learning objectives, which can mislead the optimization of the unimodal encoders. To diminish these conflicts, we observe a discrepancy between the multimodal loss and the unimodal loss: both the gradient magnitude and the covariance of the easier-to-learn multimodal loss are smaller than those of the unimodal one. With this property, we analyze Pareto integration under our multimodal scenario and propose the MMPareto algorithm, which ensures a final gradient with a direction that is common to all learning objectives and an enhanced magnitude to improve generalization, providing innocent unimodal assistance. Finally, experiments…
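
To make the idea of a gradient direction "common to all learning objectives" concrete, here is a minimal sketch of the classic two-objective min-norm (Pareto) combination, under stated assumptions. It is not the authors' implementation: the function names `min_norm_weight` and `combine`, the use of plain Python lists as gradients, and the rescaling rule (matching the mean norm of the two input gradients) are all hypothetical choices for illustration. The paper's actual magnitude enhancement may differ.

```python
import math


def dot(a, b):
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g_m, g_u):
    """Return alpha in [0, 1] minimizing ||alpha*g_m + (1-alpha)*g_u||.

    g_m stands for the multimodal-loss gradient and g_u for the
    unimodal-loss gradient (names are assumptions for this sketch).
    """
    diff = [u - m for m, u in zip(g_m, g_u)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: any weight gives the same result.
        return 0.5
    # Closed-form minimizer of the quadratic, clipped to [0, 1].
    alpha = dot(diff, g_u) / denom
    return min(1.0, max(0.0, alpha))


def combine(g_m, g_u, rescale=True):
    """Combine two gradients into one that conflicts with neither.

    The rescaling target (mean of the two input norms) is an
    illustrative assumption, not the rule from the paper.
    """
    alpha = min_norm_weight(g_m, g_u)
    d = [alpha * m + (1.0 - alpha) * u for m, u in zip(g_m, g_u)]
    if rescale:
        norm_d = math.sqrt(dot(d, d))
        target = 0.5 * (math.sqrt(dot(g_m, g_m)) + math.sqrt(dot(g_u, g_u)))
        if norm_d > 0.0:
            d = [x * target / norm_d for x in d]
    return d


if __name__ == "__main__":
    g_m = [1.0, 0.0]    # toy multimodal gradient
    g_u = [-0.5, 1.0]   # toy unimodal gradient; conflicts with g_m
    d = combine(g_m, g_u)
    print("conflict:", dot(g_m, g_u) < 0)
    print("aligned with both:", dot(d, g_m) >= 0 and dot(d, g_u) >= 0)
```

The min-norm point of the segment between the two gradients has a non-negative inner product with both of them, so a step along it does not increase either loss to first order. Rescaling by a positive factor preserves that property while counteracting the shrinkage in magnitude that the min-norm solution introduces.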

Code & Models

Repositories

gewu-lab/mmpareto_icml2024 (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems