Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions

Anil Rahate; Rahee Walambe; Sheela Ramanna; Ketan Kotecha

arXiv:2107.13782 · cs.LG · January 19, 2022

TL;DR

This paper provides a comprehensive survey of multimodal co-learning, addressing its challenges, recent advances, datasets, applications, and future directions in the context of multimodal deep learning systems.

Contribution

It offers the first detailed taxonomy and review of multimodal co-learning challenges, techniques, and applications, highlighting future research directions.

Findings

1. Identified key challenges in multimodal co-learning.
2. Reviewed recent techniques and datasets for co-learning.
3. Outlined future research directions in the field.

Abstract

Multimodal deep learning systems, which employ multiple modalities such as text, image, audio, and video, show better performance than individual-modality (i.e., unimodal) systems. Multimodal machine learning involves multiple aspects: representation, translation, alignment, fusion, and co-learning. The current state of multimodal machine learning assumes that all modalities are present, aligned, and noiseless during training and testing. However, in real-world tasks, one or more modalities are typically missing or noisy, lack annotated data, have unreliable labels, or are scarce during training, testing, or both. This challenge is addressed by a learning paradigm called multimodal co-learning. The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer…
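The abstract is truncated before it names specific techniques, so the following is only an illustrative sketch of one common way to transfer knowledge from a resource-rich modality to a resource-poor one: temperature-scaled knowledge distillation on paired samples. It is not presented as the survey's method. The function names, the `temperature` and `alpha` parameters, and the example logits are all hypothetical. A "teacher" trained on the rich modality (e.g., video) supplies soft targets, and a "student" for the poor modality (e.g., audio) is trained to match them, optionally mixed with a hard-label loss when a label exists.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_modal_distillation_loss(teacher_logits, student_logits, label=None,
                                  temperature=2.0, alpha=0.5):
    """Hypothetical co-learning loss for one paired sample.

    teacher_logits: outputs of a model trained on the resource-rich modality.
    student_logits: outputs of the model for the resource-poor modality.
    label: optional ground-truth class index; when None (unlabelled data),
           only the teacher's soft targets supervise the student.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Scale by T^2 so soft-target gradients keep a comparable magnitude.
    soft_loss = (temperature ** 2) * kl_divergence(p_teacher, p_student)
    if label is None:
        return soft_loss
    # Standard cross-entropy on the student's un-softened predictions.
    hard_loss = -math.log(softmax(student_logits)[label])
    return alpha * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # hypothetical rich-modality logits
    student = [2.0, 1.5, 0.2]  # hypothetical poor-modality logits
    print(cross_modal_distillation_loss(teacher, student))           # unlabelled
    print(cross_modal_distillation_loss(teacher, student, label=0))  # labelled
```

The unlabelled branch reflects the abstract's point that the resource-poor modality may lack annotations: the teacher's predictions on the paired rich-modality input act as supervision in their place.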
