Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions
Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha

TL;DR
This paper provides a comprehensive survey of multimodal co-learning, addressing its challenges, recent advances, datasets, applications, and future directions in the context of multimodal deep learning systems.
Contribution
It offers the first detailed taxonomy and review of multimodal co-learning challenges, techniques, and applications, highlighting future research directions.
Findings
Identified key challenges in multimodal co-learning.
Reviewed recent techniques and datasets for co-learning.
Outlined future research directions in the field.
Abstract
Multimodal deep learning systems, which employ multiple modalities such as text, image, audio, and video, are showing better performance than individual-modality (i.e., unimodal) systems. Multimodal machine learning involves multiple aspects: representation, translation, alignment, fusion, and co-learning. Current multimodal machine learning assumes that all modalities are present, aligned, and noiseless at both training and testing time. In real-world tasks, however, one or more modalities are typically missing, noisy, short of annotated data, unreliably labeled, or scarce during training, testing, or both. This challenge is addressed by a learning paradigm called multimodal co-learning. The modeling of a (resource-poor) modality is aided by exploiting knowledge from another (resource-rich) modality using transfer…
