A Review on Methods and Applications in Multimodal Deep Learning

Jabeen Summaira; Xi Li; Amin Muhammad Shoib; Jabbar Abdul

arXiv:2202.09195 · cs.LG · February 21, 2022 · 1 citation

Open Access

TL;DR

This paper reviews recent advancements in multimodal deep learning, analyzing various modalities like image, text, and audio, and discusses baseline methods, applications, challenges, and future research directions.

Contribution

It provides a comprehensive taxonomy of multimodal deep learning methods and a detailed analysis of recent developments from 2017 to 2021.

Findings

01. Detailed taxonomy of multimodal methods
02. Analysis of recent advancements (2017-2021)
03. Identification of key challenges and future directions

Abstract

Deep learning has enabled a wide range of applications and has become increasingly popular in recent years. The goal of multimodal deep learning (MMDL) is to create models that can process and link information across various modalities. Despite the extensive progress made in unimodal learning, it still cannot cover all aspects of human learning. Multimodal learning supports better understanding and analysis when various senses are engaged in processing information. This paper focuses on multiple types of modalities, i.e., image, video, text, audio, body gestures, facial expressions, and physiological signals. A detailed analysis of the baseline approaches and an in-depth study of recent advancements in multimodal deep learning applications over the last five years (2017 to 2021) are provided. A fine-grained taxonomy of various multimodal deep learning methods is…

Taxonomy

Topics: Hand Gesture Recognition Systems