Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion
Yang Wang

TL;DR
This survey reviews the evolution of multi-modal data analytics from shallow to deep learning approaches, emphasizing collaboration, rivalry, and fusion strategies, and discusses future research directions.
Contribution
It provides a comprehensive overview of multi-modal data analytics, highlighting the transition to deep neural networks and key components like collaboration and rivalry.
Findings
Deep multi-modal methods improve data fusion performance.
Deep neural networks effectively capture nonlinear multi-modal data distributions.
Future directions include advanced fusion and adversarial techniques.
Abstract
With the development of web technology, multi-modal or multi-view data has surged as a major stream of big data, where each modality/view encodes an individual property of the data objects. Often, different modalities are complementary to each other. This fact has motivated a great deal of research on fusing the multi-modal feature spaces to comprehensively characterize the data objects. Most existing state-of-the-art methods focus on how to fuse the energy or information from multi-modal spaces to deliver superior performance over their single-modality counterparts. Recently, deep neural networks have proven to be a powerful architecture for capturing the nonlinear distribution of high-dimensional multimedia data, and naturally the same holds for multi-modal data. Substantial empirical studies have been carried out to demonstrate the advantages of deep multi-modal methods, which can…
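To make the idea of fusing multi-modal feature spaces concrete, here is a minimal sketch of two common baseline strategies: feature-level (early) fusion by concatenation and decision-level (late) fusion by weighted score averaging. It is not taken from the survey; the function names `early_fusion` and `late_fusion`, the toy image/text features, and the equal default weights are all assumptions made for illustration, and the deep methods the survey covers learn the fusion rather than fixing it by hand.

```python
def early_fusion(views):
    """Feature-level fusion (illustrative): concatenate each object's
    feature vectors across modalities into one longer vector."""
    n_objects = len(views[0])
    if any(len(view) != n_objects for view in views):
        raise ValueError("all modalities must describe the same objects")
    return [
        [value for view in views for value in view[i]]
        for i in range(n_objects)
    ]


def late_fusion(scores, weights=None):
    """Decision-level fusion (illustrative): weighted average of the
    per-modality class scores for each object."""
    if weights is None:
        weights = [1.0] * len(scores)  # assumption: equal trust in each modality
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so weights sum to 1
    n_objects = len(scores[0])
    n_classes = len(scores[0][0])
    return [
        [
            sum(w * modality[i][c] for w, modality in zip(weights, scores))
            for c in range(n_classes)
        ]
        for i in range(n_objects)
    ]


# Toy data: two objects described by a hypothetical image view and text view.
image_feats = [[0.2, 0.8, 0.1], [0.9, 0.1, 0.4]]  # 2 objects x 3 dims
text_feats = [[1.0, 0.0], [0.0, 1.0]]             # 2 objects x 2 dims
fused_feats = early_fusion([image_feats, text_feats])  # 2 objects x 5 dims

# Hypothetical per-modality class scores (2 objects x 2 classes).
image_scores = [[0.6, 0.4], [0.3, 0.7]]
text_scores = [[0.8, 0.2], [0.5, 0.5]]
fused_scores = late_fusion([image_scores, text_scores])
```

Early fusion lets a single downstream model see all modalities at once, while late fusion keeps one model per modality and combines only their outputs; which works better depends on how complementary the modalities are.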
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
