Generalized K-fan Multimodal Deep Model with Shared Representations
Gang Chen, Sargur N. Srihari

TL;DR
This paper introduces a generalized K-fan deep multimodal model that effectively handles multiple inputs and outputs, learning shared representations for diverse tasks like visual restoration and object recognition.
Contribution
It extends deep Boltzmann machines to a K-fan structure capable of multi-input and multi-output learning with shared representations and novel training objectives.
Findings
Effective multi-source information leveraging
Accurate multi-task predictions
Outperforms competitive baselines
Abstract
Multimodal learning with deep Boltzmann machines (DBMs) is a generative approach to fuse multimodal inputs, and can learn the shared representation via Contrastive Divergence (CD) for classification and information retrieval tasks. However, it is a 2-fan DBM model, and cannot effectively handle multiple prediction tasks. Moreover, this model cannot recover the hidden representations well by sampling from the conditional distribution when more than one modality is missing. In this paper, we propose a K-fan deep structure model, which can handle the multi-input and multi-output learning problems effectively. In particular, the deep structure has K branches for different inputs, where each branch can be composed of a multi-layer deep model, and a shared representation is learned in a discriminative manner to tackle multimodal tasks. Given the deep structure, we propose two objective…
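The K-branch layout described in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' model: it is a deterministic feed-forward stand-in rather than a Boltzmann machine, it has no training objective, each branch is a single sigmoid layer, and fusing the branches by concatenation is an assumption made only for illustration. The names `KFanSketch`, `dense`, and `init_layer`, and all layer sizes, are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Sigmoid layer; `weights` holds one row of input weights per output unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class KFanSketch:
    """K input branches -> one shared representation -> several output heads.

    Assumption: branch outputs are fused by concatenation. The abstract does
    not specify the fusion, so this is an illustrative choice only.
    """

    def __init__(self, input_dims, branch_dim, shared_dim, output_dims, seed=0):
        rng = random.Random(seed)
        # One branch per input modality (K = len(input_dims)).
        self.branches = [init_layer(d, branch_dim, rng) for d in input_dims]
        # The shared layer reads the concatenation of all branch outputs.
        self.shared = init_layer(branch_dim * len(input_dims), shared_dim, rng)
        # One head per prediction task, all reading the shared representation.
        self.heads = [init_layer(shared_dim, d, rng) for d in output_dims]

    def forward(self, inputs):
        fused = []
        for x, (w, b) in zip(inputs, self.branches):
            fused.extend(dense(x, w, b))
        shared = dense(fused, *self.shared)
        outputs = [dense(shared, w, b) for w, b in self.heads]
        return shared, outputs


if __name__ == "__main__":
    # Three modalities (K = 3) and two prediction tasks; sizes are arbitrary.
    model = KFanSketch(input_dims=[4, 3, 5], branch_dim=6,
                       shared_dim=8, output_dims=[2, 4])
    shared, outputs = model.forward([[0.1] * 4, [0.2] * 3, [0.3] * 5])
    print(len(shared), [len(o) for o in outputs])
```

Every head reads the same `shared` vector, which is the sense in which the representation is shared across tasks in this sketch. Handling a missing modality, which the abstract raises as a weakness of the 2-fan model, is not addressed here.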
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Domain Adaptation and Few-Shot Learning
