Generalized K-fan Multimodal Deep Model with Shared Representations

Gang Chen; Sargur N. Srihari

arXiv:1503.07906·cs.LG·March 30, 2015·5 cites

TL;DR

This paper introduces a generalized K-fan deep multimodal model that effectively handles multiple inputs and outputs, learning shared representations for diverse tasks like visual restoration and object recognition.

Contribution

It extends deep Boltzmann machines to a K-fan structure capable of multi-input and multi-output learning with shared representations and novel training objectives.

Findings

01 Effective multi-source information leveraging

02 Accurate multi-task predictions

03 Outperforms competitive baselines

Abstract

Multimodal learning with deep Boltzmann machines (DBMs) is a generative approach to fusing multimodal inputs, and can learn a shared representation via Contrastive Divergence (CD) for classification and information retrieval tasks. However, it is a 2-fan DBM model and cannot effectively handle multiple prediction tasks. Moreover, this model cannot recover the hidden representations well by sampling from the conditional distribution when more than one modality is missing. In this paper, we propose a K-fan deep structure model, which can handle multi-input and multi-output learning problems effectively. In particular, the deep structure has K branches for different inputs, where each branch can be composed of a multi-layer deep model, and a shared representation is learned in a discriminative manner to tackle multimodal tasks. Given the deep structure, we propose two objective…
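The K-branch layout described in the abstract can be illustrated with a small forward-pass sketch. This is only an illustration under stated assumptions, not the authors' model: the abstract does not say how branch outputs are fused, so concatenation followed by one dense shared layer is assumed here, and the layer sizes, sigmoid units, softmax heads, and all function names (`init_branch`, `k_fan_forward`, and so on) are hypothetical. Training objectives and missing-modality handling are not covered.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def init_branch(rng, sizes):
    """Weights for a stack of dense layers with the given layer sizes."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def branch_forward(layers, x):
    """Push one modality through its own multi-layer branch."""
    h = x
    for W, b in layers:
        h = sigmoid(h @ W + b)
    return h

def k_fan_forward(branches, shared, heads, inputs):
    """K branches -> shared representation -> one prediction per task."""
    tops = [branch_forward(layers, x) for layers, x in zip(branches, inputs)]
    # Assumption: fuse branch outputs by concatenation, then one dense layer.
    W_s, b_s = shared
    shared_repr = sigmoid(np.concatenate(tops, axis=1) @ W_s + b_s)
    # Every task head reads the same shared representation.
    return [softmax(shared_repr @ W + b) for W, b in heads]

rng = np.random.default_rng(0)
# K = 3 modalities with input sizes 8, 5, 3; each branch ends in 4 units.
branches = [init_branch(rng, s) for s in ([8, 6, 4], [5, 4], [3, 4])]
shared = (rng.normal(0.0, 0.1, (12, 10)), np.zeros(10))
# Two hypothetical tasks: a 3-class and a 2-class prediction.
heads = [(rng.normal(0.0, 0.1, (10, 3)), np.zeros(3)),
         (rng.normal(0.0, 0.1, (10, 2)), np.zeros(2))]
inputs = [rng.random((4, 8)), rng.random((4, 5)), rng.random((4, 3))]
outputs = k_fan_forward(branches, shared, heads, inputs)
```

Here `outputs` holds one probability matrix per task, both computed from the same shared layer, which is the multi-input, multi-output pattern the abstract refers to.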

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Domain Adaptation and Few-Shot Learning