Learning deep representation of multityped objects and tasks

Truyen Tran; Dinh Phung; Svetha Venkatesh

arXiv:1603.01359·stat.ML·March 7, 2016·1 cites

TL;DR

This paper presents a deep multitask architecture that integrates multityped representations of multimodal objects and supports heterogeneously typed tasks, demonstrated on social image retrieval and multiple concept prediction.

Contribution

The proposed deep model uniquely combines multityped features and supports heterogeneously typed tasks, advancing multimodal and multityped object representation learning.

Findings

01 Produces more compact representations

02 Effectively integrates multiviews and multimodalities

03 Performs competitively against baseline methods

Abstract

We introduce a deep multitask architecture to integrate multityped representations of multimodal objects. This multitype exposition is less abstract than the multimodal characterization, but more machine-friendly, and thus is more precise to model. For example, an image can be described by multiple visual views, which can be in the forms of bag-of-words (counts) or color/texture histograms (real-valued). At the same time, the image may have several social tags, which are best described using a sparse binary vector. Our deep model takes as input multiple type-specific features, narrows the cross-modality semantic gaps, learns cross-type correlation, and produces a high-level homogeneous representation. At the same time, the model supports heterogeneously typed tasks. We demonstrate the capacity of the model on two applications: social image retrieval and multiple concept prediction. The…
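
To make the idea of type-specific inputs feeding one homogeneous representation concrete, here is a minimal sketch in Python. It is not the authors' model: the layer sizes, the `log1p` transform for counts, the random untrained weights, and the names `init_weights` and `fuse` are all assumptions chosen for illustration. It only shows how count, real-valued, and binary views of the same image could be projected separately and summed into a single hidden vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_hidden, n_in, rng):
    # One weight matrix per input type (hypothetical sizes, untrained).
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]


def fuse(views, weights, bias):
    """Sum the type-specific linear projections, then squash each unit to (0, 1)."""
    hidden = []
    for k in range(len(bias)):
        total = bias[k]
        for x, W in zip(views, weights):
            total += sum(w * v for w, v in zip(W[k], x))
        hidden.append(sigmoid(total))
    return hidden


rng = random.Random(0)

# Three views of one image, each with its own data type (made-up values).
counts = [3, 0, 1, 7]        # bag-of-words counts
hist = [0.2, 0.5, 0.3]       # real-valued color/texture histogram
tags = [1, 0, 0, 1, 0]       # sparse binary social tags

# Assumption: counts are log-compressed; the other types are used as-is.
views = [[math.log1p(c) for c in counts], hist, [float(t) for t in tags]]

n_hidden = 6
weights = [init_weights(n_hidden, len(v), rng) for v in views]
bias = [0.0] * n_hidden

representation = fuse(views, weights, bias)
print(len(representation))
```

Whatever the input types, the output here is a single fixed-length vector of values in (0, 1), which is the sense in which the representation is homogeneous. A trained model would learn the weights rather than draw them at random.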

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications