Learning Social Image Embedding with Deep Multimodal Attention Networks

Feiran Huang; Xiaoming Zhang; Zhoujun Li; Tao Mei; Yueying He; Zhonghua Zhao

arXiv:1710.06582·cs.MM·October 19, 2017

TL;DR

This paper introduces DMAN, a deep multimodal attention network that jointly embeds social images by capturing both multimodal content relations and social network links, improving classification and search tasks.

Contribution

The paper proposes a novel deep model combining multimodal attention and Siamese-Triplet networks for social image embedding, integrating content and link information.
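As a rough illustration of how link information can drive a Siamese-Triplet objective, the sketch below uses a generic triplet margin loss: an image's embedding should lie closer to an image it is linked to than to one it is not linked to. The squared Euclidean distance, the margin of 1.0, the exhaustive triplet construction, and the names build_triplets and mean_triplet_loss are assumptions made for illustration. They are not DMAN's published formulation.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two embeddings (assumed metric).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    # Hinge loss: zero once the linked (positive) image is closer to the
    # anchor than the unlinked (negative) image by at least `margin`.
    return max(0.0, margin + squared_distance(anchor, positive)
               - squared_distance(anchor, negative))


def build_triplets(links: Dict[int, Set[int]]) -> List[Tuple[int, int, int]]:
    # Hypothetical triplet construction: for every linked pair (a, p), use
    # every other node n that is not linked to a as a negative.
    # Every node must appear as a key in `links` (possibly with an empty set).
    nodes = sorted(links)
    triplets = []
    for a in nodes:
        for p in sorted(links[a]):
            for n in nodes:
                if n != a and n not in links[a]:
                    triplets.append((a, p, n))
    return triplets


def mean_triplet_loss(embeddings: Dict[int, Vector],
                      links: Dict[int, Set[int]],
                      margin: float = 1.0) -> float:
    # Average the hinge loss over all constructed triplets.
    triplets = build_triplets(links)
    if not triplets:
        return 0.0
    total = sum(triplet_loss(embeddings[a], embeddings[p], embeddings[n], margin)
                for a, p, n in triplets)
    return total / len(triplets)


if __name__ == "__main__":
    links = {0: {1}, 1: {0}, 2: set()}  # images 0 and 1 are linked; 2 is isolated
    good = {0: [0.0, 0.0], 1: [0.1, 0.0], 2: [5.0, 0.0]}
    bad = {0: [0.0, 0.0], 1: [5.0, 0.0], 2: [0.1, 0.0]}
    print(mean_triplet_loss(good, links), mean_triplet_loss(bad, links))
```

An embedding that places linked images together incurs no loss, while one that separates them is penalized; in a full model the gradient of this loss would update the network that produces the embeddings.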

Findings

01

DMAN outperforms state-of-the-art embeddings in classification.

02

DMAN significantly improves cross-modal search results.

03

The approach effectively captures fine-grained content relations.

Abstract

Learning embeddings of social media data with deep models has attracted extensive research interest and has enabled many applications, such as link prediction, classification, and cross-modal search. However, for social images, which contain both link information and multimodal content (e.g., text descriptions and visual content), simply employing an embedding learned from either the network structure or the data content results in a sub-optimal social image representation. In this paper, we propose a novel social image embedding approach called Deep Multimodal Attention Networks (DMAN), which employs a deep model to jointly embed multimodal content and link information. Specifically, to effectively capture the correlations between multimodal contents, we propose a multimodal attention network to encode the fine-grained relations between image regions and textual words. To leverage the network…
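The region-word attention idea described in the abstract can be illustrated with a minimal sketch, assuming plain dot-product attention: each word vector scores every image-region vector, a softmax turns the scores into weights, and the word's visual context is the weighted sum of the regions. The function word_to_region_attention and the dot-product scoring are hypothetical choices for illustration; the paper's actual attention network may use learned projections and a different scoring function.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    # Unnormalized similarity between a region vector and a word vector.
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_to_region_attention(regions: List[Vector],
                             words: List[Vector]) -> Tuple[List[List[float]], List[Vector]]:
    # For each word, compute attention weights over image regions and the
    # attended visual context (weighted sum of region vectors).
    # Regions and words are assumed to share one embedding dimension.
    dim = len(regions[0])
    weights: List[List[float]] = []
    contexts: List[Vector] = []
    for w in words:
        a = softmax([dot(r, w) for r in regions])
        ctx = [sum(a_i * r[k] for a_i, r in zip(a, regions)) for k in range(dim)]
        weights.append(a)
        contexts.append(ctx)
    return weights, contexts


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy image-region features
    words = [[5.0, 0.0], [0.0, 5.0]]    # two toy word embeddings
    weights, contexts = word_to_region_attention(regions, words)
    for row in weights:
        print([round(x, 3) for x in row])
```

Each word concentrates its weight on the region it is most similar to, which is the kind of fine-grained region-word alignment the abstract refers to; a joint embedding could then be formed by combining each word with its attended context.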
