Mining Associated Text and Images with Dual-Wing Harmoniums

Eric P. Xing; Rong Yan; Alexander G. Hauptmann

arXiv:1207.1423·cs.LG·July 9, 2012

TL;DR

This paper introduces a dual-wing harmonium model for mining multimedia data, enabling efficient inference and flexible topic modeling for tasks like classification, retrieval, and image annotation.

Contribution

The paper presents a multi-wing harmonium model that extends earlier two-layer random-field models and specializes it to a dual-wing harmonium for captioned images, offering efficient inference and robust topic mixing.

Findings

1. Effective in classification, retrieval, and image annotation
2. Outperforms existing models in empirical evaluations
3. Provides efficient inference and flexible topic modeling

Abstract

We propose a multi-wing harmonium model for mining multimedia data that extends and improves on earlier models based on two-layer random fields, which capture bidirectional dependencies between hidden topic aspects and observed inputs. This model can be viewed as an undirected counterpart of two-layer directed models such as LDA for similar tasks, but it differs significantly in inference/learning cost tradeoffs, latent topic representations, and topic mixing mechanisms. In particular, our model facilitates efficient inference and robust topic mixing, and potentially provides high flexibility in modeling the latent topic spaces. A contrastive divergence algorithm and a variational algorithm are derived for learning. We specialize our model to a dual-wing harmonium for captioned images, incorporating a multivariate Poisson for word counts and a multivariate Gaussian for color histograms.…
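To make the structure described in the abstract concrete, here is a minimal sketch of one block Gibbs step in a two-wing harmonium: hidden topics are sampled given both observed wings, then each wing is resampled given the topics. The parameterization is an assumption for illustration only (unit-variance Gaussian hidden units, a log-linear Poisson rate for word counts, unit-variance Gaussian color features), and the names `W`, `U`, `alpha`, `beta`, and `gibbs_step` are hypothetical; the abstract does not specify any of them. This is not the paper's learning algorithm, only the kind of sampling step a contrastive divergence update would rely on.

```python
import numpy as np


def sample_hidden(x, z, W, U, rng):
    """Sample hidden topics h given word counts x and color features z.

    Assumption: hidden units are unit-variance Gaussians whose mean is
    linear in both wings. Because the model is undirected and two-layer,
    the hidden units are conditionally independent given the inputs.
    """
    mean = x @ W + z @ U
    return mean + rng.standard_normal(mean.shape)


def sample_visible(h, W, U, alpha, beta, rng):
    """Sample both observed wings given the hidden topics h."""
    # Text wing: Poisson counts with an assumed log-linear rate.
    rate = np.exp(alpha + h @ W.T)
    x = rng.poisson(rate)
    # Image wing: Gaussian features with an assumed unit variance.
    z = beta + h @ U.T + rng.standard_normal(U.shape[0])
    return x, z


def gibbs_step(x, z, W, U, alpha, beta, rng):
    """One block Gibbs sweep: observed -> hidden -> observed."""
    h = sample_hidden(x, z, W, U, rng)
    x_new, z_new = sample_visible(h, W, U, alpha, beta, rng)
    return h, x_new, z_new


# Toy usage with made-up sizes: 5 words, 4 color bins, 3 topics.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(5, 3))   # word-topic couplings (hypothetical)
U = rng.normal(scale=0.01, size=(4, 3))   # color-topic couplings (hypothetical)
alpha = np.zeros(5)                       # word biases (hypothetical)
beta = np.zeros(4)                        # color biases (hypothetical)
x = np.array([2, 0, 1, 0, 3])             # toy word counts
z = np.array([0.1, 0.4, 0.3, 0.2])        # toy color histogram

h, x_new, z_new = gibbs_step(x, z, W, U, alpha, beta, rng)
```

In this sketch both conditionals factorize, so each direction is a single vectorized draw; that is the property behind the efficient inference the abstract contrasts with directed models such as LDA.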
