Mining Associated Text and Images with Dual-Wing Harmoniums
Eric P. Xing, Rong Yan, Alexander G. Hauptmann

TL;DR
This paper introduces a multi-wing harmonium model for mining multimedia data and specializes it to a dual-wing harmonium for captioned images, supporting efficient inference and flexible latent-topic modeling for tasks such as classification, retrieval, and image annotation.
Contribution
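The paper presents a multi-wing harmonium, an undirected two-layer random field that extends earlier harmonium models and acts as a counterpart to directed topic models such as LDA. It offers efficient inference and robust topic mixing, and is specialized to a dual-wing form for captioned images.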
Findings
Effective in classification, retrieval, and image annotation
Outperforms existing models in empirical evaluations
Provides efficient inference and flexible topic modeling
Abstract
We propose a multi-wing harmonium model for mining multimedia data that extends and improves on earlier models based on two-layer random fields, which capture bidirectional dependencies between hidden topic aspects and observed inputs. This model can be viewed as an undirected counterpart of two-layer directed models such as LDA for similar tasks, but it differs significantly in inference/learning cost tradeoffs, latent topic representations, and topic mixing mechanisms. In particular, our model facilitates efficient inference and robust topic mixing, and potentially provides considerable flexibility in modeling the latent topic spaces. A contrastive divergence algorithm and a variational algorithm are derived for learning. We specialize our model to a dual-wing harmonium for captioned images, incorporating a multivariate Poisson for word counts and a multivariate Gaussian for color histograms.…
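To illustrate why inference in such a model can be cheap, here is a minimal sketch of the conditional distributions in a dual-wing harmonium. It assumes the usual exponential-family harmonium form: unit-variance Gaussian hidden units whose mean is linear in the inputs, Poisson word counts with log-linear rates, and Gaussian color features. The parameter names (`W`, `U`, `alpha`, `beta`, `sigma2`) and function names are hypothetical and chosen for illustration; the exact parameterization in the paper may differ.

```python
import math

def hidden_posterior_mean(x, z, W, U):
    """Mean of each hidden topic unit given word counts x and color features z.

    Assumption: hidden units are conditionally independent, unit-variance
    Gaussians, so the posterior mean is one weighted sum per topic.
    W[i][k] couples word i to topic k; U[j][k] couples color bin j to topic k.
    """
    n_topics = len(W[0])
    return [
        sum(W[i][k] * x[i] for i in range(len(x)))
        + sum(U[j][k] * z[j] for j in range(len(z)))
        for k in range(n_topics)
    ]

def word_rates(h, W, alpha):
    """Poisson rate of each word given hidden topics h (assumed log-linear)."""
    return [
        math.exp(alpha[i] + sum(W[i][k] * h[k] for k in range(len(h))))
        for i in range(len(W))
    ]

def color_means(h, U, beta, sigma2):
    """Gaussian mean of each color feature given hidden topics h (assumed form)."""
    return [
        sigma2[j] * (beta[j] + sum(U[j][k] * h[k] for k in range(len(h))))
        for j in range(len(U))
    ]

# Toy example: 3 vocabulary words, 2 color bins, 2 hidden topics.
W = [[0.1, 0.0], [0.0, 0.2], [0.3, -0.1]]
U = [[0.5, 0.0], [0.0, 1.0]]
x = [2, 0, 1]          # word counts from the caption
z = [0.5, -1.0]        # color histogram features from the image
h = hidden_posterior_mean(x, z, W, U)
```

Under these assumptions, the latent representation of a captioned image is obtained in a single pass over the inputs, with no iterative posterior approximation. A contrastive divergence learner would alternate between `hidden_posterior_mean` and sampling from the two observed wings.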
