A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data
Yin Zheng, Yu-Jin Zhang, Hugo Larochelle

TL;DR
This paper introduces SupDocNADE, a supervised extension of the DocNADE autoregressive topic model, together with a deep variant. The model learns a joint representation of image and text data and achieves state-of-the-art results in multimedia retrieval.
Contribution
The paper extends DocNADE to multimodal data through supervised and deep variants, which increases the discriminative power of the learned topic features and improves performance on image classification and annotation tasks.
Findings
SupDocNADE outperforms existing topic models on LabelMe and UIUC-Sports datasets.
The deep extension of the model is more accurate than the shallow version.
The model achieves state-of-the-art performance on the Flickr multimedia retrieval dataset.
Abstract
Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to deal with multimodal data, such as in image annotation tasks. Another popular approach to model the multimodal data is through deep neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling. In this work, we show how to successfully apply and extend this model to multimodal data, such as simultaneous image classification and annotation. First, we propose SupDocNADE, a supervised extension of DocNADE, that increases the discriminative power of the learned hidden topic features and show how to employ it to learn a joint representation from image visual words, annotation words and class label information.…
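The abstract gives no equations, so the following is only a minimal sketch of how a supervised autoregressive topic model of this kind can be scored, under stated assumptions. It assumes the usual DocNADE recipe: each word is predicted from a hidden state built from the words before it, and the class label is predicted from a hidden state built from the whole document. It also assumes a full softmax over the vocabulary, sigmoid hidden units, and a single combined vocabulary for visual words and annotation words. The function name `hybrid_cost`, the parameter names, and the weight `lam` are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(z):
    z = z - z.max()                      # shift for numerical stability
    return z - np.log(np.exp(z).sum())

def hybrid_cost(words, label, W, V, b, c, U, d, lam=1.0):
    """Score one document (hypothetical helper, not the authors' code).

    words : list of word ids (assumed combined visual + annotation vocabulary)
    label : class id
    W (H x vocab), c (H)      : word-to-hidden weights and hidden bias
    V (vocab x H), b (vocab)  : hidden-to-word weights and output bias
    U (C x H), d (C)          : hidden-to-class weights and class bias
    Returns (generative NLL, discriminative NLL, weighted sum).
    """
    acc = c.copy()                       # c + sum of W columns for words seen so far
    nll_words = 0.0
    for w in words:
        h = sigmoid(acc)                 # hidden state from the preceding words only
        nll_words -= log_softmax(b + V @ h)[w]   # -log p(v_i | v_<i)
        acc += W[:, w]                   # fold the current word into the prefix
    h_doc = sigmoid(acc)                 # hidden state from the whole document
    nll_label = -log_softmax(d + U @ h_doc)[label]  # -log p(y | v)
    return nll_words, nll_label, nll_label + lam * nll_words

# Usage with small random parameters (all sizes are illustrative).
rng = np.random.default_rng(0)
vocab, hidden, classes = 12, 4, 3
W = rng.normal(scale=0.1, size=(hidden, vocab))
V = rng.normal(scale=0.1, size=(vocab, hidden))
U = rng.normal(scale=0.1, size=(classes, hidden))
b, c, d = np.zeros(vocab), np.zeros(hidden), np.zeros(classes)
gen, disc, total = hybrid_cost([3, 7, 7, 1], 2, W, V, b, c, U, d, lam=0.5)
```

The running sum `acc` is what keeps the autoregressive pass cheap: each position reuses the previous pre-activation instead of recomputing the sum over all earlier words. Setting `lam` to zero would leave only the label term, which is one way to see how a weight of this kind trades generative against discriminative training.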
Taxonomy
Methods: Deep Boltzmann Machine
