A Supervised Neural Autoregressive Topic Model for Simultaneous Image Classification and Annotation
Yin Zheng, Yu-Jin Zhang, Hugo Larochelle

TL;DR
This paper introduces SupDocNADE, a supervised neural autoregressive topic model that improves scene recognition and annotation by incorporating label information and the spatial position of visual words. It is reported to outperform supervised LDA.
Contribution
The paper presents SupDocNADE, a novel supervised extension of DocNADE, tailored for visual scene modeling and capable of simultaneous classification and annotation.
Findings
SupDocNADE outperforms supervised LDA on multiple datasets.
Incorporating spatial position and annotation information improves model accuracy.
The model achieves state-of-the-art results in scene recognition and annotation.
Abstract
Topic modeling based on latent Dirichlet allocation (LDA) has been a framework of choice to perform scene recognition and annotation. Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for document modeling. In this work, we show how to successfully apply and extend this model to the context of visual scene modeling. Specifically, we propose SupDocNADE, a supervised extension of DocNADE, that increases the discriminative power of the hidden topic features by incorporating label information into the training objective of the model. We also describe how to leverage information about the spatial position of the visual words and how to embed additional image annotations, so as to simultaneously perform image classification and annotation. We test our model on the Scene15,…
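The idea of adding label information to the training objective can be illustrated with a small sketch. It is not the authors' implementation. It assumes a sigmoid hidden layer, a full softmax over the visual-word vocabulary, and a hybrid loss that adds a label term to a generative term scaled by a hypothetical weight `lam`. All names (`init_params`, `hybrid_loss`, `W`, `V`, `U`) are illustrative, and the paper's exact parameterization may differ.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_softmax(scores):
    # Numerically stable log-softmax over a list of scores.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def affine(matrix, h, bias):
    # Computes matrix @ h + bias for plain Python lists.
    return [sum(m * x for m, x in zip(row, h)) + b
            for row, b in zip(matrix, bias)]


def init_params(vocab_size, n_hidden, n_classes, seed=0):
    # Hypothetical parameter layout, chosen only for this sketch.
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "W": mat(n_hidden, vocab_size),  # visual word -> hidden features
        "c": [0.0] * n_hidden,
        "V": mat(vocab_size, n_hidden),  # hidden -> next-word scores
        "b": [0.0] * vocab_size,
        "U": mat(n_classes, n_hidden),   # hidden -> class scores
        "d": [0.0] * n_classes,
    }


def hybrid_loss(words, label, p, lam=0.5):
    """Return (loss, class_log_probs) for one image given as visual-word ids."""
    n_hidden = len(p["c"])
    acc = list(p["c"])  # running pre-activation: c + sum of W[:, v_k] for k < i
    gen_nll = 0.0
    for w in words:
        # Autoregressive step: predict word i from the words before it.
        h = [sigmoid(a) for a in acc]
        gen_nll -= log_softmax(affine(p["V"], h, p["b"]))[w]
        for j in range(n_hidden):
            acc[j] += p["W"][j][w]
    # Topic features computed from all words feed the label predictor.
    h_all = [sigmoid(a) for a in acc]
    class_log_probs = log_softmax(affine(p["U"], h_all, p["d"]))
    disc_nll = -class_log_probs[label]
    return disc_nll + lam * gen_nll, class_log_probs


params = init_params(vocab_size=6, n_hidden=4, n_classes=3)
loss, class_log_probs = hybrid_loss([2, 5, 2, 0], label=1, p=params)
```

In this sketch, setting `lam=0` leaves only the label term, while a larger `lam` puts more weight on modeling the visual words themselves.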
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Latent Dirichlet Allocation
