Categorical Mixture Models on VGGNet activations
Sean Billings

TL;DR
This paper clusters Yelp restaurant photos using VGGNet activations. It applies unsupervised topic modeling (Latent Dirichlet Allocation, LDA) to find photo topics that match human intuition and Yelp's own photo labels.
Contribution
It combines VGGNet features, in particular the network's ImageNet object predictions, with LDA to cluster images into interpretable archetypes.
Findings
VGGNet activations improve clustering quality
Object-based features yield meaningful photo archetypes
Clusters align well with Yelp labels
Abstract
In this project, I use unsupervised learning techniques to cluster a set of Yelp restaurant photos under meaningful topics. To do this, I extract layer activations from a pre-trained implementation of the popular VGGNet convolutional neural network. First, I explore using Latent Dirichlet Allocation (LDA) with the activations of convolutional layers as features. Second, I use the object-recognition ability of VGGNet trained on ImageNet to extract meaningful objects from the photos, and then perform LDA to group the photos under topic archetypes. This second approach finds meaningful archetypes that match human intuition for photo topics such as restaurant, food, and drinks. Furthermore, these clusters align well and distinctly with the actual Yelp photo labels.
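The second pipeline can be illustrated with a minimal sketch. It assumes VGGNet's ImageNet predictions have already been extracted as (label, probability) pairs for each photo. The input format, the `top_k` cutoff, the `alpha`/`beta` hyperparameters, and every function name below are hypothetical. LDA is fitted here with collapsed Gibbs sampling in pure Python, which may differ from the library or inference method the project actually used.

```python
import random
from collections import Counter


def build_documents(predictions, top_k=5):
    """Turn per-photo ImageNet predictions into "documents" of object words.

    `predictions` holds one list of (label, probability) pairs per photo,
    assumed to come from VGGNet's ImageNet softmax (hypothetical format).
    Only the top_k most probable labels are kept, so each photo becomes a
    short bag of discrete object tokens that LDA can model.
    """
    docs = []
    for preds in predictions:
        ranked = sorted(preds, key=lambda p: p[1], reverse=True)[:top_k]
        docs.append([label for label, _ in ranked])
    return docs


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to a list of token lists with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic Counters of word counts, taken from the final sample.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in topics]
    topic_total = [0] * num_topics
    assignments = []

    # Randomly initialise a topic for every token and record the counts.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalised full conditional p(z = k | all other tokens).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def dominant_topic(doc_topic):
    """Assign each photo to its most frequent topic (a hard clustering)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def top_words(topic_word, n=3):
    """List up to n of the most frequent object labels in each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


if __name__ == "__main__":
    # Toy predictions standing in for real VGGNet outputs (made-up values).
    photos = [
        [("pizza", 0.7), ("plate", 0.2), ("menu", 0.1)],
        [("plate", 0.5), ("pizza", 0.4), ("restaurant", 0.1)],
        [("beer glass", 0.6), ("wine bottle", 0.3), ("plate", 0.1)],
        [("wine bottle", 0.8), ("beer glass", 0.1), ("menu", 0.1)],
    ]
    docs = build_documents(photos, top_k=2)
    doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
    print(dominant_topic(doc_topic))
    print(top_words(topic_word))
```

Keeping only the top-k labels turns VGGNet's soft predictions into discrete tokens. LDA needs this because each topic is a categorical distribution over a discrete vocabulary. The first approach, which uses real-valued convolutional activations, would presumably also need some discretization before standard LDA applies. Gibbs sampling is stochastic, so topic indices can change with the seed. It is therefore better to compare clusterings by their top words than by topic index.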
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Neural Networks and Applications
Methods: Latent Dirichlet Allocation
