From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings
Andrei M. Butnaru, Radu Tudor Ionescu

TL;DR
This paper introduces a novel text classification method that clusters word embeddings to build class-specific vocabularies, improving performance over traditional bag-of-words models.
Contribution
The paper presents a new clustering-based approach for text classification that leverages class-specific vocabularies derived from word embeddings.
Findings
Outperforms the standard bag-of-words model in text categorization.
Effective in polarity classification tasks.
Utilizes class-specific vocabularies for better semantic representation.
Abstract
In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision. After each word in a collection of documents is represented as a word vector using a pre-trained word embeddings model, a k-means algorithm is applied to the word vectors in order to obtain a fixed-size set of clusters. The centroid of each cluster is interpreted as a super word embedding that embodies all the semantically related word vectors in a certain region of the embedding space. Every embedded word in the collection of documents is then assigned to the nearest cluster centroid. In the end, each document is represented as a bag of super word embeddings by computing the frequency of each super word embedding in the respective document. We also diverge from the idea of building a single vocabulary…
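The pipeline described in the abstract (embed words, cluster with k-means, assign each word to its nearest centroid, count centroid frequencies per document) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The 2-D `embeddings` table, the toy `docs`, the choice of `k = 2`, and all function names are hypothetical stand-ins for a real pre-trained embedding model and corpus. Clustering the unique in-vocabulary words and skipping out-of-vocabulary words are also assumptions. The class-specific vocabulary step is not covered.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(vec, centroids[i]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids ("super word embeddings")."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest_centroid(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def bag_of_super_word_embeddings(doc, embeddings, centroids):
    """Histogram of nearest-centroid indices over the words of one document.

    Words without an embedding are skipped (an assumption of this sketch).
    """
    counts = Counter(nearest_centroid(embeddings[w], centroids)
                     for w in doc if w in embeddings)
    return [counts[i] for i in range(len(centroids))]


# Hypothetical 2-D "pre-trained" embeddings; real models use hundreds of dims.
embeddings = {
    "good": [1.0, 0.1], "great": [0.9, 0.2],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.2], "terrible": [-1.1, 0.0],
}
docs = [
    ["good", "great", "bad"],
    ["awful", "terrible", "bad", "unknown"],  # "unknown" has no embedding
]

# Cluster the unique in-vocabulary words that occur in the collection.
vocab = sorted({w for doc in docs for w in doc if w in embeddings})
centroids = kmeans([embeddings[w] for w in vocab], k=2)

# Each document becomes a fixed-length frequency vector over the clusters.
histograms = [bag_of_super_word_embeddings(d, embeddings, centroids)
              for d in docs]
print(histograms)
```

The resulting fixed-length histograms could then be passed to any standard classifier. Which cluster index ends up holding the positive or the negative words depends on the random initialization, so only the counts, not the index order, are meaningful.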
