Hash2Vec, Feature Hashing for Word Embeddings
Luis Argerich, Joaquín Torré Zaffaroni, Matías J Cano

TL;DR
This paper introduces Hash2Vec, a training-free word embedding method based on feature hashing. It builds embeddings in time linear in the size of the data and captures word semantics with results similar to GloVe.
Contribution
To the authors' knowledge, the first application of feature hashing to word embeddings, offering a scalable, linear-time alternative to training-based methods.
Findings
Achieves similar semantic quality to GloVe
Operates in linear time without training
Scalable for large NLP datasets
Abstract
In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks such as document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which does not need training, is able to capture the semantic meaning of words. We compare the results against GloVe and show that they are similar. As far as we know, this is the first application of feature hashing to the word embeddings problem, and the results indicate that it is a scalable technique with practical results for NLP applications.
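The abstract does not spell out the algorithm, so the following is only a minimal sketch of how feature hashing could yield word vectors in a single pass, not the authors' implementation. It assumes that each word's vector accumulates its context words, with one hash choosing the vector index and a second hash choosing the sign. The names `hashed_embeddings`, `_stable_hash`, `dim`, and `window` are hypothetical, as is the use of MD5 as the hash function.

```python
import hashlib
import math


def _stable_hash(token: str, salt: str) -> int:
    """Deterministic hash (hypothetical helper); Python's built-in hash() is salted per process."""
    digest = hashlib.md5((salt + ":" + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hashed_embeddings(tokens, dim=16, window=2):
    """Build one fixed-size vector per word in a single pass over the tokens.

    Assumption: every context word within `window` positions adds +1 or -1
    to the slot selected by hashing that context word. No training step is
    involved, and the work is proportional to len(tokens) * window.
    """
    vectors = {}
    for i, word in enumerate(tokens):
        vec = vectors.setdefault(word, [0.0] * dim)
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j == i:
                continue
            context = tokens[j]
            index = _stable_hash(context, "index") % dim
            sign = 1.0 if _stable_hash(context, "sign") % 2 == 0 else -1.0
            vec[index] += sign
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    corpus = "the cat sat on the mat while the dog sat on the rug".split()
    vectors = hashed_embeddings(corpus, dim=16, window=2)
    print(cosine(vectors["cat"], vectors["dog"]))
```

In this sketch, words that occur in the same contexts receive the same updates and therefore end up with similar vectors, while the vector size stays fixed at `dim` regardless of vocabulary size. Hash collisions add noise, which the signed hash is meant to partially cancel.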
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
Methods: GloVe Embeddings
