Hash2Vec, Feature Hashing for Word Embeddings

Luis Argerich; Joaquín Torré Zaffaroni; Matías J Cano

arXiv:1608.08940 · cs.CL · April 18, 2017 · 2 citations

TL;DR

This paper introduces Hash2Vec, a new, training-free word embedding method based on feature hashing that captures word semantics comparably to GloVe and scales well to large NLP tasks.

Contribution

To the authors' knowledge, the first application of feature hashing to word embeddings, offering a scalable alternative to training-based methods that runs in time linear in the size of the data.
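
To make the idea concrete, here is a minimal sketch of hashing-based word vectors in the spirit of Hash2Vec. It is not the authors' code: the symmetric context window, the second hash used for a ±1 sign, the Gaussian distance weight, and the names `stable_hash`, `hashed_word_vectors`, `dim`, `window`, and `sigma` are all illustrative assumptions.

```python
import hashlib
import math
from collections import defaultdict


def stable_hash(token: str, seed: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    data = (seed + ":" + token).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def hashed_word_vectors(tokens, dim=256, window=2, sigma=1.0):
    """Hypothetical sketch: one pass over the corpus, no training. For every
    (target, context) pair inside the window, add a signed, distance-weighted
    value to the target's vector at the context word's hashed coordinate."""
    vectors = defaultdict(lambda: [0.0] * dim)
    n = len(tokens)
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j == i:
                continue
            context = tokens[j]
            bucket = stable_hash(context, "bucket") % dim
            # Assumed random +/-1 sign: colliding features cancel in expectation.
            sign = 1.0 if stable_hash(context, "sign") % 2 == 0 else -1.0
            # Assumed Gaussian weight: nearer context words contribute more.
            weight = math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
            vectors[target][bucket] += sign * weight
    return dict(vectors)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


corpus = "the cat sat on the mat the dog sat on the rug".split()
vecs = hashed_word_vectors(corpus, dim=64, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))
```

Each token is visited once and paired with at most 2 × `window` neighbours, so the cost grows linearly with corpus length. Words that appear in similar contexts ("cat" and "dog" above) end up with a high cosine similarity.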

Findings

01 Achieves semantic quality similar to GloVe

02 Runs in time linear in the size of the data, with no training

03 Scales to large NLP datasets

Abstract

In this paper we propose applying feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks such as document classification. In this work we show that feature hashing can produce word embeddings in time linear in the size of the data. The results show that this algorithm, which requires no training, is able to capture the semantic meaning of words. We compare the results against GloVe and show that they are similar. As far as we know, this is the first application of feature hashing to the word embeddings problem, and the results indicate that it is a scalable technique with practical results for NLP applications.
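
The abstract notes that feature hashing was already used to build document vectors. For reference, a minimal sketch of that standard hashing trick: each word is mapped to a bucket of a fixed-size vector, with a second hash choosing a ±1 sign, so no vocabulary has to be stored. The bucket count, seed strings, and the names `stable_hash` and `hash_document` are illustrative assumptions.

```python
import hashlib


def stable_hash(token: str, seed: str) -> int:
    # Deterministic 64-bit hash (Python's built-in hash() is salted per process).
    data = (seed + ":" + token).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def hash_document(tokens, dim=16):
    """Hashing trick: map a bag of words to a fixed-size vector without a vocabulary."""
    vec = [0.0] * dim
    for tok in tokens:
        index = stable_hash(tok, "bucket") % dim  # which coordinate to update
        sign = 1.0 if stable_hash(tok, "sign") % 2 == 0 else -1.0  # +/-1 per word
        vec[index] += sign
    return vec


doc = "feature hashing maps each word to a bucket".split()
print(hash_document(doc))
```

Memory is fixed by `dim` rather than by vocabulary size; the price is occasional collisions, which the random sign makes cancel out on average.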

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques

Methods: GloVe Embeddings