TL;DR
Magnitude is a fast, lightweight Python package that efficiently handles large-scale vector embeddings in NLP, offering significant speed improvements and robustness features.
Contribution
The paper introduces Magnitude, an open-source utility for storing and manipulating vector embeddings efficiently, with robustness features such as out-of-vocabulary lookups. It reports large speedups over Gensim on common operations.
Findings
Magnitude performs common operations up to 6,000 times faster than Gensim.
It features a compact storage format for large embedding collections.
It includes out-of-vocabulary lookups for improved robustness.
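To make the out-of-vocabulary idea concrete, here is a minimal sketch of one way such a lookup could work: a word missing from the embedding table gets a deterministic vector built from hashed character n-grams, so the same unseen word always maps to the same vector. This is an illustration only, not Magnitude's actual implementation; the function names, the `DIM` constant, and the n-gram range are all assumptions made for the example.

```python
import hashlib
import math
import random

DIM = 8  # hypothetical embedding size, chosen only for this sketch


def char_ngrams(word, n_min=3, n_max=5):
    """Return character n-grams of the word, padded with boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def ngram_vector(ngram, dim=DIM):
    """Derive a pseudo-random vector from a hash of the n-gram (deterministic)."""
    digest = hashlib.sha256(ngram.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def oov_vector(word, dim=DIM):
    """Build a unit-length vector for an unseen word by summing n-gram vectors."""
    grams = char_ngrams(word) or [word]  # fall back to the word itself if too short
    total = [0.0] * dim
    for gram in grams:
        for i, x in enumerate(ngram_vector(gram, dim)):
            total[i] += x
    return normalize(total)


def lookup(word, table, dim=DIM):
    """Return the stored vector if the word is known, else a generated one."""
    return table[word] if word in table else oov_vector(word, dim)
```

Calling `lookup("uberx", table)` on a table that lacks "uberx" returns the same unit-length vector every time, and words that share many character n-grams (for example a misspelling and its correct form) share many of the summed components. Whether that translates into useful similarity depends on the dimensionality and on how the generated vectors are combined with trained ones, which this sketch does not attempt.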
Abstract
Vector space embedding models like word2vec, GloVe, fastText, and ELMo are extremely popular representations in natural language processing (NLP) applications. We present Magnitude, a fast, lightweight tool for utilizing and processing embeddings. Magnitude is an open source Python package with a compact vector storage file format that allows for efficient manipulation of huge numbers of embeddings. Magnitude performs common operations up to 60 to 6,000 times faster than Gensim. Magnitude introduces several novel features for improved robustness like out-of-vocabulary lookups.
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · fastText · GloVe Embeddings · Long Short-Term Memory · Bidirectional LSTM · Softmax · ELMo
