Embedding Compression with Isotropic Iterative Quantization
Siyu Liao, Jie Chen, Yanzhi Wang, Qinru Qiu, Bo Yuan

TL;DR
This paper introduces Isotropic Iterative Quantization (IIQ), a method for compressing word embeddings into binary vectors. It achieves more than thirty-fold compression while matching, and sometimes exceeding, the performance of the original real-valued embeddings.
Contribution
The paper adapts iterative quantization, a technique well established in image retrieval, to embedding compression. It preserves the isotropy desired of PMI-based embedding models and substantially reduces the memory footprint of NLP models.
Findings
Compression ratio of more than thirty-fold
Performance comparable to, and sometimes better than, the original real-valued embeddings
Demonstrated on pre-trained GloVe and HDC embeddings
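The "more than thirty-fold" figure follows from storage arithmetic: a 32-bit float per dimension is replaced by a single bit. The sketch below makes that concrete. It assumes float32 storage and a binary code with one bit per original dimension. The 300-dimensional vectors and the 400,000-word vocabulary are illustrative values, not settings taken from the paper. The helpers `compression_ratio` and `table_bytes` are hypothetical names introduced here.

```python
from typing import Optional


def compression_ratio(dim: int, bits_per_value: int = 32,
                      binary_bits: Optional[int] = None) -> float:
    """Bits used by one real-valued vector divided by bits in its binary code."""
    if binary_bits is None:
        binary_bits = dim  # assumption: one bit per original dimension
    return (dim * bits_per_value) / binary_bits


def table_bytes(vocab: int, dim: int, bits_per_value: int) -> int:
    """Storage in bytes for a vocab x dim embedding table."""
    return vocab * dim * bits_per_value // 8


ratio = compression_ratio(300)           # float32 -> 1 bit per dimension
real = table_bytes(400_000, 300, 32)     # hypothetical 400k-word vocabulary
binary = table_bytes(400_000, 300, 1)
print(ratio, real, binary)               # 32.0 480000000 15000000
```

Under these assumptions the ratio is exactly 32 (480 MB versus 15 MB for the illustrative table). A code with more bits than the original dimensionality would lower the ratio. This summary does not state the exact bit budget the paper uses.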
Abstract
Continuous representations of words are a standard component of deep learning-based NLP models. However, representing a large vocabulary requires significant memory, which can cause problems, particularly on resource-constrained platforms. Therefore, in this paper we propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones, leveraging the iterative quantization technique well established for image retrieval, while satisfying the desired isotropic property of PMI-based models. Experiments with pre-trained embeddings (i.e., GloVe and HDC) demonstrate a more than thirty-fold compression ratio with performance comparable to, and sometimes better than, that of the original real-valued embedding vectors.
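The abstract builds on iterative quantization (ITQ), the image-retrieval technique of Gong and Lazebnik. ITQ seeks binary codes B in {-1, +1}^{n x d} and an orthogonal rotation R that minimize ||B - VR||_F^2 for centered data V. It alternates between two steps. First it sets B = sign(VR). Then it solves an orthogonal Procrustes problem for R.

The sketch below shows only that generic ITQ loop, applied to mean-centered embeddings without PCA reduction. It does not reproduce the isotropy-specific steps that distinguish the authors' IIQ. The function `itq_binarize` and its `n_iter` and `seed` parameters are hypothetical, and the random matrix stands in for real embeddings.

```python
import numpy as np


def itq_binarize(X: np.ndarray, n_iter: int = 50, seed: int = 0):
    """Generic ITQ on mean-centered rows of X (not the paper's exact IIQ)."""
    V = X - X.mean(axis=0, keepdims=True)        # center each dimension
    d = V.shape[1]
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal init
    for _ in range(n_iter):
        B = np.sign(V @ R)                       # fix R, update codes
        B[B == 0] = 1
        # Fix B, update R: argmin ||B - V R||_F over orthogonal R is U @ Wt,
        # where V^T B = U S Wt (orthogonal Procrustes).
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt
    B = np.sign(V @ R)
    B[B == 0] = 1
    return B.astype(np.int8), R


X = np.random.default_rng(1).standard_normal((1000, 50))  # stand-in embeddings
codes, R = itq_binarize(X)
packed = np.packbits(codes > 0, axis=1)          # 50 bits -> 7 bytes per row
print(codes.shape, packed.shape)                 # (1000, 50) (1000, 7)
```

Each of the two steps cannot increase the quantization loss, so the loop converges to a local optimum. The rotation R is applied before taking signs, so the loss from binarization is spread across dimensions instead of falling on a few high-variance ones.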
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques
Methods: GloVe Embeddings
