Learning K-way D-dimensional Discrete Codes for Compact Embedding Representations
Ting Chen, Martin Renqiang Min, Yizhou Sun

TL;DR
This paper introduces a compact K-way D-dimensional discrete encoding scheme for embeddings, significantly reducing parameter size while maintaining or improving performance across NLP and graph applications.
Contribution
It proposes a novel KD encoding method with a relaxed discrete optimization approach for end-to-end learning of meaningful codes, replacing traditional one-hot embeddings.
Findings
Embedding size reduced by up to 98%
Achieves comparable or better performance
Applicable across NLP and graph models
Abstract
Conventional embedding methods directly associate each symbol with a continuous embedding vector, which is equivalent to applying a linear transformation based on a "one-hot" encoding of the discrete symbols. Despite its simplicity, such an approach yields a number of parameters that grows linearly with the vocabulary size and can lead to overfitting. In this work, we propose a much more compact K-way D-dimensional discrete encoding scheme to replace the "one-hot" encoding. In the proposed "KD encoding", each symbol is represented by a D-dimensional code with a cardinality of K, and the final symbol embedding vector is generated by composing the code embedding vectors. To end-to-end learn semantically meaningful codes, we derive a relaxed discrete optimization approach based on stochastic gradient descent, which can be generally applied to any differentiable computational graph with…
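As a rough illustration of the composition step described in the abstract, the following sketch builds symbol embeddings from K-way D-dimensional codes. It is not the authors' implementation: the function names `kd_embed` and `param_counts`, the array layout, the random (unlearned) codes, and the choice of plain summation as the composition function are all assumptions made for this example.

```python
import numpy as np

def kd_embed(codes, code_tables):
    """Compose symbol embeddings from discrete codes (hypothetical helper).

    codes:       int array of shape (N, D), each entry in [0, K)
    code_tables: float array of shape (D, K, d), one K-row table per code dimension
    Returns a float array of shape (N, d).
    Assumption: composition is a plain sum over the D code embeddings.
    """
    D = codes.shape[1]
    # picked[n, j] = code_tables[j, codes[n, j]], giving shape (N, D, d)
    picked = code_tables[np.arange(D), codes]
    return picked.sum(axis=1)

def param_counts(N, d, K, D):
    """Embedding parameters for a one-hot table vs. the KD code tables."""
    return N * d, D * K * d

# Illustrative sizes only; these are not taken from the paper.
N, d, K, D = 1000, 32, 16, 8
rng = np.random.default_rng(0)
codes = rng.integers(0, K, size=(N, D))        # random codes stand in for learned ones
code_tables = rng.standard_normal((D, K, d))   # D tables of K code embeddings each
emb = kd_embed(codes, code_tables)             # shape (N, d)
one_hot_params, kd_params = param_counts(N, d, K, D)
```

In this sketch the one-hot table needs N × d floats, while the code tables need D × K × d floats, plus N × D small integers (about log2 K bits each) to store the codes themselves. The code tables do not grow with the vocabulary size, which is where the compactness comes from.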
Taxonomy
Topics: Advanced Graph Neural Networks · Complex Network Analysis Techniques · Machine Learning and ELM
