Scaling Embedding Layers in Language Models
Da Yu, Edith Cohen, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Chiyuan Zhang

TL;DR
The paper introduces SCONE, a scalable n-gram embedding method that improves language model performance by adding contextualized embeddings without increasing inference costs, enabling effective scaling of model size and embeddings.
Contribution
SCONE provides a novel approach to extend embedding layers with n-gram embeddings, allowing scalable model improvements while maintaining low inference latency.
Findings
Scaling n-gram embeddings improves model performance.
A model with 1B parameters outperforms a 1.9B-parameter baseline.
Inference-time accelerator usage (FLOPS and memory) stays fixed despite scaling, with minimal added latency.
Abstract
We propose SCONE (Scalable, Contextualized, Offloaded, N-gram Embedding), a new method for extending input embedding layers to enhance language model performance. To avoid increased decoding costs, SCONE retains the original vocabulary while introducing embeddings for a set of frequent n-grams. These embeddings provide contextualized representation for each input token and are learned with a separate model during training. After training, embeddings are precomputed and stored in off-accelerator memory; during inference, querying them has minimal impact on latency due to the low complexity of embedding lookups. SCONE enables two new scaling strategies: increasing the number of n-gram embeddings and scaling the model used to learn them, both while maintaining fixed accelerator usage during inference (in terms of FLOPS and memory). We show that scaling both aspects enables…
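The inference-time lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each token is matched to the longest stored n-gram ending at that token, and that it falls back to the ordinary token embedding when no n-gram matches. The abstract does not specify the matching rule, so that is an assumption. The function name `lookup_contextual_embeddings`, the tables, and `max_n` are all hypothetical, and a plain Python dict stands in for the precomputed table held in off-accelerator memory.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
NGram = Tuple[int, ...]


def lookup_contextual_embeddings(
    token_ids: Sequence[int],
    token_table: Dict[int, Vector],
    ngram_table: Dict[NGram, Vector],
    max_n: int = 3,
) -> List[Vector]:
    """Return one input embedding per token.

    Assumption (not stated in the abstract): for each position we use the
    longest n-gram (2 <= n <= max_n) that ends at that position and is
    present in the precomputed table; otherwise we use the token embedding.
    """
    outputs: List[Vector] = []
    for i, token_id in enumerate(token_ids):
        chosen = token_table[token_id]  # fallback: ordinary token embedding
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_n, i + 1), 1, -1):
            key = tuple(token_ids[i - n + 1 : i + 1])
            hit = ngram_table.get(key)  # dict lookup, no model forward pass
            if hit is not None:
                chosen = hit
                break
        outputs.append(chosen)
    return outputs


# Toy tables: three tokens and two stored n-grams (hypothetical values).
toy_token_table: Dict[int, Vector] = {
    1: [1.0, 0.0],
    2: [0.0, 1.0],
    3: [0.5, 0.5],
}
toy_ngram_table: Dict[NGram, Vector] = {
    (1, 2): [9.0, 9.0],
    (1, 2, 3): [7.0, 7.0],
}

toy_output = lookup_contextual_embeddings(
    [1, 2, 3, 2], toy_token_table, toy_ngram_table, max_n=3
)
```

In this sketch the per-token cost at inference is a handful of dictionary lookups, independent of how large the separate model that produced the n-gram embeddings was. That is the property the abstract relies on when it says the table and that model can be scaled while accelerator usage stays fixed.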
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
