ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models
Orhan Demirci, Sezer Aptourachman, Aydın Kaya

TL;DR
This paper introduces ADE, a scalable framework for multi-anchor word embeddings integrated into large language models, improving semantic expressiveness with fewer parameters.
Contribution
ADE combines efficient vocabulary projection, grouped positional encoding, and context-aware reweighting to enable large-scale multi-anchor embeddings in transformers.
Findings
ADE surpasses DeBERTa on DBpedia-14 classification.
Uses 98.7% fewer trainable parameters than DeBERTa-v3-base.
Compresses the embedding layer by over 40x while maintaining competitive performance.
Abstract
Word embeddings are fundamental to natural language processing, yet traditional approaches represent each word with a single vector, creating representational bottlenecks for polysemous words and limiting semantic expressiveness. While multi-anchor representations have shown promise by representing words as combinations of multiple vectors, they have been limited to small-scale models due to computational inefficiency and lack of integration with modern transformer architectures. We introduce Adaptive Dictionary Embeddings (ADE), a framework that successfully scales multi-anchor word representations to large language models. ADE makes three key contributions: (1) Vocabulary Projection (VP), which transforms the costly two-stage anchor lookup into a single efficient matrix operation; (2) Grouped Positional Encoding (GPE), a novel positional encoding scheme where anchors of the same word…
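To illustrate the idea behind Vocabulary Projection, the following is a minimal sketch of how a two-stage anchor lookup could be folded into a single matrix operation. It is not the authors' implementation. It assumes that each word is a weighted sum of K anchor vectors drawn from a shared dictionary, which the abstract does not spell out. All names (`anchor_ids`, `anchor_weights`, `dictionary`, `build_projection`) are hypothetical.

```python
import numpy as np


def two_stage_lookup(token_ids, anchor_ids, anchor_weights, dictionary):
    """Assumed baseline: token -> anchor indices -> anchor vectors -> weighted sum."""
    ids = anchor_ids[token_ids]        # (T, K) anchor indices per token
    w = anchor_weights[token_ids]      # (T, K) mixing weight per anchor
    vecs = dictionary[ids]             # (T, K, d) gathered anchor vectors
    return np.einsum("tk,tkd->td", w, vecs)


def build_projection(anchor_ids, anchor_weights, num_anchors):
    """Fold indices and weights into one (V, A) vocabulary-to-anchor matrix."""
    vocab_size, k = anchor_ids.shape
    projection = np.zeros((vocab_size, num_anchors))
    rows = np.repeat(np.arange(vocab_size), k)
    # np.add.at accumulates correctly even if a word lists the same anchor twice.
    np.add.at(projection, (rows, anchor_ids.ravel()), anchor_weights.ravel())
    return projection


def projected_lookup(token_ids, projection, dictionary):
    """Single matrix operation: selected rows of the projection times the dictionary."""
    return projection[token_ids] @ dictionary


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, A, K, d = 1000, 64, 4, 16       # hypothetical sizes, not from the paper
    anchor_ids = rng.integers(0, A, size=(V, K))
    anchor_weights = rng.normal(size=(V, K))
    dictionary = rng.normal(size=(A, d))
    tokens = np.array([3, 17, 17, 999])

    slow = two_stage_lookup(tokens, anchor_ids, anchor_weights, dictionary)
    fast = projected_lookup(tokens, build_projection(anchor_ids, anchor_weights, A), dictionary)
    print(np.allclose(slow, fast))
```

Under these assumptions the two paths agree up to floating-point error. The projection matrix is built once, so each lookup reduces to a row selection followed by one matrix product. A dense matrix is used here only for brevity; a sparse representation would be the natural choice at large vocabulary sizes.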
