Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations
A. Bochkov

TL;DR
This paper shows that Transformer language models can converge and generate coherent text when their input embeddings are frozen vectors derived from the visual form of Unicode glyphs rather than trainable semantic embeddings, which suggests that semantics emerge elsewhere in the model.
Contribution
The paper introduces a method for using frozen, visually derived Unicode embeddings in Transformer models, challenging the view that input embeddings must be trainable carriers of meaning.
Findings
Models with frozen Unicode embeddings outperform architecturally identical models with trainable embeddings on the MMLU reasoning benchmark.
High-level semantics emerge from the model architecture and data, not from trainable input embeddings.
The approach is compatible with any tokenizer, including a new Unicode-centric tokenizer.
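To make the "universal text coverage" point concrete, here is a minimal sketch of a code-point-level tokenizer. It is an assumption for illustration only: this summary does not describe how the paper's Unicode-centric tokenizer works, and the `encode` and `decode` helpers below are hypothetical names, not the authors' API. The sketch only shows why a vocabulary keyed on Unicode code points never needs an unknown-token fallback.

```python
def encode(text):
    """Map each character to its Unicode code point (one token per character)."""
    return [ord(ch) for ch in text]


def decode(ids):
    """Invert encode() by turning code points back into characters."""
    return "".join(chr(i) for i in ids)


# Hypothetical sample mixing Latin, CJK, and emoji characters.
sample = "abc 日本語 🙂"
ids = encode(sample)

# Every character has a code point, so the round trip is lossless.
assert decode(ids) == sample
print(len(sample), len(ids))
```

A practical tokenizer would likely add multi-character tokens on top of this base so that sequences stay short; that extension is left out here because the summary gives no details about it.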
Abstract
Understanding the locus of semantic representation in large language models (LLMs) is crucial for interpretability and architectural innovation. The dominant paradigm posits that trainable input embeddings serve as foundational "meaning vectors." This paper challenges that view. We construct Transformer models where the embedding layer is entirely frozen, with vectors derived not from data, but from the visual structure of Unicode glyphs. These non-semantic, precomputed visual embeddings are fixed throughout training. Our method is compatible with any tokenizer, including a novel Unicode-centric tokenizer we introduce to ensure universal text coverage. Despite the absence of trainable, semantically initialized embeddings, our models converge, generate coherent text, and, critically, outperform architecturally identical models with trainable embeddings on the MMLU reasoning benchmark. We…
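As a rough illustration of the idea in the abstract, the sketch below builds an embedding table from glyph bitmaps and never updates it. Everything in it is an assumption for illustration: the 3x5 bitmaps in `TOY_FONT`, the names `glyph_vector` and `FrozenGlyphEmbedding`, and the flatten-and-normalize step are hypothetical stand-ins, not the authors' rendering pipeline, which this summary does not describe.

```python
import math

# Hypothetical 3x5 bitmaps standing in for rendered Unicode glyphs.
# A real pipeline would rasterize each character with a font; that step
# is not specified in this summary, so a toy font is used instead.
TOY_FONT = {
    "A": ["010", "101", "111", "101", "101"],
    "B": ["110", "101", "110", "101", "110"],
    "C": ["011", "100", "100", "100", "011"],
}


def glyph_vector(char):
    """Flatten a glyph bitmap into a unit-length vector of floats."""
    rows = TOY_FONT[char]
    pixels = [float(bit) for row in rows for bit in row]
    norm = math.sqrt(sum(p * p for p in pixels))
    return tuple(p / norm for p in pixels)


class FrozenGlyphEmbedding:
    """Lookup table whose vectors are precomputed once and never trained."""

    def __init__(self, vocab):
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        # Tuples are immutable, which mirrors the "frozen" constraint.
        self.table = tuple(glyph_vector(tok) for tok in vocab)

    def __call__(self, tokens):
        return [self.table[self.token_to_id[tok]] for tok in tokens]


embed = FrozenGlyphEmbedding(["A", "B", "C"])
vectors = embed(["C", "A", "B"])
print(len(vectors), len(vectors[0]))  # prints: 3 15
```

The vectors encode only what each character looks like, so any meaning has to be built by the layers above the lookup. In a PyTorch model, a comparable effect is usually obtained by loading a precomputed matrix with `torch.nn.Embedding.from_pretrained(weights, freeze=True)`, which excludes the table from gradient updates.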
Code & Models
- 🤗 Bochkov/bvv241-2-3 · model · 3 dl
- 🤗 Bochkov/bvv241-max · model · 2 dl
- 🤗 Bochkov/bvv241-nemo · model · 1 dl
- 🤗 Bochkov/bvv241-abs · model · 4 dl
- 🤗 Bochkov/abs-bvv-6 · model · 4 dl
- 🤗 Bochkov/abs-bvv-5 · model · 5 dl
- 🤗 Bochkov/abs-bvv-1 · model · 5 dl
- 🤗 Bochkov/abs-bvv-2 · model · 4 dl
- 🤗 Bochkov/abs-bvv-3 · model · 6 dl
- 🤗 Bochkov/abs-bvv-4 · model · 3 dl
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
Methods: Dropout · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer
