Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Zeyuan Allen-Zhu

TL;DR
This paper introduces Canon layers, lightweight components that improve information flow across neighboring tokens in language models. Validated on synthetic tasks and real-world pretraining, they improve reasoning and knowledge manipulation.
Contribution
The paper presents Canon layers, a novel architectural component that improves reasoning and information flow in language models, validated in both synthetic and real-world settings.
Findings
Canon layers double reasoning depth
They match or surpass state-of-the-art models in key tasks
Synthetic benchmarks effectively isolate core model capabilities
Abstract
Understanding architectural differences in language models is challenging, especially at academic-scale pretraining (e.g., 1.3B parameters, 100B tokens), where results are often dominated by noise and randomness. To overcome this, we introduce controlled synthetic pretraining tasks that isolate and evaluate core model capabilities. Within this framework, we discover CANON LAYERS: lightweight architectural components -- named after the musical term "canon" -- that promote horizontal information flow across neighboring tokens. Canon layers compute weighted sums of nearby token representations and integrate seamlessly into Transformers, linear attention, state-space models, or any sequence architecture. We present 12 key results. This includes how Canon layers enhance reasoning depth (e.g., by 2×), reasoning breadth, knowledge manipulation, etc. They lift weak architectures like…
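As a rough illustration of the "weighted sums of nearby token representations" idea, the following NumPy sketch mixes each token with a few of its predecessors. It is not taken from the paper's code: the function name `canon_layer`, the causal (past-only) window, the per-channel weights, and the residual connection are all assumptions made for this example.

```python
import numpy as np

def canon_layer(h, weights, residual=True):
    """Hypothetical sketch of a Canon-style layer (not the paper's implementation).

    h:       (seq_len, dim) array of token representations.
    weights: (window, dim) array; weights[k] scales the token k positions back.
    Assumes a causal window: position t only sees positions t, t-1, ..., t-window+1.
    """
    h = np.asarray(h, dtype=float)
    weights = np.asarray(weights, dtype=float)
    seq_len, _ = h.shape
    window = weights.shape[0]

    mixed = np.zeros_like(h)
    for k in range(window):
        if k >= seq_len:
            break  # no token lies k positions back in a sequence this short
        # Shift the sequence k steps later in time, zero-padding the start.
        shifted = np.zeros_like(h)
        shifted[k:] = h[:seq_len - k]
        mixed += weights[k] * shifted  # per-channel weighted sum

    # Assumed residual connection: add the mixed signal back onto the input.
    return h + mixed if residual else mixed
```

Calling `canon_layer(h, weights)` with `h` of shape `(seq_len, dim)` and `weights` of shape `(4, dim)` returns an array of the same shape as `h`. In a trained model the weights would be learned parameters; here they are simply passed in.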
Code & Models
- 🤗 facebook/PhysicsLM4.2__Llama-1B-Nemo-1T-lr0.002 (model)
- 🤗 facebook/PhysicsLM4.2__Llama-1B-Nemo-1T-lr0.003 (model)
- 🤗 facebook/PhysicsLM4.2__Llama-1B-Nemo-2T-lr0.003 (model)
- 🤗 facebook/PhysicsLM4.2__Llama-1B-Nemo-2T-lr0.005 (model)
- 🤗 facebook/PhysicsLM4.2__Llama-3B-Nemo-1T-lr0.002 (model)
- 🤗 facebook/PhysicsLM4.2__Llama-3B-Nemo-1T-lr0.003 (model)
- 🤗 facebook/PhysicsLM4.2__Llama-8B-Nemo-1T-lr0.002 (model)
- 🤗 facebook/PhysicsLM4.2__Llama-8B-Nemo-1T-lr0.003 (model)
- 🤗 facebook/PhysicsLM4.2__LlamaCanon-1B-Nemo-1T-lr0.002 (model)
- 🤗 facebook/PhysicsLM4.2__LlamaCanon-1B-Nemo-1T-lr0.003 (model)
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Explainable Artificial Intelligence (XAI)
