Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Zeyuan Allen-Zhu

arXiv:2512.17351 · cs.CL · December 22, 2025

TL;DR

This paper introduces Canon Layers, lightweight components that improve information flow in language models, validated through synthetic tasks and real-world pretraining, enhancing reasoning and knowledge manipulation capabilities.

Contribution

The paper presents Canon Layers, a novel architectural component that enhances language model reasoning and information flow, validated across synthetic and real-world settings.

Findings

01. Canon layers increase reasoning depth, e.g., by $2\times$.

02. Canon layers match or surpass state-of-the-art models on key tasks.

03. Synthetic benchmarks effectively isolate core model capabilities.

Abstract

Understanding architectural differences in language models is challenging, especially at academic-scale pretraining (e.g., 1.3B parameters, 100B tokens), where results are often dominated by noise and randomness. To overcome this, we introduce controlled synthetic pretraining tasks that isolate and evaluate core model capabilities. Within this framework, we discover CANON LAYERS: lightweight architectural components -- named after the musical term "canon" -- that promote horizontal information flow across neighboring tokens. Canon layers compute weighted sums of nearby token representations and integrate seamlessly into Transformers, linear attention, state-space models, or any sequence architecture. We present 12 key results. This includes how Canon layers enhance reasoning depth (e.g., by $2\times$), reasoning breadth, knowledge manipulation, etc. They lift weak architectures like…
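The abstract's description of a Canon layer as a weighted sum of nearby token representations can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `canon_like_mix`, the causal window (each position mixes itself with the tokens before it), the window size of 3, and the fixed weights are all assumptions made for illustration. In a trained model the weights would presumably be learned.

```python
import numpy as np

def canon_like_mix(h, weights):
    """Weighted sum over each token and its predecessors (illustrative only).

    h:       array of shape (seq_len, dim), one row per token representation.
    weights: array of shape (k,); weights[j] scales the token j positions back.
             The causal window and the fixed weights are assumptions of this
             sketch, not details taken from the paper.
    """
    out = np.zeros_like(h)
    for j, w in enumerate(weights):
        if j == 0:
            out += w * h              # contribution of the current token
        else:
            out[j:] += w * h[:-j]     # contribution of the token j steps earlier
    return out

# Toy input: 4 tokens with 2-dimensional representations.
h = np.arange(8, dtype=float).reshape(4, 2)
weights = np.array([0.5, 0.3, 0.2])   # hypothetical mixing weights
mixed = canon_like_mix(h, weights)
print(mixed)
```

Each output row depends only on the current and preceding rows of `h`, so information flows horizontally across neighboring positions without looking ahead. Positions near the start of the sequence simply receive fewer terms.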

Taxonomy

Topics: Machine Learning in Materials Science · Topic Modeling · Explainable Artificial Intelligence (XAI)