Directional Routing in Transformers

Kevin Taylor

arXiv:2603.14923 · cs.LG · March 17, 2026

TL;DR

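This paper presents directional routing, a lightweight mechanism that gives each transformer attention head learned suppression directions controlled by a shared router, at 3.9% parameter cost. In the trained 433M-parameter model, routing becomes the dominant computational pathway: disabling it collapses factual recall and drops induction accuracy from 93.4% to 0.0%.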

Contribution

Introduces directional routing, an attention control mechanism with 3.9% parameter overhead that becomes the model's dominant computational pathway and lowers perplexity relative to a baseline trained alongside it.

Findings

01. Routing is the dominant computational pathway in the model.

02. Disabling routing collapses factual recall and induction accuracy.

03. Routing reduces perplexity by 31-56% relative to baseline.
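
A relative perplexity reduction of this kind is usually computed as the baseline perplexity minus the routed perplexity, divided by the baseline perplexity. The sketch below assumes that reading. The function name and the input values are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_reduction(baseline_ppl, routed_ppl):
    """Fractional perplexity reduction of the routed model relative to the baseline."""
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (baseline_ppl - routed_ppl) / baseline_ppl


# Hypothetical values, not taken from the paper: a baseline perplexity of 20.0
# and a routed-model perplexity of 10.0 give a 50% relative reduction.
example = relative_reduction(20.0, 10.0)
print(f"{example:.0%}")
```

Under this definition, the reported 31-56% range would correspond to the routed model reaching between 69% and 44% of the baseline's perplexity.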

Abstract

We introduce directional routing, a lightweight mechanism that gives each transformer attention head learned suppression directions controlled by a shared router, at 3.9% parameter cost. We train a 433M-parameter model alongside an identical baseline in a single run, then trace the resulting circuits through mechanistic interpretability. Routing becomes the model's dominant computational pathway. Disabling it collapses factual recall to near-zero probability across all 8 test prompts and drops induction accuracy from 93.4% to 0.0%. Knocking out individual attention heads has negligible effect: the primary mover head's removal actually increases target probability, and induction heads retain 98.6% accuracy without their strongest member. The coordination mechanism is irreplaceable; the components it coordinates are not. The model also self-organizes, without explicit pressure, into two…
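
The abstract does not give the routing equations, so the following is only a minimal sketch of one plausible reading: each head owns a few learned unit directions, a shared router emits a gate in [0, 1] per head and direction, and the gated component of the head's output along each direction is subtracted. The function names (`router_gates`, `suppress`), the sigmoid gating, and the subtraction form are all assumptions made for illustration, not the author's implementation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def router_gates(router_input, router_weights):
    """Hypothetical shared router: one sigmoid gate per (head, direction).

    router_weights[h][k] is the weight vector for direction k of head h.
    """
    return [[sigmoid(dot(w, router_input)) for w in head_ws]
            for head_ws in router_weights]


def suppress(head_output, directions, gates):
    """Subtract the gated component of head_output along each unit direction.

    Assumed form: o' = o - sum_k g_k * (o . d_k) * d_k, with each d_k unit-norm.
    A gate of 0 leaves the output untouched; a gate of 1 removes that component.
    """
    out = list(head_output)
    for d, g in zip(directions, gates):
        coeff = g * dot(head_output, d)
        out = [o - coeff * di for o, di in zip(out, d)]
    return out


# One head with a single suppression direction along the first axis.
head_out = [2.0, 3.0, 4.0]
direction = [[1.0, 0.0, 0.0]]

print(suppress(head_out, direction, [1.0]))  # component along the direction removed
print(suppress(head_out, direction, [0.0]))  # all gates at zero: output unchanged
```

In this sketch, the "disable routing" ablation described in the abstract would correspond to forcing every gate to zero, which leaves each head's output unchanged. Whether the paper implements the ablation this way is not stated in the abstract.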

Code & Models

Models

KitsuVp/NeoLLM (Hugging Face model, 2.9k downloads, 1 like)

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices