HubRouter: A Pluggable Sub-Quadratic Routing Primitive for Hybrid Sequence Models

Abhinaba Basu

arXiv:2604.22442 · cs.LG · April 27, 2026

TL;DR

HubRouter introduces a pluggable routing module that replaces quadratic O(n^2) attention with O(nM) hub-mediated routing (linear in sequence length n for a fixed number of hubs M), improving training throughput while maintaining competitive perplexity in hybrid sequence models.

Contribution

It presents a novel hub-mediated routing mechanism that reduces the cost of attention from O(n^2) to sub-quadratic O(nM), with reported gains in training throughput and a nominal, single-seed improvement in perplexity.
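To make the complexity claim concrete, the sketch below counts query-key score computations for dense self-attention versus a hub pipeline at n = 1024. It is a rough tally under stated assumptions, not a cost model from the paper: M = 8 is taken from the reported 8-14 hub range, the top-k budget k = 64 is hypothetical, and the assumed breakdown (hubs scoring tokens, tokens scoring hubs, then a k x k council) ignores head dimension, the score head, constant factors and kernel efficiency. The resulting ratio is therefore not a prediction of the measured throughput figures.

```python
def score_pairs_dense(n: int) -> int:
    """Query-key score computations for full self-attention over n tokens."""
    return n * n


def score_pairs_hub(n: int, m: int, k: int) -> int:
    """Assumed breakdown for a hub pipeline:
    encode (m hubs attend to n tokens) + fingerprints (n tokens vs m hubs)
    + council (dense attention among k selected tokens)."""
    return m * n + n * m + k * k


n, m, k = 1024, 8, 64  # k = 64 is a hypothetical top-k budget
dense = score_pairs_dense(n)
hub = score_pairs_hub(n, m, k)
print(dense, hub, round(dense / hub, 1))  # prints: 1048576 20480 51.2
```

The hub terms grow linearly in n for fixed M, while the council term depends only on k; the total stays sub-quadratic as long as k grows more slowly than n.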

Findings

01

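HubRouter achieves up to ~90x higher training throughput at sequence length 1024 than matched PyTorch-native baselines; the authors estimate an optimised baseline would narrow this to ~10-15x.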

02

Replacing 25% of attention layers with HubRouter improves perplexity.

03

A hub count of 8-14 is identified as optimal for stable convergence.

Abstract

We introduce HubRouter, a pluggable module that replaces O(n^2) attention layers with O(nM) hub-mediated routing, where M << n is a small number of learned hub tokens. We demonstrate it in two from-scratch architectures, a Jamba-style hybrid and a 12-layer Transformer; retrofitting into pretrained models was tested and is reported as a negative case. HubRouter implements an encode-decode-score-council pipeline: M learned hubs cross-attend to all tokens, tokens project against the hubs to produce routing fingerprints, a score head selects the top-k tokens, and a sparse council attends only to the selected subset. We validate HubRouter in three settings. (1) Hub-Jamba yields a nominal 4.2% PPL improvement (200.2 vs 209.0, single seed; possibly within seed noise) and up to ~90x higher training throughput at sequence length 1024 against matched PyTorch-native baselines; an optimised baseline would narrow this to ~10-15x. (2) Graduated…
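The four stages of the encode-decode-score-council pipeline can be sketched in dependency-free Python. This is a toy illustration under stated assumptions, not the author's implementation: the hub vectors and score-head weights (`score_w`) are random stand-ins for learned parameters, the score head is assumed to be a single linear map over fingerprints, there are no learned projections or multiple heads, and merging the council output back by residual addition (with unselected tokens passing through unchanged) is a hypothetical choice the abstract does not specify. The names `hub_route` and `attend` are likewise hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of values."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, key) / math.sqrt(d) for key in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def hub_route(tokens, hubs, score_w, k):
    # 1. Encode: M hubs cross-attend to all n tokens, O(n*M*d).
    hub_states = attend(hubs, tokens, tokens)
    # 2. Decode: each token projects against the hubs -> n x M fingerprints, O(n*M*d).
    fingerprints = [[dot(t, h) for h in hub_states] for t in tokens]
    # 3. Score: assumed linear score head over fingerprints; keep the top-k tokens.
    scores = [dot(f, score_w) for f in fingerprints]
    selected = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    selected.sort()  # restore sequence order
    # 4. Council: dense attention only among the k selected tokens, O(k^2*d).
    subset = [tokens[i] for i in selected]
    council_out = attend(subset, subset, subset)
    # Assumption: residual merge for selected tokens, pass-through for the rest.
    out = [list(t) for t in tokens]
    for i, row in zip(selected, council_out):
        out[i] = [a + b for a, b in zip(out[i], row)]
    return out, selected


random.seed(0)
n, d, M, k = 32, 8, 8, 4
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
hubs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(M)]
score_w = [random.gauss(0, 1) for _ in range(M)]
out, selected = hub_route(tokens, hubs, score_w, k)
print(len(out), len(out[0]), selected)
```

Only the council touches token pairs directly, and it does so for k tokens rather than n; the encode and fingerprint stages each visit every token-hub pair once.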
