The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency

Robin Geens; Jonas De Schouwer; Marian Verhelst; Thierry Tambe

arXiv:2604.07935 · cs.AR · April 10, 2026

TL;DR

The paper argues that State-Space Models have been redesigned to maximize throughput on hyperscale cloud GPUs at the cost of edge efficiency. Mamba-3's architectural changes raise latency by 28% at 880M parameters and by 48% at 15M parameters, making it less suited to real-time, single-user edge applications.

Contribution

It traces how State-Space Models diverged from edge-efficient designs between Mamba-1 and Mamba-3, and argues that cloud-scale saturation strategies should be decoupled from core architectural design so that models remain viable for edge deployment.

Findings

1. Mamba-3's architectural changes increase latency by 28% at 880M parameters.
2. The edge penalty worsens to a 48% latency increase for 15M-parameter models.
3. Architectural optimizations aimed at saturating hyperscale GPUs hinder edge-native efficiency.
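
For a single-user, real-time setting it can help to restate a latency increase as a loss in generation rate. The sketch below does that arithmetic, assuming (our assumption, not stated on this page) that the reported figures are per-token decode latencies at a fixed workload; the function name `rate_after_latency_increase` is hypothetical.

```python
def rate_after_latency_increase(latency_increase: float) -> float:
    """Fraction of the baseline per-token rate that remains when latency grows.

    `latency_increase` is a fraction: 0.28 means +28% latency.
    Assumes latency is per token, so rate = 1 / latency.
    """
    return 1.0 / (1.0 + latency_increase)


# The two edge-penalty figures reported in the findings.
for label, inc in [("880M params", 0.28), ("15M params", 0.48)]:
    remaining = rate_after_latency_increase(inc)
    print(f"{label}: +{inc:.0%} latency -> {remaining:.1%} of baseline rate "
          f"({1 - remaining:.1%} fewer tokens/s)")
```

Under this assumption, a 28% latency increase leaves about 78% of the baseline token rate, and a 48% increase leaves about 68%.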

Abstract

The Hardware Lottery posits that research directions are dictated by available silicon compute platforms. We identify a derivative phenomenon, the Hyperscale Lottery, where model architectures are optimized for cloud throughput at the expense of algorithmic efficiency. While State-Space Models (SSMs) such as Mamba were lauded for their linear complexity, a property ideal for edge intelligence, their evolution from Mamba-1 to Mamba-3 reveals a systematic divergence from edge-native efficiency. We demonstrate that Mamba-3's architectural changes, designed to saturate hyperscale GPUs, impose a significant edge penalty: a 28% latency increase at 880M parameters, worsening to 48% for 15M-parameter models. We argue for decoupling cloud-scale saturation strategies from core architectural design to preserve the viability of single-user, real-time edge intelligence.
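
The appeal of SSMs for edge intelligence comes from their recurrent form: during single-user generation, each new token updates a fixed-size state, so the cost per token does not grow with context length and total cost is linear in sequence length. The sketch below illustrates this with a simplified, single-channel diagonal SSM in the spirit of Mamba-1's selective scan, h_t = exp(Δ_t·a) ⊙ h_{t-1} + Δ_t·b_t·x_t and y_t = c_t·h_t. It is an illustrative assumption, not Mamba's actual kernel and not the paper's code; the names `ssm_decode_step` and `ssm_generate` are hypothetical, and it does not model the Mamba-3 changes the paper measures.

```python
import math
from typing import List, Sequence, Tuple


def ssm_decode_step(h: List[float], x_t: float, a: Sequence[float],
                    b_t: Sequence[float], c_t: Sequence[float],
                    delta_t: float) -> Tuple[List[float], float]:
    """One recurrent step of a diagonal, input-dependent SSM (one channel).

    h_t = exp(delta_t * a) * h_{t-1} + delta_t * b_t * x_t  (elementwise, N dims)
    y_t = sum(c_t * h_t)
    Work is O(N) per token, independent of how many tokens came before.
    """
    new_h = [math.exp(delta_t * a_n) * h_n + delta_t * b_n * x_t
             for h_n, a_n, b_n in zip(h, a, b_t)]
    y_t = sum(c_n * h_n for c_n, h_n in zip(c_t, new_h))
    return new_h, y_t


def ssm_generate(xs: Sequence[float], a: Sequence[float],
                 bs: Sequence[Sequence[float]], cs: Sequence[Sequence[float]],
                 deltas: Sequence[float]) -> List[float]:
    """Run the recurrence token by token, as in single-user decoding."""
    h = [0.0] * len(a)  # fixed-size state: memory does not grow with length
    ys = []
    for x_t, b_t, c_t, d_t in zip(xs, bs, cs, deltas):
        h, y_t = ssm_decode_step(h, x_t, a, b_t, c_t, d_t)
        ys.append(y_t)
    return ys
```

Feeding `ssm_generate` a longer input only adds more constant-cost steps; the state `h` stays at `len(a)` entries throughout. Attention-based decoding, by contrast, keeps a key/value cache that grows with every generated token.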
