The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency
Robin Geens, Jonas De Schouwer, Marian Verhelst, Thierry Tambe

TL;DR
The paper argues that model architectures optimized for hyperscale cloud platforms, such as recent State-Space Models, sacrifice edge efficiency: latency rises, making them less suitable for real-time, single-user edge applications.
Contribution
The paper shows that the evolution of State-Space Models has diverged from edge-efficient design, and argues that cloud-scale optimization should be decoupled from core architecture to preserve edge viability.
Findings
Mamba-3's architectural changes impose a 28% latency increase on edge hardware at 880M parameters.
The edge penalty worsens to a 48% latency increase for 15M-parameter models.
Changes designed to saturate hyperscale GPUs come at the expense of edge-native efficiency.
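For readers who want to see how a percentage like those above is read, here is a minimal sketch of a relative latency increase. The function name and the timings in the usage lines are hypothetical and chosen only to make the arithmetic visible; they are not measurements from the paper, and the paper's exact baseline and measurement setup are not stated in this summary.

```python
def latency_increase_pct(baseline_ms: float, new_ms: float) -> float:
    """Relative latency increase of new_ms over baseline_ms, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return 100.0 * (new_ms - baseline_ms) / baseline_ms


# Hypothetical timings, for illustration only (not data from the paper).
print(latency_increase_pct(100.0, 128.0))
print(latency_increase_pct(50.0, 74.0))
```

Under this reading, a 48% penalty means a step that took 50 ms on the baseline would take 74 ms after the change.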
Abstract
The Hardware Lottery posits that research directions are dictated by available silicon compute platforms. We identify a derivative phenomenon, the Hyperscale Lottery, where model architectures are optimized for cloud throughput at the expense of algorithmic efficiency. While State-Space Models (SSMs) such as Mamba were lauded for their linear complexity, ideal for edge intelligence, their evolution from Mamba-1 to Mamba-3 reveals a systematic divergence from edge-native efficiency. We demonstrate that Mamba-3's architectural changes, designed to saturate hyperscale GPUs, impose a significant edge penalty: a 28% latency increase at 880M parameters, worsening to 48% for 15M-parameter models. We argue for decoupling cloud-scale saturation strategies from core architectural design to preserve the viability of single-user, real-time edge intelligence.
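The "linear complexity" that made SSMs attractive for the edge can be illustrated with a generic diagonal linear state-space recurrence. This is a minimal sketch under stated assumptions: the function name `ssm_scan` and the parameters `a`, `b`, `c` are illustrative, and the code is a textbook time-invariant recurrence, not the selective scan or any other mechanism actually used in Mamba-1, Mamba-2, or Mamba-3.

```python
from typing import List


def ssm_scan(xs: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Generic diagonal linear SSM (illustrative, not Mamba's actual kernel).

    h_t = a * h_{t-1} + b * x_t   (elementwise over the state dimension)
    y_t = sum(c * h_t)
    """
    h = [0.0] * len(a)  # fixed-size state, independent of sequence length
    ys = []
    for x in xs:  # one pass over the sequence: O(len(xs) * len(a)) work
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


# An impulse input decays geometrically through a one-dimensional state.
print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # [1.0, 0.5, 0.25]
```

Each new token costs a constant amount of work and memory because the state `h` does not grow with the sequence, which is the property that suits single-user, real-time inference.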
