Transformer Scalability Crisis: The First Comprehensive Empirical Analysis of Performance Walls in Modern Language Models

Mahdi Naser Moghadasi; Faezeh Ghaderi

arXiv:2605.15413·cs.LG·May 18, 2026

TL;DR

This paper systematically evaluates 118 transformer models, revealing fundamental performance limitations at longer sequence lengths and challenging assumptions about their scalability in real-world applications.

Contribution

It provides the first comprehensive empirical analysis of transformer performance walls, uncovering critical scalability issues and establishing new benchmarking methodologies.

Findings

01

88.1% of models successfully process sequences of up to 512 tokens

02

Only 44.9% succeed at 1024 tokens, and none (0%) at 2048 tokens

03

Compressed models outperform large models in parameter efficiency

Abstract

Despite the remarkable success of transformer architectures in natural language processing, their scalability limitations remain poorly understood through systematic empirical analysis. This paper presents the first comprehensive large-scale evaluation of 118 transformer models across seven distinct architectural categories, revealing fundamental performance walls that manifest as hard deployment constraints. Our systematic benchmarking methodology uncovers a critical scalability crisis: while 88.1% of models successfully process sequences up to 512 tokens, this drops dramatically to 44.9% at 1024 tokens, with complete failure (0%) at 2048 tokens. Through rigorous analysis of loading times, memory consumption, and computational efficiency across sequence lengths from 128 to 2048 tokens, we demonstrate that compressed models achieve superior parameter efficiency (649.2 tokens/sec/M…
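The success rates quoted in the abstract can be illustrated with a minimal sketch of how a per-sequence-length pass rate might be tallied from benchmark records. This is not the authors' benchmarking code. The record format, the function name `success_rates`, and the pass counts (104, 53, and 0 out of 118) are assumptions; the counts were chosen only because they round to the reported 88.1%, 44.9%, and 0%, and the source does not state them.

```python
from collections import defaultdict


def success_rates(records):
    """Return {seq_len: percent of models that succeeded}, rounded to 1 decimal.

    `records` is an iterable of (model_name, seq_len, succeeded) tuples.
    This record layout is hypothetical, not taken from the paper.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for _model, seq_len, ok in records:
        totals[seq_len] += 1
        if ok:
            passed[seq_len] += 1
    return {n: round(100.0 * passed[n] / totals[n], 1) for n in sorted(totals)}


# The abstract reports 118 evaluated models.
N_MODELS = 118

# Assumed pass counts, picked to be consistent with the reported percentages.
HYPOTHETICAL_PASS_COUNTS = {512: 104, 1024: 53, 2048: 0}

# Build synthetic records: the first n_pass models "succeed" at each length.
records = [
    (f"model_{i}", seq_len, i < n_pass)
    for seq_len, n_pass in HYPOTHETICAL_PASS_COUNTS.items()
    for i in range(N_MODELS)
]

rates = success_rates(records)
print(rates)  # {512: 88.1, 1024: 44.9, 2048: 0.0}
```

Under these assumed counts, the drop from 512 to 1024 tokens corresponds to 51 of the 118 models no longer completing the run.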
