LLM Architecture, Scaling Laws, and Economics: A Quick Summary

William H. Press

arXiv:2511.11572 · cs.GL · November 18, 2025


TL;DR

This paper provides a concise summary of LLM architecture, scaling laws, and economic considerations. It focuses on Transformer models and on cost estimates for models of different scales, and it does not introduce new research findings.

Contribution

It offers a clear, condensed overview of current LLM architectures, scaling laws, and cost estimates, filling a gap left by the lack of readily available summaries of this material.

Findings

01. Transformer architecture details summarized

02. Scaling laws for compute and memory provided

03. Cost estimates for various LLM scales discussed
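The compute, memory, and cost scaling listed above can be illustrated with widely used rules of thumb. The sketch below assumes training compute C ≈ 6·N·D FLOPs for a dense Transformer with N parameters trained on D tokens, and a compute-optimal data budget of roughly D ≈ 20·N tokens (the Chinchilla heuristic). Neither is claimed to be the exact formula used in the paper. The function names, the 2-bytes-per-parameter weight storage, the sustained per-GPU throughput, and the price per GPU-hour are hypothetical placeholders, not figures from the paper.

```python
"""Back-of-envelope LLM scaling arithmetic (illustrative assumptions only)."""

# Assumed rules of thumb (hypothetical, not taken from the paper):
FLOPS_PER_PARAM_PER_TOKEN = 6   # ~2 forward + ~4 backward FLOPs per parameter per token
TOKENS_PER_PARAM = 20           # Chinchilla-style compute-optimal data ratio
BYTES_PER_PARAM = 2             # bf16/fp16 weights only; ignores optimizer state


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate dense-Transformer training compute, C ~ 6 * N * D."""
    return FLOPS_PER_PARAM_PER_TOKEN * n_params * n_tokens


def weight_memory_bytes(n_params: float) -> float:
    """Memory needed just to hold the weights at BYTES_PER_PARAM bytes each."""
    return BYTES_PER_PARAM * n_params


def training_cost_usd(flops: float, sustained_flops_per_gpu: float,
                      usd_per_gpu_hour: float) -> float:
    """Hypothetical cost model: GPU-hours at a sustained rate times an hourly price."""
    gpu_seconds = flops / sustained_flops_per_gpu
    return gpu_seconds / 3600 * usd_per_gpu_hour


if __name__ == "__main__":
    # Placeholder hardware assumptions, not values from the paper.
    SUSTAINED = 4e14   # 400 TFLOP/s effective per GPU
    PRICE = 2.0        # USD per GPU-hour
    for n in (7e9, 70e9, 700e9):
        d = TOKENS_PER_PARAM * n
        c = training_flops(n, d)
        print(f"N={n:.0e}  D={d:.1e} tokens  C={c:.2e} FLOPs  "
              f"weights={weight_memory_bytes(n) / 1e9:.0f} GB  "
              f"cost~${training_cost_usd(c, SUSTAINED, PRICE):,.0f}")
```

Because C grows as N·D and D is tied to N, compute-optimal training cost in this sketch scales roughly as N², so each 10× increase in parameters raises the estimated cost by about 100×.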

Abstract

The current standard architecture of Large Language Models (LLMs) with QKV self-attention is briefly summarized, including the architecture of a typical Transformer. Scaling laws for compute (flops) and memory (parameters plus data) are given, along with their present (2025) rough cost estimates for the parameters of present LLMs of various scales, including discussion of whether DeepSeek should be viewed as a special case. Nothing here is new, but this material seems not otherwise readily available in summary form.
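The QKV self-attention mentioned in the abstract can be sketched in a few lines. This is a minimal single-head, unmasked version in pure Python. It assumes the standard scaled dot-product form softmax(QKᵀ/√d_k)V. The function names and toy matrices are hypothetical illustrations. Production Transformers add multiple heads, causal masking, learned projection weights, and batched tensor kernels.

```python
"""Single-head scaled dot-product self-attention (toy, pure Python)."""
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k). Returns (seq_len, d_k)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_q[0])
    scores = matmul(q, transpose(k))                       # (seq_len, seq_len)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)                              # each output row mixes value rows


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # 3 toy token embeddings
    print(self_attention(tokens, identity, identity, identity))
```

Dividing the scores by √d_k keeps the softmax inputs from growing with the head dimension. Each attention-weight row sums to 1, so every output row is a weighted average of the value rows.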

