Is there "Secret Sauce" in Large Language Model Development?
Matthias Mertens, Natalia Fischl-Lanzoni, Neil Thompson

TL;DR
This study analyzes whether large language model performance gains are mainly due to scaling compute or proprietary techniques, finding that scaling dominates at the frontier but proprietary methods matter more elsewhere.
Contribution
It provides empirical evidence that scaling compute largely explains performance improvements at the frontier, while proprietary techniques reduce compute needs for fixed capabilities.
Findings
80-90% of performance differences at the frontier are due to higher training compute
Proprietary techniques significantly reduce compute requirements away from the frontier
Within companies, model training efficiency varies by over 40x
Abstract
Do leading LLM developers possess a proprietary "secret sauce", or is LLM performance driven by scaling up compute? Using training and benchmark data for 809 models released between 2022 and 2025, we estimate scaling-law regressions with release-date and developer fixed effects. We find clear evidence of developer-specific efficiency advantages, but their importance depends on where models lie in the performance distribution. At the frontier, 80-90% of performance differences are explained by higher training compute, implying that scale, not proprietary technology, drives frontier advances. Away from the frontier, however, proprietary techniques and shared algorithmic progress substantially reduce the compute required to reach fixed capability thresholds. Some companies can systematically produce smaller models more efficiently. Strikingly, we also find substantial variation of model…
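To make the phrase "scaling-law regressions with release-date and developer fixed effects" concrete, here is a minimal sketch of one way such a regression could be set up. It is an illustration under stated assumptions, not the authors' specification: the data are synthetic, the outcome is a generic benchmark score, and the names `fit_scaling_law` and `compute_equivalent` are hypothetical. The idea is to regress performance on log compute plus one dummy per developer and per release period, then express a developer's fixed effect as the compute multiplier that would buy the same score gain.

```python
import numpy as np

def fit_scaling_law(log_compute, developer, period, score):
    """OLS of score on log10(compute) with developer and release-period dummies."""
    cols = [np.ones(len(score)), np.asarray(log_compute, dtype=float)]
    names = ["intercept", "log_compute"]
    # Drop the first level of each category so the dummies are not
    # collinear with the intercept; that level becomes the baseline.
    for d in sorted(set(developer))[1:]:
        cols.append(np.array([1.0 if x == d else 0.0 for x in developer]))
        names.append(f"dev[{d}]")
    for p in sorted(set(period))[1:]:
        cols.append(np.array([1.0 if x == p else 0.0 for x in period]))
        names.append(f"period[{p}]")
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, np.asarray(score, dtype=float), rcond=None)
    return dict(zip(names, beta))

def compute_equivalent(effect, slope):
    """Compute multiplier that yields the same score gain as a fixed effect."""
    return 10.0 ** (effect / slope)

# Synthetic data (assumed values, chosen only for illustration).
rng = np.random.default_rng(0)
n = 600
developer = rng.choice(["A", "B", "C"], size=n).tolist()
period = rng.choice([2022, 2023, 2024], size=n).tolist()
log_compute = rng.uniform(21.0, 26.0, size=n)  # log10 of training FLOP
true_slope = 10.0                              # score points per 10x compute
dev_effect = {"A": 0.0, "B": 5.0, "C": -3.0}   # developer-specific shift
period_effect = {2022: 0.0, 2023: 2.0, 2024: 4.0}  # shared progress over time
score = (
    -200.0
    + true_slope * log_compute
    + np.array([dev_effect[d] for d in developer])
    + np.array([period_effect[p] for p in period])
    + rng.normal(0.0, 1.0, size=n)
)

coefs = fit_scaling_law(log_compute, developer, period, score)
multiplier_B = compute_equivalent(coefs["dev[B]"], coefs["log_compute"])
print(coefs)
print(f"Developer B's advantage is worth about {multiplier_B:.2f}x compute")
```

In this toy setup, developer B's true advantage of 5 points over a slope of 10 points per tenfold compute corresponds to a multiplier of 10^0.5, or a little over 3x. The period dummies play the role of shared algorithmic progress, while the developer dummies capture firm-specific efficiency; how the paper actually measures performance or defines the frontier is not specified in this summary.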
