A Latent Variable Framework for Scaling Laws in Large Language Models
Peiyao Cai, Chengyu Cui, Felipe Maia Polo, Seamus Somerstep, Leshem Choshen, Mikhail Yurochkin, Moulinath Banerjee, Yuekai Sun, Kean Ming Tan, Gongjun Xu

TL;DR
This paper introduces a latent variable modeling framework for understanding and predicting how the performance of diverse large language models scales across multiple benchmarks, accounting for heterogeneity across model families and benchmarks.
Contribution
It presents a latent variable approach in which a family-level latent variable captures the features common to an LLM family and, together with each model's observable features, drives benchmark performance. It also provides an estimation procedure and empirical validation.
Findings
Effective modeling of performance across diverse LLMs
Supports estimation and downstream tasks
Validated on 12 benchmarks from the Open LLM Leaderboard
Abstract
We propose a statistical framework built on latent variable modeling for scaling laws of large language models (LLMs). Our work is motivated by the rapid emergence of numerous new LLM families with distinct architectures and training strategies, evaluated on an increasing number of benchmarks. This heterogeneity makes a single global scaling curve inadequate for capturing how performance varies across families and benchmarks. To address this, we propose a latent variable modeling framework in which each LLM family is associated with a latent variable that captures the common underlying features in that family. An LLM's performance on different benchmarks is then driven by its latent skills, which are jointly determined by the latent variable and the model's own observable features. We develop an estimation procedure for this latent variable model and establish its statistical…
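The structure described in the abstract (a family-level latent variable, combined with a model's observable features, yields latent skills that drive benchmark scores) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the additive form of the skills, the logistic link, the function names `latent_skills` and `benchmark_scores`, and all numeric values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic link mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def latent_skills(family_latent, features, feature_weights):
    """Assumed form: skill_k = family_latent[k] + sum_p features[p] * feature_weights[p][k]."""
    return [
        family_latent[k] + sum(x * w[k] for x, w in zip(features, feature_weights))
        for k in range(len(family_latent))
    ]


def benchmark_scores(skills, loadings, intercepts):
    """Assumed form: score_j = sigmoid(intercepts[j] + sum_k skills[k] * loadings[k][j])."""
    return [
        sigmoid(b + sum(s * row[j] for s, row in zip(skills, loadings)))
        for j, b in enumerate(intercepts)
    ]


# Hypothetical values: K = 2 latent skills, p = 2 observable features, J = 3 benchmarks.
family_latent = [0.5, -0.2]                      # shared by every model in one family
feature_weights = [[0.3, 0.1], [0.2, 0.4]]       # p x K, effect of each feature on each skill
loadings = [[1.0, 0.2, 0.0], [0.0, 0.8, 1.0]]    # K x J, how each skill drives each benchmark
intercepts = [-1.0, -1.0, -1.0]                  # per-benchmark difficulty offsets

# Two models from the same family that differ only in observable features
# (for example, scaled measures of parameter count and training tokens).
small = benchmark_scores(
    latent_skills(family_latent, [1.0, 1.0], feature_weights), loadings, intercepts
)
large = benchmark_scores(
    latent_skills(family_latent, [3.0, 3.0], feature_weights), loadings, intercepts
)
```

In this sketch, two models in the same family share `family_latent` and differ only in their observable features, so differences between families are carried by the latent vector rather than by a single global scaling curve.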
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
