An Information Theory of Compute-Optimal Size Scaling, Emergence, and Plateaus in Language Models