A Markov Categorical Framework for Language Modeling
Yifan Zhang

TL;DR
This paper introduces a Markov categorical framework to analyze language models, connecting training objectives, internal representations, and capabilities through an information-theoretic and spectral perspective.
Contribution
It provides a unified analytical framework using Markov categories to explain language model mechanisms, training effects, and internal geometry.
Findings
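Quantifies the information surplus a hidden state contains about future tokens beyond the immediate next one, giving an information-theoretic rationale for speculative decoding.
Shows that negative log-likelihood (NLL) training learns the data's intrinsic conditional uncertainty, measured via categorical entropy.
Provides a spectral analysis linking representation geometry to predictive prototypes.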
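The claim about NLL and conditional uncertainty is consistent with a standard identity that is not specific to this paper: the expected NLL of a model q equals the conditional entropy of the data plus the expected KL divergence from the data distribution to q. The sketch below checks this numerically on a toy example. The distributions and the names `p_x`, `p_y_given_x`, and `q_y_given_x` are hypothetical, and the paper's "categorical entropy" may be defined differently from the plain Shannon conditional entropy used here.

```python
import math

# Hypothetical toy setup: two contexts, three possible next tokens.
p_x = {"a": 0.5, "b": 0.5}                                   # context distribution
p_y_given_x = {"a": [0.7, 0.2, 0.1], "b": [0.1, 0.1, 0.8]}   # assumed data conditionals
q_y_given_x = {"a": [0.6, 0.3, 0.1], "b": [0.2, 0.2, 0.6]}   # assumed model conditionals


def expected_nll(p_x, p, q):
    """Expected negative log-likelihood of q under data drawn from p (in nats)."""
    return -sum(
        p_x[x] * sum(pi * math.log(qi) for pi, qi in zip(p[x], q[x]) if pi > 0)
        for x in p_x
    )


def conditional_entropy(p_x, p):
    """Shannon conditional entropy H(Y | X): the NLL of the data distribution itself."""
    return expected_nll(p_x, p, p)


def expected_kl(p_x, p, q):
    """Expected KL divergence from p(. | x) to q(. | x), averaged over contexts."""
    return sum(
        p_x[x] * sum(pi * math.log(pi / qi) for pi, qi in zip(p[x], q[x]) if pi > 0)
        for x in p_x
    )


nll = expected_nll(p_x, p_y_given_x, q_y_given_x)
h = conditional_entropy(p_x, p_y_given_x)
kl = expected_kl(p_x, p_y_given_x, q_y_given_x)

# NLL = H(Y | X) + E_x[KL(p || q)], so H(Y | X) is the floor that NLL cannot go below.
print(f"NLL={nll:.4f}  H(Y|X)={h:.4f}  KL={kl:.4f}  H+KL={h + kl:.4f}")
```

Because the KL term is non-negative and vanishes only when the model matches the data conditionals, minimizing NLL pushes the loss toward the conditional entropy, which is the irreducible uncertainty in the data.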
Abstract
Autoregressive language models achieve remarkable performance, yet a unified theory explaining their internal mechanisms, how training shapes representations, and why these representations support complex behavior remains incomplete. We introduce an analytical framework that models the single-step generation process as a composition of information-processing stages using the language of Markov categories. This compositional perspective connects three aspects of language modeling that are often studied separately: the training objective, the geometry of the learned representation space, and practical model capabilities. First, our framework gives an information-theoretic rationale for parallel drafting methods such as speculative decoding by quantifying the information surplus a hidden state contains about future tokens beyond the immediate next one. Second, we clarify how the standard…
