Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs
Bo Bai

TL;DR
This paper introduces a Semantic Information Theory for LLMs, replacing classical bits with tokens as carriers of meaning, and recasts attention and transformers within an energy-based, causal framework.
Contribution
It develops a first-principles semantic theory for LLMs, modeling them as energy-based, causal channels with new information measures centered on tokens.
Findings
Recasts attention and transformers as energy-based models.
Defines directed rate-distortion and rate-reward functions for training and reinforcement learning.
Provides a causal interpretation of next-token prediction and limits of LLM reasoning.
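Treating attention as an energy-based model can be illustrated with a small sketch. It assumes the common convention that each key k_j has energy E_j = -<q, k_j>/sqrt(d) with respect to a query q, at temperature T = 1. This convention is not necessarily the paper's exact formulation, and the function names are hypothetical. Under that assumption, the softmax attention weights are exactly the Boltzmann (Gibbs) distribution exp(-E_j)/Z over the keys.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_energies(q, keys):
    # Assumed energy of key j: E_j = -<q, k_j> / sqrt(d); a lower energy means a better match.
    d = len(q)
    return [-dot(q, k) / math.sqrt(d) for k in keys]

def gibbs_weights(energies, temperature=1.0):
    # Boltzmann distribution p_j = exp(-E_j / T) / Z, shifted by the minimum energy for numerical stability.
    e_min = min(energies)
    unnorm = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def softmax_attention(q, keys, values):
    # Standard scaled dot-product attention for a single query vector.
    d = len(q)
    scores = [dot(q, k) / math.sqrt(d) for k in keys]
    s_max = max(scores)
    exps = [math.exp(s - s_max) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the weight-averaged value vector.
    out = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
    return weights, out

random.seed(0)
q = [random.gauss(0, 1) for _ in range(4)]
keys = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
values = [[random.gauss(0, 1) for _ in range(3)] for _ in range(5)]

w_attn, out = softmax_attention(q, keys, values)
w_gibbs = gibbs_weights(attention_energies(q, keys))
print(all(abs(a - b) < 1e-12 for a, b in zip(w_attn, w_gibbs)))  # True
```

Raising the temperature above 1 flattens the distribution over keys, and lowering it concentrates the weight on the lowest-energy key. This is the usual statistical-physics reading of the softmax scaling factor.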
Abstract
Despite the empirical successes of Large Language Models (LLMs), the prevailing paradigm is heuristic and experiment-driven, tethered to massive compute and data, while a first-principles theory remains absent. This treatise develops a Semantic Information Theory at the confluence of statistical physics, signal processing, and classical information theory, organized around a single paradigm shift: replacing the classical BIT - a microscopic substrate devoid of semantic content - with the macroscopic TOKEN as the atomic carrier of meaning and reasoning. Within this framework we recast attention and the Transformer as energy-based models, and interpret semantic embedding as vectorization on the semantic manifold. Modeling the LLM as a stateful channel with feedback, we adopt Massey's directed information as the native causal measure of autoregressive generation, from which we derive a…
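For reference, Massey's directed information from an input sequence X^N to an output sequence Y^N is defined as follows. The standard mutual-information chain rule is shown below it for contrast. The notation is the conventional one; this summary does not state how the paper assigns prompts and generated tokens to X and Y.

```latex
I(X^N \to Y^N) = \sum_{n=1}^{N} I\left(X^n; Y_n \mid Y^{n-1}\right)
\qquad \text{versus} \qquad
I(X^N; Y^N) = \sum_{n=1}^{N} I\left(X^N; Y_n \mid Y^{n-1}\right)
```

The only difference is that the n-th term of the directed version conditions on the past inputs X^n rather than the whole sequence X^N. This causal restriction is what makes it a natural measure for step-by-step, feedback-driven generation.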
